Author: buildbot
Date: Sun Aug 3 09:43:28 2014
New Revision: 918270
Log:
Staging update by buildbot for jena
Modified:
websites/staging/jena/trunk/content/ (props changed)
websites/staging/jena/trunk/content/documentation/csv/design.html
Propchange: websites/staging/jena/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun Aug 3 09:43:28 2014
@@ -1 +1 @@
-1615396
+1615398
Modified: websites/staging/jena/trunk/content/documentation/csv/design.html
==============================================================================
--- websites/staging/jena/trunk/content/documentation/csv/design.html (original)
+++ websites/staging/jena/trunk/content/documentation/csv/design.html Sun Aug
3 09:43:28 2014
@@ -146,7 +146,7 @@
<div id="breadcrumbs"></div>
<h1 class="title">CSV PropertyTable - Design
</h1>
<h2 id="architecture">Architecture</h2>
-<p>The architecture of jena-csv mainly involves 2 components:</p>
+<p>The architecture of CSV PropertyTable mainly involves 2 components:</p>
<ul>
<li><a
href="https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/PropertyTable.java">PropertyTable</a></li>
<li><a
href="https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/GraphPropertyTable.java">GraphPropertyTable</a></li>
@@ -166,16 +166,40 @@ With special storage, a PropertyTable</p
Each <a
href="https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/Column.java">Column</a>
of the <code>PropertyTable</code> has a unique columnKey <code>Node</code> of
the predicate (or p for short).
Each <a
href="https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/Row.java">Row</a>
of the <code>PropertyTable</code> has a unique rowKey <code>Node</code> of
the subject (or s for short).
You can use <code>getColumn()</code> to get a <code>Column</code> by its
columnKey <code>Node</code> of the predicate, and <code>getRow()</code> to get a
<code>Row</code> by its rowKey <code>Node</code> of the subject.</p>
+<p>A <code>PropertyTable</code> should be constructed in this workflow (in
order):</p>
+<ol>
+<li>Create <code>Columns</code> using
<code>PropertyTable.createColumn()</code> for each <code>Column</code> of the
<code>PropertyTable</code></li>
+<li>Create <code>Rows</code> using <code>PropertyTable.createRow()</code> for
each <code>Row</code> of the <code>PropertyTable</code></li>
+<li>For each <code>Row</code> created, set a value (<code>Node</code>) at the
specified <code>Column</code> by calling <code>Row.setValue()</code></li>
+</ol>
+<p>Once a <code>PropertyTable</code> is built, tabular data within can be
accessed by the API of <code>PropertyTable.getMatchingRows()</code>,
<code>PropertyTable.getColumnValues()</code>, etc.</p>
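<p>The three-step workflow above can be sketched with a small in-memory stand-in.
This is only an illustration under stated assumptions, not the jena-csv implementation:
the class <code>PropertyTableSketch</code> is hypothetical, keys and values are plain strings
rather than <code>Node</code> objects, and the sample towns and populations are made up.
Only the method names mirror the ones mentioned in the text.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a property table; not the jena-csv classes. */
public class PropertyTableSketch {
    // Assumption: keys and values are plain strings; the real API uses Node.
    private final List<String> columns = new ArrayList<>();
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();

    /** Step 1: register a column under its predicate key. */
    public String createColumn(String columnKey) {
        if (!columns.contains(columnKey)) {
            columns.add(columnKey);
        }
        return columnKey;
    }

    /** Step 2: register a row under its subject key. */
    public String createRow(String rowKey) {
        rows.putIfAbsent(rowKey, new LinkedHashMap<>());
        return rowKey;
    }

    /** Step 3: set one cell; the row and the column must already exist. */
    public void setValue(String rowKey, String columnKey, String value) {
        if (!rows.containsKey(rowKey) || !columns.contains(columnKey)) {
            throw new IllegalArgumentException("unknown row or column");
        }
        rows.get(rowKey).put(columnKey, value);
    }

    /** Row keys whose cell in the given column equals the given value. */
    public List<String> getMatchingRows(String columnKey, String value) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            if (value.equals(row.getValue().get(columnKey))) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    /** All values present in the given column, in row insertion order. */
    public List<String> getColumnValues(String columnKey) {
        List<String> values = new ArrayList<>();
        for (Map<String, String> cells : rows.values()) {
            String value = cells.get(columnKey);
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }

    /** Builds a two-row table in the documented order: columns, rows, values. */
    public static PropertyTableSketch demo() {
        PropertyTableSketch table = new PropertyTableSketch();
        String town = table.createColumn("Town");
        String population = table.createColumn("Population");
        String row1 = table.createRow("_:row1");
        String row2 = table.createRow("_:row2");
        table.setValue(row1, town, "Southton");
        table.setValue(row1, population, "123000");
        table.setValue(row2, town, "Northville");
        table.setValue(row2, population, "5000");
        return table;
    }
}
```

<p>Creating columns and rows before setting values lets <code>setValue()</code> reject
unknown keys early, which is one plausible reason for fixing the order of the steps.</p>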
<h3 id="graphpropertytable">GraphPropertyTable</h3>
<p><code>GraphPropertyTable</code> implements the <a
href="https://svn.apache.org/repos/asf/jena/trunk/jena-core/src/main/java/com/hp/hpl/jena/graph/Graph.java">Graph</a>
interface (read-only) over a <code>PropertyTable</code>.
It is a subclass of <a
href="https://svn.apache.org/repos/asf/jena/trunk/jena-core/src/main/java/com/hp/hpl/jena/graph/impl/GraphBase.java">GraphBase</a>
and implements <code>find()</code>.
The <code>graphBaseFind()</code> method can choose the access route based on
the find arguments.
It holds a reference to the <code>PropertyTable</code> instance, so
that such a graph can be treated in a more table-like fashion.</p>
-<p>Note that, both <code>PropertyTable</code> and
<code>GraphPropertyTable</code> are <em>NOT</em> restricted to CSV data.
+<p><strong>Note:</strong> Both <code>PropertyTable</code> and
<code>GraphPropertyTable</code> are <em>NOT</em> restricted to CSV data.
They are supposed to be compatible with any table-like data sources, such as
relational databases, Microsoft Excel, etc.</p>
<h3 id="graphcsv">GraphCSV</h3>
<p><a
href="https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/GraphCSV.java">GraphCSV</a>
is a subclass of <code>GraphPropertyTable</code> aimed at CSV data.
-Its constructor takes a CSV file path as the parameter and makes a
<code>GraphPropertyTable</code> through parsing the file.</p>
+Its constructor takes a CSV file path as the parameter, parses the file using a
CSV parser, and builds a <code>PropertyTable</code> through
<code>PropertyTableBuilder</code>.</p>
+<p>For CSV to RDF mapping, we establish some basic principles:</p>
+<h4 id="single-value-and-regular-shaped-csv-only">Single-Value and
Regular-Shaped CSV only</h4>
+<p>In the <a href="https://www.w3.org/2013/csvw/wiki/Main_Page">CSV-WG</a>, it
looks like duplicate column names are not going to be supported. Therefore, we
only consider parsing single-valued CSV tables.
+The current editor's working <a
href="http://w3c.github.io/csvw/syntax/">draft</a> from the CSV on the Web
Working Group is defining a more regular data model for CSV.
+This is the target for the CSV work of GraphCSV: tabular, regular-shaped CSV,
not arbitrary, irregularly shaped CSV.</p>
+<h4 id="no-additional-csv-metadata">No Additional CSV Metadata</h4>
+<p>A CSV file with no additional metadata is mapped directly to RDF, which
makes it a simpler case than the SQL-to-RDF work.
+It's not necessary to have a defined primary column, similar to the primary
key of a database. The subject of the triple can be generated in one of two ways:</p>
+<ol>
+<li>The triples for each row have a blank node for the subject, e.g. as in
the illustration.</li>
+<li>The triples for row N have a subject URI which is
<code>&lt;FILE#_N&gt;</code>.</li>
+</ol>
+<h4 id="data-type-for-typed-literal">Data Type for Typed Literal</h4>
+<p>All the values in CSV are parsed as strings, line by line. As a better
option that the user can turn on, the data type can be chosen dynamically:
attempt to parse the value as an integer (or decimal, double, date) and, if
that succeeds, treat it as an integer (or decimal, double, date).</p>
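<p>The dynamic choice described above can be sketched as a chain of parse attempts.
This is a minimal illustration, not the jena-csv code: the class
<code>LiteralTypeSketch</code>, the method <code>inferDatatype()</code>, the order of the
attempts, and the restriction to ISO <code>yyyy-MM-dd</code> dates are all assumptions.</p>

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

/** Hypothetical helper: guesses an XSD datatype name for a raw CSV cell. */
public class LiteralTypeSketch {

    /** Tries integer, then decimal or double, then date; falls back to string. */
    public static String inferDatatype(String raw) {
        String value = raw.trim();

        try {
            new BigInteger(value);
            return "xsd:integer";
        } catch (NumberFormatException notAnInteger) {
            // fall through to the next attempt
        }

        try {
            new BigDecimal(value); // accepts plain and exponent notation
            boolean hasExponent = value.indexOf('e') >= 0 || value.indexOf('E') >= 0;
            // Assumption: exponent notation signals a double, plain notation a decimal.
            return hasExponent ? "xsd:double" : "xsd:decimal";
        } catch (NumberFormatException notANumber) {
            // fall through to the next attempt
        }

        try {
            LocalDate.parse(value); // ISO-8601 calendar dates only, e.g. 2014-08-03
            return "xsd:date";
        } catch (DateTimeParseException notADate) {
            // fall through to the default
        }

        return "xsd:string";
    }
}
```

<p>Trying the narrowest type first means a value such as <code>5000</code> is reported as an
integer rather than a decimal; anything that fails every attempt stays a plain string.</p>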
+<h4 id="file-path-as-namespace">File Path as Namespace</h4>
+<p>RDF requires that the subjects and the predicates are URIs. We need to pass
in the namespaces (or just the default namespaces) to make URIs by combining
the namespaces with the values in CSV.
+We don't have metadata about the namespaces for the columns, but subjects can
be blank nodes, which is useful because each row is then a new blank node. For
predicates, suppose the URL of the CSV file is
<code>file:///c:/town.csv</code>, then the columns can be
<code>&lt;file:///c:/town.csv#Town&gt;</code> and
<code>&lt;file:///c:/town.csv#Population&gt;</code>, as shown in the
illustration.</p>
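<p>Using the file URL as the namespace can be sketched as simple string composition.
The class <code>CsvUriSketch</code> and both method names are hypothetical, and the
percent-encoding of column names is an added assumption that the text does not specify.</p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: derives URIs from the URL of a CSV file. */
public class CsvUriSketch {

    /** Predicate URI for a column: the file URL plus '#' plus the column name. */
    public static String predicateUri(String csvUrl, String columnName) {
        try {
            // Assumption: encode the header so that spaces and other characters
            // that are not URI-safe still yield a usable fragment.
            String fragment = URLEncoder.encode(columnName.trim(), "UTF-8")
                    .replace("+", "%20");
            return csvUrl + "#" + fragment;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Subject URI for row N in the FILE#_N form, as an alternative to a blank node. */
    public static String rowSubjectUri(String csvUrl, int rowNumber) {
        return csvUrl + "#_" + rowNumber;
    }
}
```

<p>With <code>file:///c:/town.csv</code> as the file URL, <code>predicateUri()</code> gives
<code>file:///c:/town.csv#Town</code> for the <code>Town</code> column, matching the form
used in the text.</p>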
</div>
</div>