implementation.html

buildbot Sun, 03 Aug 2014 03:31:22 -0700

Author: buildbot
Date: Sun Aug  3 10:30:34 2014
New Revision: 918273

Log:
Staging update by buildbot for jena


Modified:
    websites/staging/jena/trunk/content/   (props changed)
    websites/staging/jena/trunk/content/documentation/csv/design.html
    websites/staging/jena/trunk/content/documentation/csv/implementation.html

Propchange: websites/staging/jena/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun Aug  3 10:30:34 2014
@@ -1 +1 @@
-1615398
+1615400

Modified: websites/staging/jena/trunk/content/documentation/csv/design.html
==============================================================================
--- websites/staging/jena/trunk/content/documentation/csv/design.html (original)
+++ websites/staging/jena/trunk/content/documentation/csv/design.html Sun Aug  3 10:30:34 2014
@@ -152,7 +152,7 @@
 <li><a 
href="https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/GraphPropertyTable.java">GraphPropertyTable</a></li>
 </ul>
 <p><img alt="Picture of architecture of jena-csv" 
src="jena-csv-architecture.png" title="Architecture of jena-csv" /></p>
-<h3 id="propertytable">PropertyTable</h3>
+<h2 id="propertytable">PropertyTable</h2>
 <p>A <code>PropertyTable</code> is collection of data that is sufficiently 
regular in shape it can be treated as a table.
 That means each subject has a value for each one of the set of properties.
 Irregularity in terms of missing values needs to be handled but not multiple 
values for the same property.
@@ -170,34 +170,34 @@ You can use <code>getColumn()</code> to 
 <ol>
 <li>Create <code>Columns</code> using 
<code>PropertyTable.createColumn()</code> for each <code>Column</code> of the 
<code>PropertyTable</code></li>
 <li>Create <code>Rows</code> using <code>PropertyTable.createRow()</code> for 
each <code>Row</code> of the <code>PropertyTable</code></li>
-<li>For each <code>Row' created, set a value (</code>Node<code>) at the 
specified</code>Column<code>, by calling</code>Row.setValue()`</li>
+<li>For each <code>Row</code> created, set a value (<code>Node</code>) at the 
specified <code>Column</code>, by calling <code>Row.setValue()</code></li>
 </ol>
 <p>Once a <code>PropertyTable</code> is built, tabular data within can be 
accessed by the API of <code>PropertyTable.getMatchingRows()</code>, 
<code>PropertyTable.getColumnValues()</code>, etc.</p>
-<h3 id="graphpropertytable">GraphPropertyTable</h3>
+<h2 id="graphpropertytable">GraphPropertyTable</h2>
 <p><code>GraphPropertyTable</code> implements the <a 
href="https://svn.apache.org/repos/asf/jena/trunk/jena-core/src/main/java/com/hp/hpl/jena/graph/Graph.java">Graph</a>
 interface (read-only) over a <code>PropertyTable</code>. 
 This is subclass from <a 
href="https://svn.apache.org/repos/asf/jena/trunk/jena-core/src/main/java/com/hp/hpl/jena/graph/impl/GraphBase.java">GraphBase</a>
 and implements <code>find()</code>. 
-The <code>graphBaseFind()</code> method can choose the access route based on 
the find arguments.
-It holds/wraps an reference of the <code>PropertyTable</code> instance, so 
that such a graph can be treated in a more table-like fashion.</p>
+The <code>graphBaseFind()</code>(for matching a <code>Triple</code>) and 
<code>propertyTableBaseFind()</code>(for matching a whole <code>Row</code>) 
methods can choose the access route based on the find arguments.
+<code>GraphPropertyTable</code> holds/wraps an reference of the 
<code>PropertyTable</code> instance, so that such a <code>Graph</code> can be 
treated in a more table-like fashion.</p>
 <p><strong>Note:</strong> Both <code>PropertyTable</code> and 
<code>GraphPropertyTable</code> are <em>NOT</em> restricted to CSV data.
 They are supposed to be compatible with any table-like data sources, such as 
relational databases, Microsoft Excel, etc.</p>
-<h3 id="graphcsv">GraphCSV</h3>
+<h2 id="graphcsv">GraphCSV</h2>
 <p><a 
href="https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/GraphCSV.java">GraphCSV</a>
 is a sub class of GraphPropertyTable aiming at CSV data.
 Its constructor takes a CSV file path as the parameter, parse the file using a 
CSV Parser, and makes a <code>PropertyTable</code> through 
<code>PropertyTableBuilder</code>.</p>
 <p>For CSV to RDF mapping, we establish some basic principles:</p>
-<h4 id="single-value-and-regular-shaped-csv-only">Single-Value and 
Regular-Shaped CSV only</h4>
+<h3 id="single-value-and-regular-shaped-csv-only">Single-Value and 
Regular-Shaped CSV only</h3>
 <p>In the <a href="https://www.w3.org/2013/csvw/wiki/Main_Page">CSV-WG</a>, it 
looks like duplicate column names are not going to be supported. Therefore, we 
just consider parsing single-valued CSV tables. 
 There is the current editor working <a 
href="http://w3c.github.io/csvw/syntax/">draft</a> from the CSV on the Web 
Working Group, which is defining a more regular data out of CSV.
 This is the target for the CSV work of GraphCSV: tabular regular-shaped CSV; 
not arbitrary, irregularly shaped CSV.</p>
-<h4 id="no-additional-csv-metadata">No Additional CSV Metadata</h4>
+<h3 id="no-additional-csv-metadata">No Additional CSV Metadata</h3>
 <p>A CSV file with no additional metadata is directly mapped to RDF, which 
makes a simpler case compared to SQL-to-RDF work. 
 It's not necessary to have a defined primary column, similar to the primary 
key of database. The subject of the triple can be generated through one of:</p>
 <ol>
 <li>The triples for each row have a blank node for the subject, e.g. something 
like the illustration</li>
 <li>The triples for row N have a subject URI which is 
<code>&lt;FILE#_N&gt;</code>.</li>
 </ol>
-<h4 id="data-type-for-typed-literal">Data Type for Typed Literal</h4>
+<h3 id="data-type-for-typed-literal">Data Type for Typed Literal</h3>
 <p>All the values in CSV are parsed as strings line by line. As a better 
option for the user to turn on, a dynamic choice which is a posh way of saying 
attempt to parse it as an integer (or decimal, double, date) and if it passes, 
it's an integer (or decimal, double, date).</p>
-<h4 id="file-path-as-namespace">File Path as Namespace</h4>
+<h3 id="file-path-as-namespace">File Path as Namespace</h3>
 <p>RDF requires that the subjects and the predicates are URIs. We need to pass 
in the namespaces (or just the default namespaces) to make URIs by combining 
the namespaces with the values in CSV.
 We don't have metadata of the namespaces for the columns, But subjects can 
be blank nodes which is useful because each row is then a new blank node. For 
predicates, suppose the URL of the CSV file is 
<code>file:///c:/town.csv</code>, then the columns can be 
<code>&lt;file:///c:/town.csv#Town&gt;</code> and 
<code>&lt;file:///c:/town.csv#Population&gt;</code>, as is showed in the 
illustration.</p>
   </div>

Modified: websites/staging/jena/trunk/content/documentation/csv/implementation.html
==============================================================================
--- websites/staging/jena/trunk/content/documentation/csv/implementation.html (original)
+++ websites/staging/jena/trunk/content/documentation/csv/implementation.html Sun Aug  3 10:30:34 2014
@@ -19,7 +19,7 @@
     limitations under the License.
 -->
 
-  <title>Apache Jena - CSV PropertyTable - Implementation</title>
+  <title>Apache Jena - CSV PropertyTable - Implementation
</title>
   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
 
@@ -144,8 +144,65 @@
        <div class="row">
        <div class="col-md-12">
        <div id="breadcrumbs"></div>
-       <h1 class="title">CSV PropertyTable - Implementation</h1>
-  
+       <h1 class="title">CSV PropertyTable - Implementation
</h1>
+  <h2 id="propertytable-implementations">PropertyTable Implementations</h2>
+<p>There're 2 implementations for <code>PropertyTable</code>. The pros and 
cons are summarised in the following table: </p>
+<table>
+<thead>
+<tr>
+<th>PropertyTable Implementation</th>
+<th>Description</th>
+<th>Supported Indexes</th>
+<th>Advantages</th>
+<th>Disadvantages</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>PropertyTableArrayImpl</code></td>
+<td>implemented by a two-dimensioned Java array of <code>Nodes</code></td>
+<td>SPO, PSO</td>
+<td>compact memory usage, fast for querying with S and P, fast for query a 
whole <code>Row</code></td>
+<td>slow for query with O, table Row/Column size provided</td>
+</tr>
+<tr>
+<td><code>PropertyTableHashMapImpl</code></td>
+<td>implemented by several Java <code>HashMaps</code></td>
+<td>PSO, POS</td>
+<td>fast for querying with O, table Row/Column size not required</td>
+<td>more memory usage for HashMaps</td>
+</tr>
+</tbody>
+</table>
+<p>By default, <a 
href="https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/PropertyTableArrayImpl.java">PropertyTableArrayImpl</a>
 is used as the <code>PropertyTable</code> implementation held by 
<code>GraphCSV</code>.
+If you want to switch to <a 
href="https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/PropertyTableHashMapImpl.java">PropertyTableHashMapImpl</a>,
 just use the static method of <code>GraphCSV.createHashMapImpl()</code> to 
replace the default <code>new GraphCSV()</code> way.
+Here is an example:</p>
+<div class="codehilite"><pre><span class="n">Model</span> <span 
class="n">model_csv_array_impl</span> <span class="p">=</span> <span 
class="n">ModelFactory</span><span class="p">.</span><span 
class="n">createModelForGraph</span><span class="p">(</span><span 
class="n">new</span> <span class="n">GraphCSV</span><span 
class="p">(</span><span class="n">file</span><span class="p">));</span> <span 
class="o">//</span> <span class="n">PropertyTableArrayImpl</span>
+<span class="n">Model</span> <span class="n">model_csv_hashmap_impl</span> 
<span class="p">=</span> <span class="n">ModelFactory</span><span 
class="p">.</span><span class="n">createModelForGraph</span><span 
class="p">(</span><span class="n">GraphCSV</span><span class="p">.</span><span 
class="n">createHashMapImpl</span><span class="p">(</span><span 
class="n">file</span><span class="p">));</span> <span class="o">//</span> <span 
class="n">PropertyTableHashMapImpl</span>
+</pre></div>
+
+
+<h2 id="stagegenerator-optimization-for-graphpropertytable">StageGenerator 
Optimization for GraphPropertyTable</h2>
+<p>Accessing from SPARQL via <code>Graph.find()</code> will work, but it's not 
ideal. Some optimizations can be done for processing a SPARQL basic graph 
pattern. More explicitly, in the method of <code>OpExecutor.execute(OpBGP, 
...)</code>, when the target for the query is a 
<code>GraphPropertyTable</code>, it can get a whole <code>Row</code>, or 
<code>Rows</code>, of the table data and match the pattern with the 
bindings.</p>
+<p>The optimization of querying a whole <code>Row</code> in the PropertyTable 
are supported now.
+The following query pattern can be transformed into a <code>Row</code> 
querying, without generating triples:</p>
+<div class="codehilite"><pre>?<span class="n">x</span> <span 
class="p">:</span><span class="n">prop1</span> ?<span class="n">v</span> <span 
class="p">.</span>
+?<span class="n">x</span> <span class="p">:</span><span class="n">prop2</span> 
?<span class="n">w</span> <span class="p">.</span>
+<span class="p">...</span>
+</pre></div>
+
+
+<p>It's made by using the extension point of <code>StageGenerator</code>, 
because it's now just concerned with <code>BasicPattern</code>.
+The detailed workflow goes in this way:</p>
+<ol>
+<li>Split the incoming <code>BasicPattern</code> by subjects, (i.e. it becomes 
multiple sub BasicPatterns grouped by the same subjects. (see <a 
href="https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/QueryIterPropertyTable.java">QueryIterPropertyTable</a>
 )</li>
+<li>For each sub <code>BasicPattern</code>, if the <code>Triple</code> size 
within is greater than 1 (i.e. at least 2 <code>Triples</code>), it's turned 
into a <code>Row</code> querying, and processed by <a 
href="https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/QueryIterPropertyTableRow.java">QueryIterPropertyTableRow</a>,
 else if it contains only 1 <code>Triple</code>, it goes for the traditional 
<code>Triple</code> querying by <code>graph.graphBaseFind()</code></li>
+</ol>
+<p>In order to turn on this optimization, we need to register the <a 
href="https://svn.apache.org/repos/asf/jena/Experimental/jena-csv/src/main/java/org/apache/jena/propertytable/impl/StageGeneratorPropertyTable.java">StageGeneratorPropertyTable</a>
 into ARQ context, before performing SPARQL querying:</p>
+<div class="codehilite"><pre><span class="n">StageGenerator</span> <span 
class="n">orig</span> <span class="p">=</span> <span class="p">(</span><span 
class="n">StageGenerator</span><span class="p">)</span><span 
class="n">ARQ</span><span class="p">.</span><span 
class="n">getContext</span><span class="p">().</span><span 
class="n">get</span><span class="p">(</span><span class="n">ARQ</span><span 
class="p">.</span><span class="n">stageGenerator</span><span class="p">)</span> 
<span class="p">;</span>
+<span class="n">StageGenerator</span> <span class="n">stageGenerator</span> 
<span class="p">=</span> <span class="n">new</span> <span 
class="n">StageGeneratorPropertyTable</span><span class="p">(</span><span 
class="n">orig</span><span class="p">)</span> <span class="p">;</span>
+<span class="n">StageBuilder</span><span class="p">.</span><span 
class="n">setGenerator</span><span class="p">(</span><span 
class="n">ARQ</span><span class="p">.</span><span 
class="n">getContext</span><span class="p">(),</span> <span 
class="n">stageGenerator</span><span class="p">)</span> <span class="p">;</span>
+</pre></div>
   </div>
 </div>

svn commit: r918273 - in /websites/staging/jena/trunk/content: ./ documentation/csv/design.html documentation/csv/implementation.html
