Re: [GSoC 2014] Data Tables for SPARQL

Ying Jiang Fri, 07 Mar 2014 04:11:27 -0800

Hi Andy,

Thanks for your explanations! Please check my further questions below:


On Tue, Mar 4, 2014 at 6:26 AM, Andy Seaborne <a...@apache.org> wrote:
> On 03/03/14 03:12, Ying Jiang wrote:
>>
>> Hi Andy,
>
>
> Hi Ying,
>
>
>>
>> Thanks for your suggestions! I'm more interested in JENA-625 (Data
>> Tables for SPARQL). I've seen your new comments in JIRA and studied
>> the source code of Tarql. I'd like to paste your comments here with my
>> questions below to clarify the details of this project:
>>
>> 1. CSV to RDF terms (tuples of RDF Terms is already supported
>> internally in Jena)
>>   - Questions:
>> 1.1 Tarql uses the first row of CSV as variable names. Should we use
>> the same idea?
>
>
> Seems like a good start, although care is needed because a column name can be
> anything and SPARQL variable names are restricted.
>
> If there is no header row (and we can require that the app says so by some
> mechanism), or if the app wants different names, then there should be a way to
> provide them, falling back to something predictable if dull: ?_col1, ?_col2, ...
>
> See below - there's no need to have fixed variable names.
>
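
A minimal sketch of that naming rule, assuming an ASCII-only subset of the SPARQL VARNAME production (letters, digits, underscore) and using the ?_col1, ?_col2, ... fallback for missing, empty or repeated header cells. ColumnNames and its methods are hypothetical names, not Jena API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Jena: maps CSV header cells to SPARQL variable names.
class ColumnNames {

    // Returns one variable name (without the leading '?') per column.
    // header may be null (no header row) or shorter than columnCount.
    static List<String> toVarNames(List<String> header, int columnCount) {
        List<String> names = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < columnCount; i++) {
            String raw = (header != null && i < header.size()) ? header.get(i) : null;
            String name = sanitize(raw);
            // Missing, empty or repeated names fall back to the 1-based default _colN.
            // Note: this does not guard against a header cell literally named "_colN".
            if (name.isEmpty() || seen.contains(name)) {
                name = "_col" + (i + 1);
            }
            seen.add(name);
            names.add(name);
        }
        return names;
    }

    // Keeps ASCII letters, digits and '_'; replaces every other character with '_'.
    static String sanitize(String raw) {
        if (raw == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (char c : raw.trim().toCharArray()) {
            boolean ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '_';
            sb.append(ok ? c : '_');
        }
        return sb.toString();
    }
}
```

For example, header cells "first name", "" and "age" become ?first_name, ?_col2 and ?age.
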
>
>> 1.2 As to "internal support of tuples of RDF terms in Jena", do you
>> mean com.hp.hpl.jena.sparql.algebra.table.TableData? Tarql uses
>> TableData to accommodate RDF term bindings from CSV.
>
>
> That, and there is also some RDF tuples code to read and write a textual
> form:
>
> https://svn.apache.org/repos/asf/jena/Experimental/rdfpatch/src/main/java/org/apache/jena/riot/tio/
>
> (there are other versions of this code around - this is the ready-to-use
> form)
>
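
A plain-Java sketch of the CSV-to-rows step whose result TableData would hold, assuming RFC 4180-style quoting (double-quoted fields, "" as an escaped quote) and no line breaks inside fields. Rows are modelled as Map<String, String>; in Jena each cell would instead become an RDF term in a Binding. CsvTable, splitLine and rows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Jena code: CSV text -> rows of (variable name -> cell value).
class CsvTable {

    // Splits one CSV line, honouring double-quoted fields and "" escapes.
    // Does not support line breaks inside quoted fields.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"');   // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;  // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    // Binds cells to the given variable names; empty lines are skipped and
    // variables beyond the end of a short row are left unbound.
    static List<Map<String, String>> rows(List<String> lines, List<String> vars) {
        List<Map<String, String>> result = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            List<String> cells = splitLine(line);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < vars.size() && i < cells.size(); i++) {
                row.put(vars.get(i), cells.get(i));
            }
            result.add(row);
        }
        return result;
    }
}
```

A short row simply leaves its trailing variables unbound, much like UNDEF in a VALUES block.
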
>
>> 2. Storage of the table (in-memory is enough, with reading from a file).
>>   - Questions:
>> 2.1 What's the life cycle of the in-memory table? Should we discard
>> the table after the query execution, or keep it in-memory for later
>> reuse with the same query or update, or use by a subsequent query?
>> When will the table be discarded?
>
>
> That'll need refining, but there should be a way to read and reuse tables. There
> needs to be a way for the app to pass in tables (a Map<String, ???> and a tool for
> reading CSVs to get the ???) because ...

When will the tables be passed in? TARQL loads the CSVs when parsing
the SPARQL query string. Shall we load the tables and create the Map
before querying, and cache them for reuse? This could be similar to
querying a Dataset; the simplest way would go something like:

// The keys of dtm are the URIs of the loaded DataTables.
DataTableMap<String, DataTable> dtm = DataTableSetFactory.createDataTableMap();

// true: the table data are loaded when added to the map.
dtm.addDataTable("<ex:table_1>", "file:table_1.csv", true);
// false: the table data are *lazily* loaded later, during querying.
dtm.addDataTable("<ex:table_2>", "file:table_2.csv", false);

// A new .jj grammar will parse the TABLE and FROM TABLE clauses;
// the QueryFactory interface remains the same as before.
Query query = QueryFactory.create(queryString);

// A new QueryExecutionFactory.create method to accommodate dtm.
QueryExecution qExec = QueryExecutionFactory.create(query, model, dtm);
...
// dtm can be reused later for other QueryExecutions, or discarded when the app ends.

Is this what you meant? Any comments?
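
To make the eager/lazy distinction concrete, a plain-Java sketch of what the proposed DataTableMap could do internally. DataTableRegistry, addDataTable and the Supplier-based loader are illustrative assumptions, not existing Jena API; rows are modelled as Map<String, String> rather than Bindings, and the class is not thread-safe.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the proposed DataTableMap: table URI -> rows, loaded eagerly or lazily.
class DataTableRegistry {
    private final Map<String, Supplier<List<Map<String, String>>>> loaders = new HashMap<>();
    private final Map<String, List<Map<String, String>>> loaded = new HashMap<>();

    // eager == true loads immediately; otherwise the loader runs on first lookup.
    void addDataTable(String uri, Supplier<List<Map<String, String>>> loader, boolean eager) {
        loaders.put(uri, loader);
        loaded.remove(uri); // re-adding a URI discards any previously loaded copy
        if (eager) {
            loaded.put(uri, loader.get());
        }
    }

    // Returns the table, loading and caching it on first access; null if the URI is unknown.
    List<Map<String, String>> get(String uri) {
        List<Map<String, String>> table = loaded.get(uri);
        if (table == null) {
            Supplier<List<Map<String, String>>> loader = loaders.get(uri);
            if (loader == null) {
                return null;
            }
            table = loader.get();
            loaded.put(uri, table);
        }
        return table;
    }

    boolean isLoaded(String uri) {
        return loaded.containsKey(uri);
    }
}
```

Because such a registry outlives any single QueryExecution, a loaded table can be shared by later queries, which is one possible answer to the life-cycle question in 2.1.
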
>
>
>> 3. Modify the SPARQL grammar to support FROM TABLE and TABLE (for
>> inclusion inside a larger query, c.f. SPARQL VALUES clause).
>>   - Questions:
>> 3.1 What are the differences between FROM TABLE and TABLE?
>
>
> FROM TABLE would be one way to get tables into the query, as would passing
> them in via the query context.
>
> Queries can't be assumed to
>
> TABLE in a query is accessing the table, using it to get the
>
> TARQL (and I've only read the documentation) is a query over a single CSV
> file. This project should be about multiple CSVs and combining them with
> other RDF data.
>
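
Since the point is combining several CSVs with other RDF data, the tables would presumably take part in ordinary SPARQL joins: two solutions are compatible when they agree on every variable they share, and each compatible pair is merged. A naive nested-loop sketch of that rule, with Map<String, String> rows standing in for Bindings; RowJoin is a hypothetical name, and ARQ has its own join code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Naive nested-loop join over solution rows (variable -> value), following SPARQL's
// notion of compatible solutions. Illustrative only; not ARQ's join implementation.
class RowJoin {

    // Two rows are compatible if every variable bound in both has the same value.
    static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Merges every compatible (left, right) pair into one output row.
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                if (compatible(l, r)) {
                    Map<String, String> merged = new HashMap<>(l);
                    merged.putAll(r);
                    out.add(merged);
                }
            }
        }
        return out;
    }
}
```

Rows that share no variables are always compatible, so joining two unrelated tables gives their cross product.
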
> A quick sketch (the syntax has not been checked for sense):
>
> SELECT ... {
>   # Fixed column names
>   TABLE <uri> {
>      BIND (URI(CONCAT('http://example.com/ns#', ?b)) AS ?uri)
>      BIND (STRLANG(?a, 'en') AS ?with_language_tag)
>      FILTER (?v > 57)
>   }
> }
>
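
To pin down what the first sketch would compute, a minimal plain-Java sketch of its per-row semantics, assuming the CSV columns are bound to ?a, ?b and ?v and that numeric-looking cells are read as numbers (a plain string compared with 57 would be a type error). FixedColumnSketch is a hypothetical name, and rows are Map<String, String> stand-ins for Bindings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-row evaluation of the first TABLE sketch; not ARQ code.
// Cells are plain strings; terms are rendered N-Triples style, without escaping.
class FixedColumnSketch {

    static List<Map<String, String>> eval(List<Map<String, String>> rows) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            // FILTER (?v > 57): an unbound or non-numeric ?v is an evaluation
            // error, which SPARQL treats as false, so the row is dropped.
            String v = row.get("v");
            if (v == null) {
                continue;
            }
            double num;
            try {
                num = Double.parseDouble(v);
            } catch (NumberFormatException e) {
                continue;
            }
            if (!(num > 57)) {
                continue;
            }
            Map<String, String> result = new LinkedHashMap<>(row);
            // BIND (URI(CONCAT('http://example.com/ns#', ?b)) AS ?uri)
            if (row.containsKey("b")) {
                result.put("uri", "<http://example.com/ns#" + row.get("b") + ">");
            }
            // BIND (STRLANG(?a, 'en') AS ?with_language_tag)
            if (row.containsKey("a")) {
                result.put("with_language_tag", "\"" + row.get("a") + "\"@en");
            }
            out.add(result);
        }
        return out;
    }
}
```
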
> More ambitious: explicit column naming plus FILTERs:
>
> SELECT ...
> WHERE {
>
>    TABLE <uri> { "col1" AS ?myVar1 ,
>                  "col10" AS ?V ,
>                  "col5" AS ?appName
>                  FILTER(?V > 57) }
> }
>
> creates a set of bindings based on the access description.
>
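
Read as "select and rename the listed columns, then filter", the second sketch could be evaluated roughly as below. A minimal sketch, assuming the quoted names ("col1", "col10", "col5") are CSV header labels and the FILTER is supplied as a Java predicate; AccessDescription and apply are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical evaluation of TABLE <uri> { "colN" AS ?var , ... FILTER(...) }:
// keep only the named columns, rename them, then filter the resulting bindings.
class AccessDescription {

    static List<Map<String, String>> apply(List<Map<String, String>> rows,
                                           Map<String, String> columnToVar,
                                           Predicate<Map<String, String>> filter) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            Map<String, String> binding = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : columnToVar.entrySet()) {
                String cell = row.get(e.getKey());
                if (cell != null) {
                    binding.put(e.getValue(), cell); // only listed columns are bound
                }
            }
            if (filter.test(binding)) {
                out.add(binding);
            }
        }
        return out;
    }
}
```

FILTER(?V > 57) would then be a predicate that parses the value bound to "V" and compares it with 57; columns not named in the description never appear in the bindings.
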

Is the <uri> after TABLE the key of the Map<String, ???>? If so, I now
understand the TABLE clauses from the examples. However, I'm still not
sure about FROM TABLE. Could you please show me some example query
strings containing FROM TABLE clauses?

>
>
>> 3.2 Tarql programmatically modifies the query (parsed by the standard
>> SPARQLParser11) with CSV table data, without touching the original
>> SPARQL grammar parsing module. Should we adopt a different approach:
>> modify the grammar in the .jj files and let JavaCC generate the new
>> parsing code?
>
>
> I think the latter if possible.
>
> This, like all projects, will need to move to a detailed design, but I don't
> think it puts the project as a whole at risk. The basic TARQL idea would be
> a great addition.
>
>         Andy
>
>
>>
>> 4. Modify execution to include tables.
>> Questions: No questions for this now.
>>
>> Best regards,
>> Ying Jiang
>>
>> On Thu, Feb 27, 2014 at 10:49 PM, Andy Seaborne <a...@apache.org> wrote:
>>>
>>> On 26/02/14 15:14, Ying Jiang wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> With the great guidance from the mentors, especially Andy, I had a
>>>> good time in GSoC 2013 working on jena-spatial [1]. I'm very grateful.
>>>> Really learnt a lot from that project.
>>>>
>>>> This year, I find the issue "Extend CONSTRUCT to build quads" [2]
>>>> very interesting. I've used JavaCC before, and I understand the ARQ
>>>> module that parses SPARQL strings. Since it is labelled "gsoc2014", is
>>>> it a suitable project for Jena in GSoC 2014? Are there any more details
>>>> about the project? Thanks!
>>>>
>>>> Best regards,
>>>> Ying Jiang
>>>>
>>>> [1] http://jena.apache.org/documentation/query/spatial-query.html
>>>> [2] https://issues.apache.org/jira/browse/JENA-491
>>>>
>>>
>>> Hi there,
>>>
>>> Given your level of skill and expertise, this project is possibly a bit
>>> small for you. It's not the same scale as jena-spatial. It's probably more
>>> suited to an undergraduate or someone looking to learn about working inside
>>> a moderately large existing codebase. You have a lot more software
>>> engineering experience.
>>>
>>> Can I interest you in one of:
>>>
>>> * JENA-625 especially the part about CSV ingestion.  There is now a W3C
>>> working group looking at tabular data on the web so we know this is
>>> interesting to the user community.
>>>
>>> * JENA-647 (only just added), which is server-side query templates for
>>> creating data views.
>>>
>>> In conjunction with someone (else) doing JENA-632 (custom JSON from SPARQL
>>> query), we would have a data delivery platform for creating domain-specific
>>> data delivery for webapps.
>>>
>>> (this was provided in the proprietary Talis platform as "SPARQL Stored
>>> Procedures" but that no longer exists. No need to exactly follow that, but
>>> it was a popular feature so it is useful).
>>>
>>> * JENA-624, which is about a new memory-based storage layer. As a project,
>>> it's nearer in scale to jena-spatial. This is less about RDF and linked data
>>> and more about systems programming.
>>>
>>>          Andy
>>>
>
