On 03/03/14 03:12, Ying Jiang wrote:
Hi Andy,
Hi Ying,
Thanks for your suggestions! I'm more interested in JENA-625 (Data
Tables for SPARQL). I've seen your new comments in JIRA and studied
the source code of Tarql. I'd like to paste your comments here with my
questions below to clarify the details of this project:
1. CSV to RDF terms (tuples of RDF Terms is already supported
internally in Jena)
- Questions:
1.1 Tarql uses the first row of CSV as variable names. Should we use
the same idea?
Seems like a good start, although care is needed because a column name
can be anything and SPARQL variable names are restricted.
If there is no header row (and we can require that the app says so by
some mechanism), or if the app wants different names, then there should
be a way to provide them, falling back to something predictable if dull:
?_col1, ?_col2, ...
See below - there's no need to have fixed variable names.
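
A minimal sketch of that naming fallback, in plain Java with no Jena
classes. The class name CsvVarNames, the method toVarNames and the rule of
replacing every character outside [A-Za-z0-9_] with an underscore are all
assumptions made for illustration; the SPARQL grammar allows more
characters than this conservative subset.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of Jena or Tarql. */
final class CsvVarNames {

    /**
     * Turns CSV header cells into names usable after '?' in SPARQL.
     * header may be null (no header row); columnCount is the row width.
     */
    static List<String> toVarNames(List<String> header, int columnCount) {
        List<String> names = new ArrayList<>();
        Set<String> used = new HashSet<>();
        for (int i = 0; i < columnCount; i++) {
            String raw = (header != null && i < header.size()) ? header.get(i) : null;
            // Assumption: keep only ASCII letters, digits and '_'.
            String name = (raw == null) ? "" : raw.trim().replaceAll("[^A-Za-z0-9_]", "_");
            // Missing, empty or duplicate names fall back to the positional form.
            if (name.isEmpty() || used.contains(name)) {
                name = "_col" + (i + 1);
            }
            // Guard against a header cell that is itself called "_colN".
            while (!used.add(name)) {
                name = name + "_";
            }
            names.add(name);
        }
        return names;
    }
}
```

With a header row of "first name", "age", an empty cell and a second "age",
this yields first_name, age, _col3 and _col4; with no header at all it
yields _col1, _col2, ...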
1.2 As to "internal support of tuples of RDF terms in Jena", do you
mean com.hp.hpl.jena.sparql.algebra.table.TableData? Tarql uses
TableData to accommodate RDF term bindings from CSV.
That, and there is also some RDF tuples code to read/write a textual
form:
https://svn.apache.org/repos/asf/jena/Experimental/rdfpatch/src/main/java/org/apache/jena/riot/tio/
(there are other versions of this code around - this is the ready-to-use
form)
2. Storage of the table (in-memory is enough, with reading from a file).
- Questions:
2.1 What's the life cycle of the in-memory table? Should we discard
the table after the query execution, or keep it in-memory for later
reuse with the same query or update, or use by a subsequent query?
When will the table be discarded?
That'll need refining, but there should be a way to read a table once and
reuse it. There needs to be a way for the app to pass in tables (a
Map<String, ???> and a tool for reading CSVs to get the ???) because ...
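
A rough sketch of what "a map of named tables plus a CSV reading tool"
could look like, using only the JDK. SimpleTable is a hypothetical
stand-in for the undecided "???" type, TableRegistry is an invented name,
and the CSV reader is deliberately naive (comma split, no quoting or
escaping, first line taken as the header). None of this is existing Jena
API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical placeholder for the "???" table type. */
final class SimpleTable {
    final List<String> columns;
    final List<List<String>> rows;

    SimpleTable(List<String> columns, List<List<String>> rows) {
        this.columns = columns;
        this.rows = rows;
    }
}

/** Hypothetical holder the app fills in and hands to query execution. */
final class TableRegistry {
    private final Map<String, SimpleTable> tables = new HashMap<>();

    /** Naive reader: first line is the header, cells are split on ','. */
    static SimpleTable readCsv(Reader in) {
        try (BufferedReader br = new BufferedReader(in)) {
            String headerLine = br.readLine();
            if (headerLine == null) {
                throw new IllegalArgumentException("empty CSV input");
            }
            List<String> columns = Arrays.asList(headerLine.split(",", -1));
            List<List<String>> rows = new ArrayList<>();
            String line;
            while ((line = br.readLine()) != null) {
                if (line.isEmpty()) continue;          // skip blank lines
                rows.add(Arrays.asList(line.split(",", -1)));
            }
            return new SimpleTable(columns, rows);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Registers a table under a name (for example the URI used in the query). */
    void put(String name, SimpleTable table) { tables.put(name, table); }

    /** Returns the table registered under name, or null if there is none. */
    SimpleTable get(String name) { return tables.get(name); }
}
```

Because the registry is owned by the app, the table's lifetime is whatever
the app decides: it can be reused across queries or dropped after one.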
3. Modify the SPARQL grammar to support FROM TABLE and TABLE (for
inclusion inside a larger query, c.f. SPARQL VALUES clause).
- Questions:
3.1 What're the differences between FROM TABLE and TABLE?
FROM TABLE would be one way to get tables into the query, as would
passing them in via the query context.
Queries can't be assumed to
TABLE in a query is accessing the table, using it to get the
TARQL (and I've only read the documentation) is a query over a single
CSV file. This project should be about multiple CSVs and combining them
with other RDF data.
A quick sketch (I haven't checked that the syntax is sensible):
SELECT ... {
  # Fixed column names
  TABLE <uri> {
    BIND (URI(CONCAT('http://example.com/ns#', ?b)) AS ?uri)
    BIND (STRLANG(?a, 'en') AS ?with_language_tag)
    FILTER (?v > 57)
  }
}
More ambitious is to have column naming and FILTERs:
SELECT ...
WHERE {
  TABLE <uri> { "col1" AS ?myVar1 ,
                "col10" AS ?V ,
                "col5" AS ?appName
                FILTER(?V > 57) }
}
This creates a set of bindings based on the access description.
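
To make "a set of bindings based on the access description" concrete,
here is a small sketch in plain Java. It assumes cell values are kept as
plain strings rather than RDF terms, and the names TableAccess and
bindings are invented for illustration; it says nothing about how ARQ
would actually execute a TABLE clause.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical illustration of column renaming plus a filter. */
final class TableAccess {

    /**
     * For each row, binds the selected columns to variable names and keeps
     * the binding only if the filter accepts it.
     * colToVar maps a CSV column name to a variable name (without '?').
     */
    static List<Map<String, String>> bindings(
            List<String> header,
            List<List<String>> rows,
            Map<String, String> colToVar,
            Predicate<Map<String, String>> filter) {
        List<Map<String, String>> out = new ArrayList<>();
        for (List<String> row : rows) {
            Map<String, String> binding = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : colToVar.entrySet()) {
                int idx = header.indexOf(e.getKey());
                // Unknown columns and short rows simply leave the variable unbound.
                if (idx >= 0 && idx < row.size()) {
                    binding.put(e.getValue(), row.get(idx));
                }
            }
            if (filter.test(binding)) {
                out.add(binding);
            }
        }
        return out;
    }
}
```

Mapping "col1" to myVar1, "col10" to V and "col5" to appName, with a
filter that parses V as an integer and tests V > 57, corresponds to the
second query sketch above.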
3.2 Tarql programmatically modifies the query (parsed by the standard
SPARQLParser11) with CSV table data, without touching the original
SPARQL grammar parsing module. Should we adopt a different approach:
modifying the parsing grammar in the .jj files and asking javacc to
generate the new parsing code?
I think the latter if possible.
This, like all projects, will need to move to a detailed design, but I
don't think it puts the project as a whole at risk. The basic TARQL idea
would be a great addition.
Andy
4. Modify execution to include tables.
Questions: No questions for this now.
Best regards,
Ying Jiang
On Thu, Feb 27, 2014 at 10:49 PM, Andy Seaborne <[email protected]> wrote:
On 26/02/14 15:14, Ying Jiang wrote:
Hi,
With the great guidance from the mentors, especially Andy, I had a
good time in GSoC 2013 working on jena-spatial [1]. I'm very grateful.
Really learnt a lot from that project.
This year, I find the issue of "Extend CONSTRUCT to build quads" [2]
very interesting. I've used javacc before. I can understand the ARQ
module of parsing SPARQL strings. With a label of "gsoc2014", is it a
suitable project for Jena in GSoC 2014? Any more details about the
project? Thanks!
Best regards,
Ying Jiang
[1] http://jena.apache.org/documentation/query/spatial-query.html
[2] https://issues.apache.org/jira/browse/JENA-491
Hi there,
Given your level of skill and expertise, this project is possibly a bit
small for you. It's not the same scale as jena-spatial. It's probably more
suited to an undergraduate or someone looking to learn about working inside
a moderately large existing codebase. You have a lot more software
engineering experience.
Can I interest you in one of:
* JENA-625 especially the part about CSV ingestion. There is now a W3C
working group looking at tabular data on the web so we know this is
interesting to the user community.
* JENA-647, (only just added) which is server side query templates for
creating data views.
In conjunction with someone (else) doing JENA-632 (custom JSON from SPARQL
query), we would have a data delivery platform for creating domain-specific
data delivery for webapps.
(this was provided in the proprietary Talis platform as "SPARQL Stored
Procedures" but that no longer exists. No need to exactly follow that but
it was a popular feature so it is useful).
* JENA-624 which is about a new memory-based storage layer. As a project,
it's nearer in scale to jena-spatial. This is less about RDF and linked data
and more about systems programming.
Andy