Claus Stadler created JENA-1502:
-----------------------------------
Summary: SPARQL extensions for processing CSV, XML and JSON
Key: JENA-1502
URL: https://issues.apache.org/jira/browse/JENA-1502
Project: Apache Jena
Issue Type: Improvement
Components: ARQ
Affects Versions: Jena 3.6.0
Reporter: Claus Stadler
Many systems have been built so far for transforming heterogeneous data (most
prominently CSV, XML and JSON) to RDF.
As it turns out, with a few extensions to ARQ, Jena becomes (at least for me)
an extremely convenient tool for this task.
To illustrate: for a project we have to convert several (open) datasets,
and we came up with a solution where we just have to execute a sequence of
SPARQL queries that make use of our ARQ extensions.
In [this
repository|https://github.com/QROWD/QROWD-RDF-Data-Integration/tree/master/datasets/1046-1051]
there are sub folders with JSON datasets, and the conversion is just a matter
of running the SPARQL queries in the files
[workloads.sparql|https://github.com/QROWD/QROWD-RDF-Data-Integration/blob/master/datasets/1046-1051/workloads.sparql]
(which adds triples describing workloads to a Jena in-memory dataset) and
[process.sparql|https://github.com/QROWD/QROWD-RDF-Data-Integration/blob/master/datasets/1046-1051/process.sparql]
(which processes all workloads in that dataset and inserts triples into a
(named) result graph). We created a [thin command line
wrapper|https://github.com/SmartDataAnalytics/Sparqlintegrate] to conveniently
run these conversions.
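To make the two-step pattern concrete, here is a minimal sketch in plain Python. It is not our ARQ extensions and not the content of workloads.sparql or process.sparql; it only models the idea of first recording workloads in a dataset and then processing them into a named result graph. The namespace http://example.org/, the function names add_workload and process_workloads, and the sample JSON are all made up for illustration.

```python
import json

# Hypothetical namespace for illustration only; not taken from the QROWD queries.
EX = "http://example.org/"


def add_workload(dataset, source_id, json_text):
    """Step 1: record a workload (a source to convert) in the default graph."""
    dataset.setdefault("default", set()).add(
        (EX + source_id, EX + "content", json_text)
    )


def process_workloads(dataset, result_graph):
    """Step 2: convert every recorded workload into triples in result_graph."""
    out = dataset.setdefault(result_graph, set())
    for subject, predicate, json_text in sorted(dataset.get("default", set())):
        if predicate != EX + "content":
            continue
        # Unnest the JSON array: one resource per element, one triple per key.
        for index, item in enumerate(json.loads(json_text)):
            row = f"{subject}/row{index}"
            for key, value in item.items():
                out.add((row, EX + key, str(value)))
    return out


# Made-up sample data: a JSON array with a single object.
dataset = {}
add_workload(dataset, "stops", '[{"name": "Trento", "lat": 46.07}]')
triples = process_workloads(dataset, EX + "result")
```

In the actual setup both steps are SPARQL update requests, and the JSON parsing and unnesting are done by the extension functions inside the query rather than by host-language code.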
In fact, we could also start a Fuseki instance with these ARQ SPARQL extensions,
which makes it easy to play around with conversions in a Web interface.
* Is there interest to integrate our ARQ [SPARQL extension
functions|https://github.com/SmartDataAnalytics/jena-sparql-api/tree/develop/jena-sparql-api-sparql-ext]
into Jena? If so, what would we have to do and where (which existing or new
Jena module) would be the most appropriate place?
We are also open to discussion and changes on what exactly the signatures of
these extension functions should be.
* Maybe the functionality of running files containing sequences of SPARQL
queries could also be added to Jena directly, as I think nothing about it
requires anything outside the scope of Jena.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)