[ 
https://issues.apache.org/jira/browse/JENA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claus Stadler updated JENA-1502:
--------------------------------
    Description: 
Many systems have been built for transforming heterogeneous data (most 
prominently CSV, XML, and JSON) to RDF.
As it turns out, with a few extensions to ARQ, Jena becomes (at least for me) 
an extremely convenient tool for this task.

To illustrate the point: for a project we have to convert several open 
datasets, and we came up with a solution where the whole conversion is just a 
sequence of SPARQL queries making use of our ARQ extensions.

In [this 
repository|https://github.com/QROWD/QROWD-RDF-Data-Integration/tree/master/datasets/1046-1051]
 there are subfolders with JSON datasets, and the conversion is just a matter 
of running the SPARQL queries in the files 
[workloads.sparql|https://github.com/QROWD/QROWD-RDF-Data-Integration/blob/master/datasets/1046-1051/workloads.sparql]
 (which adds triples describing workloads to a Jena in-memory dataset) and 
[process.sparql|https://github.com/QROWD/QROWD-RDF-Data-Integration/blob/master/datasets/1046-1051/process.sparql]
 (which processes all workloads in that dataset and inserts triples into a 
named result graph). We created a [thin command line 
wrapper|https://github.com/SmartDataAnalytics/Sparqlintegrate] to conveniently 
run these conversions.

An example of these extension functions:
{code:sql}
# Add labels of train / bus stops
INSERT {
  GRAPH eg:result { ?s rdfs:label ?l }
}
WHERE {
  ?x eg:workload ?o .
  # Evaluate a JSONPath expression against the JSON value ?o
  BIND(json:path(?o, "$.stopNames") AS ?stopNames)
  # Unnest the resulting array into (element, index) bindings
  ?stopNames json:unnest (?l ?i) .

  GRAPH ?x { ?s eg:stopId ?i }
}
{code}
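To convey the intended semantics of the two extension functions used above, here is a small Python sketch of what {{json:path(?o, "$.stopNames")}} followed by {{?stopNames json:unnest (?l ?i)}} would compute for a sample workload. The payload, helper names, and the zero-based index are my own illustration, not the actual implementation:

```python
import json

# Invented sample workload JSON, analogous to the datasets in the repository.
workload = json.loads('{"stopNames": ["Central Station", "Market Square"]}')

def json_path_stop_names(doc):
    # Rough analogue of json:path(?o, "$.stopNames"): evaluate a JSONPath
    # expression against a JSON document. Only this one path is handled here.
    return doc["stopNames"]

def json_unnest(array):
    # Rough analogue of ?stopNames json:unnest (?l ?i): bind each array
    # element together with its (assumed zero-based) index.
    return [(value, index) for index, value in enumerate(array)]

bindings = json_unnest(json_path_stop_names(workload))
for label, stop_id in bindings:
    print(stop_id, label)
```

Each resulting (label, index) pair then joins with the {{?s eg:stopId ?i}} pattern, attaching labels to the stop resources.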


In fact, these SPARQL ARQ extensions would enable any Jena-based project to 
perform such integration tasks - for instance, one could simply start a 
Fuseki instance in order to play around with conversions in a Web interface.



* Is there interest in integrating our ARQ [SPARQL extension 
functions|https://github.com/SmartDataAnalytics/jena-sparql-api/tree/develop/jena-sparql-api-sparql-ext]
 into Jena? If so, what would we have to do, and where (which existing or new 
Jena module) would be the most appropriate place?
We are also open to discussion and changes on what exactly the signatures of 
these extension functions should look like.
* Maybe the functionality of running files containing sequences of SPARQL 
queries could also be added to Jena directly - as far as I can tell, there is 
no magic to it beyond the scope of Jena.
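On that second point, a minimal sketch of the driver loop meant here. The separator convention (a line containing only ";;") is purely hypothetical; a robust version would use Jena's own parser to find statement boundaries, since a separator token can legally occur inside string literals:

```python
from typing import List

def split_query_sequence(text: str, separator: str = ";;") -> List[str]:
    # Naive splitter: queries in the file are assumed to be separated by a
    # line containing only the separator token. Blank-only trailing chunks
    # are discarded.
    queries: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == separator:
            if current:
                queries.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):
        queries.append("\n".join(current).strip())
    return queries

# Each query would then be handed to the engine in order, e.g. (pseudocode):
#   for q in split_query_sequence(src): execute(q, dataset)
example = "INSERT DATA { <s> <p> <o> }\n;;\nSELECT * WHERE { ?s ?p ?o }"
print(split_query_sequence(example))
```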



> SPARQL extensions for processing CSV, XML and JSON
> --------------------------------------------------
>
>                 Key: JENA-1502
>                 URL: https://issues.apache.org/jira/browse/JENA-1502
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.6.0
>            Reporter: Claus Stadler
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
