Parsing of RDF Data loads everything into memory
------------------------------------------------

                 Key: CLEREZZA-366
                 URL: https://issues.apache.org/jira/browse/CLEREZZA-366
             Project: Clerezza
          Issue Type: Improvement
            Reporter: Rupert Westenthaler


The API of org.apache.clerezza.rdf.core.serializedform.ParsingProvider does 
not allow the caller to pass in the target MGraph into which the RDF data from 
the InputStream should be loaded. 
Implementations therefore need to create their own MGraph instances.
The org.apache.clerezza.rdf.jena.parser.JenaParserProvider, for example, 
creates an instance of SimpleMGraph to store the parsed data.
This design does not allow parsed RDF data to be "streamed" directly into its 
final destination; it forces everything to be loaded into an intermediate graph.
This is a problem when importing big datasets, especially because the 
intermediate graph is kept in memory.

Currently one would use

TCProvider provider;  // e.g. a TdbTcProvider instance
// e.g. loading a dump of dbPedia.org
MGraph veryBigGraph = provider.createMGraph("http://dbPedia.org");
// loads everything into memory and then adds everything to the TDB store
veryBigGraph.addAll(parser.parse(is, format, null));

A possible solution would be to add a second ParsingProvider.parse(..) method 
that accepts an existing MGraph instance as a parameter.
This would allow the above code fragment to be refactored as follows:

TCProvider provider;  // e.g. a TdbTcProvider instance
// e.g. loading a dump of dbPedia.org
MGraph veryBigGraph = provider.createMGraph("http://dbPedia.org");
// loads everything directly into the passed MGraph
parser.parse(is, veryBigGraph, format, null);
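
To illustrate how the two parse(..) variants could coexist, here is a minimal, 
self-contained sketch. It does not use the Clerezza API: the class name 
StreamingParseSketch, the use of Collection<String> in place of MGraph, and the 
toy one-statement-per-line input format are all assumptions made for 
illustration only. The point is that the existing variant can be expressed in 
terms of the new one, so only callers that need streaming have to change.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for a parsing provider; not the Clerezza interface. */
class StreamingParseSketch {

    /** Existing style: allocates an intermediate in-memory collection and returns it. */
    static Collection<String> parse(InputStream in) throws IOException {
        List<String> intermediate = new ArrayList<>();
        parse(in, intermediate); // delegate to the target-accepting variant
        return intermediate;
    }

    /** Proposed style: adds each parsed statement directly to the caller's target. */
    static void parse(InputStream in, Collection<String> target) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            // Toy format (assumption): one statement per line, '#' starts a comment.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            target.add(line); // no intermediate copy of the whole data set
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "<s1> <p> <o1> .\n<s2> <p> <o2> .\n".getBytes(StandardCharsets.UTF_8);

        // Stand-in for the store-backed graph; a real target would be persistent.
        Collection<String> veryBigGraph = new ArrayList<>();
        parse(new ByteArrayInputStream(data), veryBigGraph);
        System.out.println(veryBigGraph.size()); // prints 2
    }
}
```

With a store-backed target, each statement would be handed to the store as it 
is read, so memory use would no longer grow with the size of the input.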

best
Rupert Westenthaler 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
