[jira] Issue Comment Edited: (CLEREZZA-366) Parsing of RDF Data loads everything into memory

JIRA Tue, 14 Dec 2010 08:37:27 -0800

    [ 
https://issues.apache.org/jira/browse/CLEREZZA-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965672#action_12965672
 ]


Reto Bachmann-Gmür edited comment on CLEREZZA-366 at 12/14/10 11:35 AM:
------------------------------------------------------------------------

I think this can be fixed without changing the API: just return a Graph 
instance without actually reading the triples, and read them on demand, 
i.e. when filter is invoked. For N-Triples this should be quite easy to 
implement.

UPDATE: On reflection, I think the API must be extended, either as 
suggested or by providing file-backed graphs. 
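
As a rough illustration of the "read on demand" idea with a file-backed source, here is a minimal sketch. It is not Clerezza code: the class LazyFileBackedGraphSketch, its filter method, and the use of plain strings for statements are assumptions made for this example, and it only matches on the subject term of line-based N-Triples input.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a file-backed graph; not part of the Clerezza API.
public class LazyFileBackedGraphSketch {

    private final Path ntriplesFile;

    public LazyFileBackedGraphSketch(Path ntriplesFile) {
        this.ntriplesFile = ntriplesFile;
    }

    // Nothing is read until the returned stream is consumed, and lines are
    // pulled from the file one at a time. A null subject matches every statement.
    // The caller is responsible for closing the stream.
    public Stream<String> filter(String subject) {
        try {
            return Files.lines(ntriplesFile, StandardCharsets.UTF_8)
                    .map(String::trim)
                    .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                    .filter(line -> subject == null || line.startsWith(subject + " "));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a small sample file and returns the statements about one subject.
    public static List<String> demo() {
        try {
            Path file = Files.createTempFile("lazy-graph-sketch", ".nt");
            try {
                Files.write(file, Arrays.asList(
                        "<http://example.org/a> <http://example.org/p> \"1\" .",
                        "<http://example.org/b> <http://example.org/p> \"2\" .",
                        "<http://example.org/a> <http://example.org/q> \"3\" ."),
                        StandardCharsets.UTF_8);
                LazyFileBackedGraphSketch graph = new LazyFileBackedGraphSketch(file);
                try (Stream<String> matches = graph.filter("<http://example.org/a>")) {
                    return matches.collect(Collectors.toList());
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Because the data stays in a file, filter can be invoked repeatedly, at the cost of rescanning the file on each call.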

      was (Author: reto):
    I think this can be fixed without changing the API, just returning a Graph 
instance without having actually read the triples and read the triples on 
demand, i.e. when filter is invoked. For N-Triples this should be quite easy to 
implement.
  
> Parsing of RDF Data loads everything into memory
> ------------------------------------------------
>
>                 Key: CLEREZZA-366
>                 URL: https://issues.apache.org/jira/browse/CLEREZZA-366
>             Project: Clerezza
>          Issue Type: Improvement
>            Reporter: Rupert Westenthaler
>
> The API of the org.apache.clerezza.rdf.core.serializedform.ParsingProvider 
> does not allow passing in the target MGraph when loading RDF data from the 
> InputStream. Therefore implementations need to create their own MGraph 
> instances.
> The org.apache.clerezza.rdf.jena.parser.JenaParserProvider, for example, 
> creates an instance of SimpleMGraph to store the parsed data.
> This design does not allow parsed RDF data to be "streamed" directly into 
> the final destination; it forces everything to be loaded into an 
> intermediate graph.
> This is a problem when importing big datasets, especially because the 
> intermediate graph is kept in memory.
> Currently one would use
> TCProvider provider;  // e.g. a TdbTcProvider instance
> // e.g. loading a dump of dbPedia.org
> MGraph veryBigGraph = provider.createMGraph("http://dbPedia.org");
> // loads everything into memory and then adds everything to the TDB store
> veryBigGraph.addAll(parser.parse(is, format, null));
> A possible solution would be to add a second ParsingProvider.parse(..) method 
> that accepts an existing MGraph instance.
> This would allow the above code fragment to be refactored like:
> TCProvider provider;  // e.g. a TdbTcProvider instance
> // e.g. loading a dump of dbPedia.org
> MGraph veryBigGraph = provider.createMGraph("http://dbPedia.org");
> // loads everything directly into the passed-in MGraph
> parser.parse(is, veryBigGraph, format, null);
> best
> Rupert Westenthaler 
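
To make the difference concrete, here is a minimal, self-contained sketch of the "parse into a caller-supplied target" pattern the issue proposes. It deliberately avoids the Clerezza types: StreamingParseSketch and parseInto are hypothetical names, statements are kept as plain strings, and a java.util.Collection stands in for the target MGraph. No real N-Triples validation is performed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical illustration only; not the ParsingProvider API.
public class StreamingParseSketch {

    // Reads line-based statements and adds each one to the target as soon as
    // it is read, so no intermediate collection holding the whole input is built.
    // Returns the number of statements added.
    public static int parseInto(InputStream in, Collection<String> target) {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String statement = line.trim();
                // Skip blank lines and comment lines.
                if (statement.isEmpty() || statement.startsWith("#")) {
                    continue;
                }
                target.add(statement);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String data = "<http://example.org/a> <http://example.org/p> \"1\" .\n"
                + "# a comment\n"
                + "\n"
                + "<http://example.org/b> <http://example.org/p> \"2\" .\n";
        // The list plays the role of the destination graph (e.g. a store-backed MGraph).
        List<String> target = new ArrayList<>();
        int added = parseInto(
                new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8)), target);
        System.out.println(added + " statements added"); // prints "2 statements added"
    }
}
```

With this shape, memory use depends on what the target does with each statement rather than on the size of the input; a store-backed target can write statements through as they arrive.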

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
