[ https://issues.apache.org/jira/browse/CLEREZZA-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965672#action_12965672 ]
Reto Bachmann-Gmür edited comment on CLEREZZA-366 at 12/14/10 11:35 AM:
------------------------------------------------------------------------

I think this can be fixed without changing the API: just return a Graph instance without having actually read the triples, and read the triples on demand, i.e. when filter is invoked. For N-Triples this should be quite easy to implement.

UPDATE: Rethinking it, I think the API must be extended, either as suggested or by providing file-backed graphs.

      was (Author: reto):
    I think this can be fixed without changing the API: just return a Graph instance without having actually read the triples, and read the triples on demand, i.e. when filter is invoked. For N-Triples this should be quite easy to implement.

> Parsing of RDF Data loads everything into memory
> ------------------------------------------------
>
>                 Key: CLEREZZA-366
>                 URL: https://issues.apache.org/jira/browse/CLEREZZA-366
>             Project: Clerezza
>          Issue Type: Improvement
>            Reporter: Rupert Westenthaler
>
> The API of org.apache.clerezza.rdf.core.serializedform.ParsingProvider
> does not allow passing in the target MGraph when loading RDF data from an
> InputStream. Implementations therefore need to create their own MGraph
> instances.
> The org.apache.clerezza.rdf.jena.parser.JenaParserProvider, for example,
> creates an instance of SimpleMGraph to store the parsed data.
> This design does not allow parsed RDF data to be "streamed" directly into
> its final destination; everything must first be loaded into an intermediate
> graph. This is a problem when importing big datasets, especially because the
> intermediate graph is kept in memory.
> Currently one would use
>
>     TCProvider provider; // e.g. a TdbTcProvider instance
>     // e.g. loading a dump of dbPedia.org
>     MGraph veryBigGraph = provider.createMGraph("http://dbPedia.org");
>     // loads everything into memory and then adds everything to the TDB store
>     veryBigGraph.addAll(parser.parse(is, format, null));
>
> A possible solution would be to add a second ParsingProvider.parse(..) method
> that accepts an existing MGraph instance.
> This would allow the above code fragment to be refactored like this:
>
>     TCProvider provider; // e.g. a TdbTcProvider instance
>     // e.g. loading a dump of dbPedia.org
>     MGraph veryBigGraph = provider.createMGraph("http://dbPedia.org");
>     // loads everything directly into the passed MGraph
>     parser.parse(is, veryBigGraph, format, null);
>
> best
> Rupert Westenthaler

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
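
The streaming idea proposed in the issue can be sketched in plain Java without any Clerezza dependency. Everything named here (StreamingParseSketch, Triple, and the parse(Reader, Collection) signature) is hypothetical and is not the Clerezza API; the N-Triples handling is deliberately simplified (no escape processing or term validation). The point is only that each triple goes straight into the caller-supplied target, so no intermediate in-memory graph is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Collection;

final class StreamingParseSketch {

    /** Hypothetical stand-in for an RDF triple; terms are kept as raw strings. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /**
     * Reads simplified N-Triples line by line and adds each triple directly to
     * the caller-supplied target. The caller owns (and closes) the reader.
     * Returns the number of triples added.
     */
    static long parse(Reader in, Collection<Triple> target) {
        BufferedReader reader = new BufferedReader(in);
        long count = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                // Skip blank lines and comment lines.
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue;
                }
                if (!trimmed.endsWith(".")) {
                    throw new IllegalArgumentException("Missing terminating '.': " + line);
                }
                String body = trimmed.substring(0, trimmed.length() - 1).trim();
                // Subject and predicate contain no whitespace; the rest is the object.
                String[] parts = body.split("\\s+", 3);
                if (parts.length != 3) {
                    throw new IllegalArgumentException("Expected three terms: " + line);
                }
                target.add(new Triple(parts[0], parts[1], parts[2]));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

With a signature of this shape, the target could be a store-backed collection rather than an in-memory list, which is the case the issue is concerned with.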