Parsing of RDF Data loads everything into memory
------------------------------------------------
Key: CLEREZZA-366
URL: https://issues.apache.org/jira/browse/CLEREZZA-366
Project: Clerezza
Issue Type: Improvement
Reporter: Rupert Westenthaler

The API of org.apache.clerezza.rdf.core.serializedform.ParsingProvider does not allow the caller to pass in the target MGraph into which the RDF data from the InputStream should be loaded. Implementations therefore need to create their own MGraph instances. For example, org.apache.clerezza.rdf.jena.parser.JenaParserProvider creates an instance of SimpleMGraph to store the parsed data.

This design does not allow parsed RDF data to be "streamed" directly into its final destination; everything must first be loaded into an intermediate graph. This is a problem when importing big datasets, especially because the intermediate graph is kept in memory.

Currently one would use:

    TCProvider provider; // e.g. a TdbTcProvider instance
    MGraph veryBigGraph = provider.createMGraph("http://dbPedia.org"); // e.g. loading a dump of dbPedia.org
    veryBigGraph.addAll(parser.parse(is, format, null)); // loads everything into memory and then adds everything to the TDB store

A possible solution would be to add a second ParsingProvider.parse(..) method that accepts an existing MGraph instance. This would allow the above code fragment to be refactored like this:

    TCProvider provider; // e.g. a TdbTcProvider instance
    MGraph veryBigGraph = provider.createMGraph("http://dbPedia.org"); // e.g. loading a dump of dbPedia.org
    parser.parse(is, veryBigGraph, format, null); // loads everything directly into the passed MGraph

best
Rupert Westenthaler
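
To illustrate the difference between the two parse(..) styles described above, here is a minimal stand-alone sketch. It does not use any Clerezza classes: Collection<String> stands in for MGraph, one non-empty input line stands in for one triple, and the names StreamingParseSketch, parseToNewGraph and parseInto are hypothetical. The real signature change to ParsingProvider may well look different.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/**
 * Stand-alone illustration only. Collection<String> is a stand-in for MGraph
 * and each non-empty line is treated as one "triple" (an assumption made to
 * keep the sketch free of external dependencies).
 */
public class StreamingParseSketch {

    /** Current style: the parser allocates and returns its own in-memory graph. */
    public static Collection<String> parseToNewGraph(InputStream in) {
        Collection<String> intermediate = new ArrayList<>();
        parseInto(in, intermediate);
        return intermediate;
    }

    /** Proposed style: every parsed triple is added to the caller-supplied target right away. */
    public static void parseInto(InputStream in, Collection<String> target) {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String triple = line.trim();
                if (!triple.isEmpty()) {
                    target.add(triple); // no intermediate copy is built
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "<a> <p> <b> .\n<b> <p> <c> .\n".getBytes(StandardCharsets.UTF_8);

        // Current approach: a complete intermediate graph exists before addAll runs.
        List<String> viaCopy = new ArrayList<>();
        viaCopy.addAll(parseToNewGraph(new ByteArrayInputStream(data)));

        // Proposed approach: triples go straight into the destination graph.
        List<String> direct = new ArrayList<>();
        parseInto(new ByteArrayInputStream(data), direct);

        System.out.println("same content: " + viaCopy.equals(direct));
    }
}
```

Both paths end up with the same content in the destination; the difference is that the second never holds a full second copy of the data, which is what matters when the destination is a disk-backed store.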