On Thursday, January 06, 2011 07:09:50 am Devshree Sane wrote:
> I am trying to use DBpedia for one of my projects. All I want is to iterate
> over the nodes in this
> set<http://downloads.dbpedia.org/3.5.1/en/article_categories_en.nt.bz2>.
> It has 10925705 triples. The Model.read(..) methods read all triples at once
> in memory.

No, it reads into whatever the Model says. If the Model is an SDB model, the
triples go into the database. If it's a TDB model, they get stored in TDB
files. The *default* Models are memory-based.

> However I have only 2GB RAM available, and hence I get "heap
> space errors" or "GC limit exceeded errors".
> Is there a BufferedIterator available for this purpose(which will not load
> the entire graph in memory)?

No, but ...

> If not, is there any other way this can be achieved?

... yes. Subclass GraphSink and override performAdd to do whatever you want.
Then make a Model from that Graph and model.read your RDF through it.

> (Persistent storage via TDB seems an overkill for this)

Why? You're all set up for doing it again then, and you can run ad-hoc local
queries against the data if you want to.

> I am wondering why such a feature is not already in Jena?

No real call for it. A single pass through the data doesn't let you exploit
RDF very much -- you're just seeing triples in pseudo-random order. (You could
always sort the ntriples data to get clustering but you're still not able to
use cross-links.)

Chris
-- 
"Feel the world turning upside-down" - The Reasoning, /Dark Angel/

Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)
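
The single-pass idea from the reply (a sink that reacts to each triple as it arrives and stores nothing) can be sketched without any Jena classes. The code below is not Jena's API and is not the GraphSink subclass itself: `TripleHandler`, `NTriplesPass` and `stream` are hypothetical names made up for illustration, and the line splitting is a simplification that assumes one well-formed N-Triples statement per line with no trailing comments. The handler stands in for what an overridden performAdd would do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical callback; stands in for the body of an overridden performAdd. */
interface TripleHandler {
    void handle(String subject, String predicate, String object);
}

/** Hypothetical helper: one pass over N-Triples text, keeping nothing in memory. */
final class NTriplesPass {

    /**
     * Reads one line at a time and passes each triple to the handler.
     * Only the current line is held, so memory use does not grow with input size.
     * Returns the number of triples handed to the handler.
     */
    static long stream(BufferedReader in, TripleHandler handler) throws IOException {
        long count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            if (!line.endsWith(".")) {
                continue; // simplification: ignore lines without the closing '.'
            }
            String body = line.substring(0, line.length() - 1).trim();
            // Subject and predicate contain no whitespace in N-Triples, so the
            // third piece is the whole object, even if it is a literal with spaces.
            String[] parts = body.split("\\s+", 3);
            if (parts.length < 3) {
                continue; // simplification: ignore malformed lines
            }
            handler.handle(parts[0], parts[1], parts[2]);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data in the shape of an article/category listing.
        String data =
            "<http://example.org/A> <http://example.org/subject> <http://example.org/Cat1> .\n"
          + "<http://example.org/B> <http://example.org/subject> <http://example.org/Cat2> .\n";
        long n = stream(new BufferedReader(new StringReader(data)),
                        (s, p, o) -> System.out.println(s + " -> " + o));
        System.out.println(n + " triples");
    }
}
```

For the real file, the reader would wrap the decompressed N-Triples data; the JDK has no bzip2 support, so the .bz2 would have to be unpacked first or read through a third-party decompressing stream. As the reply points out, a pass like this only sees triples in file order and cannot follow cross-links between them.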
