Re: BufferedIterators for ontologies?

Andy Seaborne Thu, 06 Jan 2011 05:18:02 -0800


On 06/01/11 07:09, Devshree Sane wrote:

Hi,

I am trying to use DBpedia for one of my projects. All I want is to iterate
over the nodes in this
set <http://downloads.dbpedia.org/3.5.1/en/article_categories_en.nt.bz2>.
It has 10925705 triples. The Model.read(..) methods read all triples into
memory at once. However, I have only 2GB RAM available, and hence I get "heap
space" or "GC limit exceeded" errors.
Is there a BufferedIterator available for this purpose (one which will not
load the entire graph into memory)?
If not, is there any other way this can be achieved? (Persistent storage via
TDB seems like overkill for this.)

I am wondering why such a feature is not already in Jena? Or am I missing
something?

The file takes about 80-90s to parse. If you just want to do a single run-once pass, then RiotReader (currently in ARQ) provides a lower-level way into parsing.
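To illustrate the run-once idea without holding a Model in memory, here is a minimal sketch in plain Java. It is not the RiotReader API (its method signatures are not given in this thread, so none are shown); it only relies on the fact that N-Triples is line-based, with one triple per line. The class name NTriplesStream, the RawTriple holder and the forEachTriple method are hypothetical names made up for this example. It assumes the .bz2 file has already been decompressed (the Java standard library has no bzip2 decoder), and it is a naive splitter rather than a validating parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Consumer;

public class NTriplesStream {

    /** Hypothetical holder for one triple as raw N-Triples terms (not a Jena type). */
    public static final class RawTriple {
        public final String subject;
        public final String predicate;
        public final String object;

        RawTriple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /**
     * Reads N-Triples one line at a time and hands each triple to the sink.
     * Only the current line is held in memory. Returns the number of triples seen.
     * Assumption: subject and predicate contain no whitespace (true for IRIs and
     * blank node labels), so the first two whitespace runs separate the terms.
     */
    public static long forEachTriple(Reader source, Consumer<RawTriple> sink) throws IOException {
        long count = 0;
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            if (!line.endsWith(".")) {
                continue; // not a complete statement; this sketch skips it silently
            }
            // Drop the terminating '.' and split into at most three terms;
            // the third part keeps any spaces inside a literal object.
            String body = line.substring(0, line.length() - 1).trim();
            String[] parts = body.split("\\s+", 3);
            if (parts.length < 3) {
                continue; // malformed line
            }
            sink.accept(new RawTriple(parts[0], parts[1], parts[2]));
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data; a real run would pass a FileReader or System.in instead.
        String sample =
              "# sample data\n"
            + "<http://example.org/a> <http://example.org/p> <http://example.org/c> .\n"
            + "<http://example.org/b> <http://example.org/p> <http://example.org/c> .\n";

        Set<String> nodes = new LinkedHashSet<>();
        long n = forEachTriple(new StringReader(sample), t -> {
            nodes.add(t.subject);
            nodes.add(t.object);
        });
        System.out.println(n + " triples, " + nodes.size() + " distinct nodes");
    }
}
```

For the real file, the decompressed data could be piped in on standard input and read with new InputStreamReader(System.in, StandardCharsets.UTF_8). Note that collecting every distinct node in a Set, as main does here, can itself grow large on 10 million triples; if memory is tight, process each triple inside the sink and keep nothing.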


Reading into TDB will incur the parsing costs once.

        Andy
