On 06/01/11 07:09, Devshree Sane wrote:
Hi,

I am trying to use DBpedia for one of my projects. All I want is to iterate
over the nodes in this
set <http://downloads.dbpedia.org/3.5.1/en/article_categories_en.nt.bz2>.
It has 10925705 triples. The Model.read(..) methods read all triples into
memory at once. However, I have only 2GB of RAM available, and hence I get
"heap space errors" or "GC limit exceeded errors".
Is there a BufferedIterator available for this purpose (one which will not
load the entire graph into memory)?
If not, is there any other way this can be achieved? (Persistent storage via
TDB seems like overkill for this.)

I am wondering why such a feature is not already in Jena? Or am I missing
something?


The file takes about 80-90s to parse. If you just want to make a single pass over the data, RiotReader (currently in ARQ) provides a lower-level way into parsing.

Reading into TDB will incur the parsing cost only once; later runs can work from the store without re-parsing the file.
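
For the one-off streaming pass, here is a minimal sketch of the general idea. It is not the RiotReader API: it uses only the Java standard library and relies on N-Triples being a line-based format (one triple per line), so only one line is ever held in memory. The class name NTriplesLineStream and the method forEachTriple are hypothetical names chosen for illustration. It assumes the file has already been decompressed (for example with bunzip2), since the Java standard library has no bzip2 decoder, and it does a simple whitespace split rather than full N-Triples validation, so treat it as a starting point rather than a replacement for a real parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of Jena or ARQ.
class NTriplesLineStream {

    // Calls `handler` once per triple with {subject, predicate, object}
    // as raw N-Triples tokens, and returns the number of triples seen.
    // Lines are processed one at a time, so memory use stays constant.
    static long forEachTriple(BufferedReader in, Consumer<String[]> handler)
            throws IOException {
        long count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            // Skip blank lines and comment lines.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Drop the "." that terminates each statement.
            if (line.endsWith(".")) {
                line = line.substring(0, line.length() - 1).trim();
            }
            // Subject and predicate contain no whitespace; the object is
            // whatever remains (it may be a literal containing spaces).
            String[] parts = line.split("\\s+", 3);
            if (parts.length != 3) {
                throw new IOException("Malformed triple: " + line);
            }
            handler.accept(parts);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: NTriplesLineStream <file.nt>");
            return;
        }
        try (BufferedReader in = Files.newBufferedReader(
                Paths.get(args[0]), StandardCharsets.UTF_8)) {
            long n = forEachTriple(in, t -> {
                // Process t[0] (subject), t[1] (predicate), t[2] (object) here.
            });
            System.out.println(n + " triples");
        }
    }
}
```

Run it against the decompressed file, e.g. java NTriplesLineStream article_categories_en.nt. The tokens passed to the handler keep their N-Triples syntax (angle brackets around IRIs, quotes around literals), so any unescaping is left to the caller.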

        Andy
