1/ Have you considered reading the DBpedia data into TDB? This would keep the triples on disk (with cached in-memory copies of a subset).
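
A minimal sketch of that route (untested; the store directory and file names are just examples, and for multi-gigabyte dumps the tdbloader command-line tool is usually the better bulk-load option):

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.util.FileManager;

public class TdbLoad {
    public static void main(String[] args) {
        // Open (or create) an on-disk TDB store; the path is illustrative.
        Dataset dataset = TDBFactory.createDataset("/data/tdb-dbpedia");
        Model model = dataset.getDefaultModel();

        // Read the dump into the store; triples go to disk, not the heap.
        FileManager.get().readModel(model, "dbpedia-dump.nt", "N-TRIPLE");

        dataset.close();
    }
}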

2/ A file can be read sequentially by using the parser directly (see RiotReader, and pass in a Sink<Triple> that processes the stream of triples).
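
Something along these lines (an untested sketch; in Jena releases of this vintage RiotReader lives in org.openjena.riot, and the exact signatures vary between versions):

import org.openjena.atlas.lib.Sink;
import org.openjena.riot.RiotReader;

import com.hp.hpl.jena.graph.Triple;

public class StreamingParse {
    public static void main(String[] args) {
        // The sink receives each triple as it is parsed. Nothing is
        // accumulated, so memory use stays flat whatever the file size.
        Sink<Triple> sink = new Sink<Triple>() {
            public void send(Triple triple) {
                // Pick out and process the triples of interest here.
                System.out.println(triple);
            }
            public void flush() {}
            public void close() {}
        };

        RiotReader.parseTriples("dbpedia-dump.nt", sink);
        sink.close();
    }
}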

        Andy

On 14/03/11 18:42, Anuj Kumar wrote:
Hi All,

I am new to Jena and am exploring it for working with large numbers of
N-Triples. The requirement is to read a very large set of N-Triples, for
example an .nt file from a DBpedia dump that may run into gigabytes. I have
to read these triples, pick specific ones, and link them to resources in
another set of triples. The goal is to link some of the entities following
Linked Data principles. Once the mapping is done, I have to query the model
from that point onwards. I don't want to load both the source and target
datasets in memory.

To achieve this, I have first created a file model maker and then a named
model for the specific dataset being mapped, as sketched below. Now I need
to read the triples and add the mappings to this new model. What would be
the right approach?
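
The setup so far looks roughly like this (the directory and model name are just examples):

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.ModelMaker;

public class MappingSetup {
    public static void main(String[] args) {
        // File-backed model maker rooted at a directory (path is an example).
        ModelMaker maker = ModelFactory.createFileModelMaker("/data/models");

        // Named model that will hold the mapped statements (name is an example).
        Model mapped = maker.createModel("dbpedia-mapping");
        mapped.close();
    }
}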

One way is to load the model using FileManager, iterate through the
statements, map them accordingly into the named model (i.e. our mapped
model), and close it at the end. This will work, but it loads all of the
triples into memory. Is this the right way to proceed, or is there a way to
read the model sequentially at mapping time? Concretely, the in-memory
approach would look like the sketch below.
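
For reference (untested; the owl:sameAs filter is just an example of picking specific triples):

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.util.FileManager;

public class InMemoryMapping {
    public static void main(String[] args) {
        // Loads the entire dump into memory first; this is the step I want to avoid.
        Model source = FileManager.get().loadModel("dbpedia-dump.nt", "N-TRIPLE");

        // Stands in for the named model created via the model maker.
        Model mapped = ModelFactory.createDefaultModel();

        StmtIterator it = source.listStatements();
        while (it.hasNext()) {
            Statement s = it.nextStatement();
            // Example filter: copy only owl:sameAs links into the mapped model.
            if (s.getPredicate().getURI().equals("http://www.w3.org/2002/07/owl#sameAs")) {
                mapped.add(s);
            }
        }
        source.close();
    }
}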

Just trying to understand an efficient way to map a large set of N-Triples.
I'd appreciate your suggestions.

Thanks,
Anuj
