Thanks Paolo. I am looking into LARQ and also SARQ.
On Thu, Mar 17, 2011 at 12:18 AM, Paolo Castagna <
[email protected]> wrote:
>
>
> Anuj Kumar wrote:
>
>> Hi Andy,
>>
>> I have loaded some N-Triples into TDB offline using tdbloader.
>> Both loading and querying are fast, but as soon as I use a regex the
>> query becomes very slow, taking several minutes: more than 10 minutes on
>> my 32-bit machine (expected, given its limited ~1.5GB of memory) and
>> around 5 minutes on my 64-bit machine (8GB).
>>
>> The query seems pretty exhaustive; correct me if the slowness is due to
>> the filter:
>>
>> SELECT ?abstract
>> WHERE {
>>   ?resource <http://www.w3.org/2000/01/rdf-schema#label> ?l .
>>   FILTER regex(?l, "Futurama", "i") .
>>   ?resource <http://dbpedia.org/ontology/abstract> ?abstract
>> }
>>
>> I have loaded a few abstracts from the DBpedia dump and am trying to
>> look up abstracts by label. This is very slow. If I remove the FILTER
>> and give the exact label instead, it is fast (presumably because of
>> TDB's indexing).
>>
>> What is the right way to do this kind of regex or free-text search over
>> the graph? I have seen suggestions to use Lucene, and I also saw the
>> LARQ initiative. Is that the right way to go?
>>
>
> Yes - a FILTER regex cannot use TDB's indexes, so every rdfs:label
> literal has to be scanned and tested against the regex. Using LARQ
> (which is included in ARQ) will greatly speed up your query.
> LARQ documentation is here:
> http://jena.sourceforge.net/ARQ/lucene-arq.html
> You will need to build the Lucene index first, though.
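>
> For example, here is a minimal sketch of building the index and querying
> through it, given a Model already loaded from TDB (here called "model").
> The class names are from the com.hp.hpl.jena.query.larq package shipped
> with ARQ at the time - treat the exact names and the pf: prefix as
> assumptions and check the page above against your version:
>
>   import com.hp.hpl.jena.query.* ;
>   import com.hp.hpl.jena.query.larq.* ;
>
>   // build an in-memory Lucene index over all string literals
>   IndexBuilderString larqBuilder = new IndexBuilderString() ;
>   larqBuilder.indexStatements(model.listStatements()) ;
>   larqBuilder.closeWriter() ;
>   LARQ.setDefaultIndex(larqBuilder.getIndex()) ;
>
>   // pf:textMatch consults the Lucene index instead of scanning literals
>   String q =
>       "PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#> " +
>       "SELECT ?abstract WHERE { " +
>       "  ?l pf:textMatch 'Futurama' . " +
>       "  ?resource <http://www.w3.org/2000/01/rdf-schema#label> ?l . " +
>       "  ?resource <http://dbpedia.org/ontology/abstract> ?abstract }" ;
>   QueryExecution qe = QueryExecutionFactory.create(q, model) ;
>   ResultSet rs = qe.execSelect() ;
>
> The text match narrows the candidates first, so the scan over every
> label disappears.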
>
> Paolo
>
>
>
>> Thanks,
>> Anuj
>>
>> On Tue, Mar 15, 2011 at 5:09 PM, Andy Seaborne <
>> [email protected]> wrote:
>>
>>> Just so you know: The TDB bulkloader can load all the data offline -
>>> it's faster than using Fuseki for data loading online.
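>>>
>>> For example (the --loc directory is hypothetical; point it at wherever
>>> you want the TDB database files):
>>>
>>>   tdbloader --loc=/data/tdb dbpedia.nt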
>>>
>>> Andy
>>>
>>>
>>> On 15/03/11 11:22, Anuj Kumar wrote:
>>>
>>>> Hi Andy,
>>>>
>>>> Thanks for the info. I have loaded a few GBs using the Fuseki server,
>>>> but I didn't try RiotReader or the Java APIs for TDB. I will try that.
>>>> Thanks for the response.
>>>>
>>>> Regards,
>>>> Anuj
>>>>
>>>> On Tue, Mar 15, 2011 at 4:12 PM, Andy Seaborne<
>>>> [email protected]> wrote:
>>>>
>>>>> 1/ Have you considered reading the DBpedia data into TDB? This would
>>>>> keep the triples on-disk (and have cached in-memory versions of a
>>>>> subset).
>>>>>
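>>>>> A minimal sketch of that (the directory path and file name are
>>>>> hypothetical; the package names are those of the Jena/TDB releases of
>>>>> the time):
>>>>>
>>>>>   import com.hp.hpl.jena.rdf.model.Model ;
>>>>>   import com.hp.hpl.jena.tdb.TDBFactory ;
>>>>>   import com.hp.hpl.jena.util.FileManager ;
>>>>>
>>>>>   // the model is backed by files in /data/tdb rather than held in memory
>>>>>   Model model = TDBFactory.createModel("/data/tdb") ;
>>>>>   FileManager.get().readModel(model, "dbpedia.nt") ;
>>>>>   model.close() ;
>>>>>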
>>>>> 2/ A file can be read sequentially by using the parser directly (See
>>>>> RiotReader and pass in a Sink<Triple> that processes the stream of
>>>>> triples).
>>>>>
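>>>>> A sketch of that streaming approach (package and method names as in
>>>>> the ARQ/RIOT builds of the time - treat the exact signatures as
>>>>> assumptions and check your version):
>>>>>
>>>>>   import org.openjena.atlas.lib.Sink ;
>>>>>   import org.openjena.riot.RiotReader ;
>>>>>   import com.hp.hpl.jena.graph.Triple ;
>>>>>
>>>>>   Sink<Triple> sink = new Sink<Triple>() {
>>>>>       public void send(Triple t) {
>>>>>           // each parsed triple streams through here; inspect it and
>>>>>           // add the ones you want to your mapped model
>>>>>       }
>>>>>       public void flush() {}
>>>>>       public void close() {}
>>>>>   } ;
>>>>>   RiotReader.parseTriples("dbpedia.nt", sink) ;
>>>>>
>>>>> The file is parsed sequentially, so memory use stays flat however
>>>>> large the input is.
>>>>>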
>>>>> Andy
>>>>>
>>>>>
>>>>> On 14/03/11 18:42, Anuj Kumar wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I am new to Jena and exploring how to work with a large number of
>>>>>> N-Triples. The requirement is to read a large number of N-Triples -
>>>>>> for example, an .nt file from the DBpedia dump that may run into GBs.
>>>>>> I have to read these triples, pick specific ones, and link them to
>>>>>> the resources of another set of triples. The goal is to link some of
>>>>>> the entities based on the Linked Data concept. Once the mapping is
>>>>>> done, I have to query the model from that point onwards. I don't want
>>>>>> to load both the source and target datasets in memory.
>>>>>>
>>>>>> To achieve this, I have first created a file model maker and then a
>>>>>> named model for the specific dataset being mapped. Now I need to read
>>>>>> the triples and add the mapping to this new model. What is the right
>>>>>> approach?
>>>>>>
>>>>>> One way (sketched below) is to load the model using FileManager,
>>>>>> iterate through its statements, map the relevant ones into the named
>>>>>> model (i.e. our mapped model), and close it at the end. This will
>>>>>> work, but it loads all of the triples in memory. Is this the right
>>>>>> way to proceed, or is there a way to read the model sequentially at
>>>>>> the time of mapping?
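>>>>>>
>>>>>> A minimal sketch of that in-memory approach (the file name and the
>>>>>> label filter are hypothetical; "mapped" stands for the named model
>>>>>> created above):
>>>>>>
>>>>>>   import com.hp.hpl.jena.rdf.model.* ;
>>>>>>   import com.hp.hpl.jena.util.FileManager ;
>>>>>>   import com.hp.hpl.jena.vocabulary.RDFS ;
>>>>>>
>>>>>>   Model src = FileManager.get().loadModel("source.nt") ;  // whole file in memory
>>>>>>   StmtIterator it = src.listStatements() ;
>>>>>>   while (it.hasNext()) {
>>>>>>       Statement s = it.nextStatement() ;
>>>>>>       // hypothetical filter: keep only the rdfs:label statements
>>>>>>       if (s.getPredicate().equals(RDFS.label))
>>>>>>           mapped.add(s) ;
>>>>>>   }
>>>>>>   src.close() ;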
>>>>>>
>>>>>> Just trying to understand the efficient way to map a large set of
>>>>>> N-Triples. I need your suggestions.
>>>>>>
>>>>>> Thanks,
>>>>>> Anuj
>>>>>>