Sure, I will let you know if I have any queries. The tests were failing when I built SARQ on my machine, but I will look into that later. As you suggested, it is really useful to understand the LARQ integration as a reference, so that is what I am doing.
Thanks for the info.

- Anuj

On Thu, Mar 17, 2011 at 1:14 PM, Paolo Castagna <[email protected]> wrote:

> Anuj Kumar wrote:
>> Thanks Paolo. I am looking into LARQ and also SARQ.
>
> Be warned: SARQ is just an experiment (and currently unsupported).
> However, if you prefer to use Solr, share with us your use case and your
> reasons, and let me know if you have problems with it.
>
> SARQ might be a little bit behind in relation to removals from the index,
> but you can look at what LARQ does and port the same approach into SARQ.
>
> Paolo
>
>> On Thu, Mar 17, 2011 at 12:18 AM, Paolo Castagna <[email protected]> wrote:
>>
>>> Anuj Kumar wrote:
>>>
>>>> Hi Andy,
>>>>
>>>> I have loaded a few N-Triples files into TDB in offline mode using
>>>> tdbloader. Loading as well as querying is fast, but if I use a regex
>>>> it becomes very slow, taking several minutes. On my 32-bit machine it
>>>> takes more than 10 minutes (expected, due to limited memory, ~1.5 GB),
>>>> and on my 64-bit machine (8 GB) it takes around 5 minutes.
>>>>
>>>> The query is pretty exhaustive; correct me if this is happening
>>>> because of the filter:
>>>>
>>>> SELECT ?abstract
>>>> WHERE {
>>>>   ?resource <http://www.w3.org/2000/01/rdf-schema#label> ?l .
>>>>   FILTER regex(?l, "Futurama", "i") .
>>>>   ?resource <http://dbpedia.org/ontology/abstract> ?abstract
>>>> }
>>>>
>>>> I have loaded a few abstracts from the DBpedia dump and I am trying to
>>>> get the abstracts from the label. This is very slow. If I remove the
>>>> FILTER and give the exact label, it is fast (presumably because of
>>>> TDB's indexing).
>>>>
>>>> What is the right way to do such a regex or text search over the
>>>> graph? I have seen suggestions to use Lucene, and I also saw the LARQ
>>>> initiative. Is that the right way to go?
>>>>
>>> Yes, using LARQ (which is included in ARQ) will greatly speed up your
>>> query. LARQ documentation is here:
>>> http://jena.sourceforge.net/ARQ/lucene-arq.html
>>> You will need to build the Lucene index first, though.
>>>
>>> Paolo
>>>
>>>> Thanks,
>>>> Anuj
>>>>
>>>> On Tue, Mar 15, 2011 at 5:09 PM, Andy Seaborne <[email protected]> wrote:
>>>>
>>>>> Just so you know: the TDB bulk loader can load all the data offline -
>>>>> it's faster than using Fuseki for data loading online.
>>>>>
>>>>> Andy
>>>>>
>>>>> On 15/03/11 11:22, Anuj Kumar wrote:
>>>>>
>>>>>> Hi Andy,
>>>>>>
>>>>>> Thanks for the info. I have loaded a few GBs using the Fuseki server,
>>>>>> but I didn't try RiotReader or the Java APIs for TDB. Will try that.
>>>>>> Thanks for the response.
>>>>>>
>>>>>> Regards,
>>>>>> Anuj
>>>>>>
>>>>>> On Tue, Mar 15, 2011 at 4:12 PM, Andy Seaborne <[email protected]> wrote:
>>>>>>
>>>>>>> 1/ Have you considered reading the DBpedia data into TDB? This
>>>>>>> would keep the triples on disk (with cached in-memory versions of
>>>>>>> a subset).
>>>>>>>
>>>>>>> 2/ A file can be read sequentially by using the parser directly
>>>>>>> (see RiotReader and pass in a Sink<Triple> that processes the
>>>>>>> stream of triples).
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> On 14/03/11 18:42, Anuj Kumar wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I am new to Jena and am exploring it to work with large numbers of
>>>>>>>> N-Triples, for example an .nt file from the DBpedia dump that may
>>>>>>>> run into GBs. I have to read these triples, pick specific ones,
>>>>>>>> and link them to resources in another set of triples. The goal is
>>>>>>>> to link some of the entities following the Linked Data approach.
>>>>>>>> Once the mapping is done, I have to query the model from that
>>>>>>>> point onwards. I don't want to load both the source and target
>>>>>>>> datasets in memory.
>>>>>>>>
>>>>>>>> To achieve this, I have first created a file model maker and then
>>>>>>>> a named model for the specific dataset being mapped. Now I need to
>>>>>>>> read the triples and add the mapping to this new model. What
>>>>>>>> should the right approach be?
>>>>>>>>
>>>>>>>> One way is to load the model using FileManager, iterate through
>>>>>>>> the statements, map them accordingly to the named model (i.e. our
>>>>>>>> mapped model), and close it at the end. This will work, but it
>>>>>>>> will load all of the triples in memory. Is this the right way to
>>>>>>>> proceed, or is there a way to read the model sequentially at the
>>>>>>>> time of mapping?
>>>>>>>>
>>>>>>>> Just trying to understand the efficient way to map a large set of
>>>>>>>> N-Triples. Need your suggestions.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anuj
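For the archives, the LARQ setup Paolo points to looks roughly as follows. This is a minimal sketch against the ARQ 2.8.x API (com.hp.hpl.jena.query.larq, as described on the LARQ page linked above); the class name, the data file name, and the in-memory model are illustrative, not taken from the thread:

    import com.hp.hpl.jena.query.* ;
    import com.hp.hpl.jena.query.larq.IndexBuilderString ;
    import com.hp.hpl.jena.query.larq.IndexLARQ ;
    import com.hp.hpl.jena.query.larq.LARQ ;
    import com.hp.hpl.jena.rdf.model.Model ;
    import com.hp.hpl.jena.rdf.model.ModelFactory ;
    import com.hp.hpl.jena.util.FileManager ;

    public class LarqExample {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel() ;

            // Index every string literal while the data is being read in.
            IndexBuilderString larqBuilder = new IndexBuilderString() ;
            model.register(larqBuilder) ;
            FileManager.get().readModel(model, "dbpedia-abstracts.nt") ;
            larqBuilder.closeWriter() ;
            model.unregister(larqBuilder) ;

            // Make the index available to ARQ query execution.
            IndexLARQ index = larqBuilder.getIndex() ;
            LARQ.setDefaultIndex(index) ;

            // pf:textMatch asks Lucene for the matching literals instead
            // of scanning every label with a regex.
            String q =
                "PREFIX pf:   <http://jena.hpl.hp.com/ARQ/property#>\n" +
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
                "SELECT ?abstract WHERE {\n" +
                "  ?l pf:textMatch 'Futurama' .\n" +
                "  ?resource rdfs:label ?l .\n" +
                "  ?resource <http://dbpedia.org/ontology/abstract> ?abstract\n" +
                "}" ;
            QueryExecution qexec = QueryExecutionFactory.create(q, model) ;
            try {
                ResultSetFormatter.out(qexec.execSelect()) ;
            } finally {
                qexec.close() ;
            }
            index.close() ;
        }
    }

The speed difference discussed in the thread comes from exactly this rewrite: FILTER regex must test every ?l binding, while pf:textMatch binds ?l from the Lucene index directly. For TDB-backed data the same pattern should apply, with the index built over the TDB-backed model rather than an in-memory one.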
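The offline bulk load Andy mentions is a single command from the TDB distribution; the database location and file name below are made up for the example:

    tdbloader --loc /data/tdb dbpedia-abstracts.nt

Fuseki can then be started against the same database directory, so the loading happens offline and only querying goes through the server.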

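On the streaming side, Andy's suggestion comes down to pushing each parsed triple through a Sink<Triple> as the file is read. A sketch, assuming the RIOT API of that era (org.openjena.riot.RiotReader and org.openjena.atlas.lib.Sink; the predicate filter, class name, and file name are only illustrative):

    import org.openjena.atlas.lib.Sink ;
    import org.openjena.riot.RiotReader ;

    import com.hp.hpl.jena.graph.Triple ;
    import com.hp.hpl.jena.rdf.model.Model ;
    import com.hp.hpl.jena.rdf.model.ModelFactory ;

    public class StreamNTriples {
        public static void main(String[] args) {
            // Target model for the mapped triples; with a file model maker
            // this would be the named model from the original question.
            final Model mapped = ModelFactory.createDefaultModel() ;

            // The parser calls send() for every triple as it is read, so
            // the whole dump is never held in memory at once.
            Sink<Triple> sink = new Sink<Triple>() {
                public void send(Triple t) {
                    // Keep only the triples of interest, e.g. rdfs:label.
                    String p = t.getPredicate().getURI() ;
                    if ( "http://www.w3.org/2000/01/rdf-schema#label".equals(p) )
                        mapped.getGraph().add(t) ;
                }
                public void flush() {}
                public void close() {}
            } ;

            RiotReader.parseTriples("dbpedia-dump.nt", sink) ;
            sink.close() ;
            System.out.println("Kept " + mapped.size() + " triples") ;
        }
    }

This answers the question at the bottom of the thread: instead of FileManager loading everything and iterating over statements, the sink sees each triple once, in sequence, and only the selected triples are added to the mapped model.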