Sure, I'll take a look at that as well. Interesting!
Regarding SARQ, I just tried it once. The errors were related to the cleanup
of Solr indexes during the tests. Here are the details:
INFO [33039485@qtp-1012673-2] (SolrCore.java:1324) - [sarq] webapp=/solr path=/update params={wt=javabin&version=1} status=500 QTime=5
ERROR [33039485@qtp-1012673-2] (SolrException.java:139) - java.io.IOException: Cannot delete .\solr\sarq\data\index\lucene-d12b45df2c6d6ae2efebf4cb75b8da25-write.lock
    at org.apache.lucene.store.NativeFSLockFactory.clearLock(NativeFSLockFactory.java:143)
    at org.apache.lucene.store.Directory.clearLock(Directory.java:141)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1541)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1402)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
    at org.apache.solr.update.DirectUpdateHandler2.deleteAll(DirectUpdateHandler2.java:167)
    at org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:323)
    at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:71)
    at org.apache.solr.handler.XMLLoader.processDelete(XMLLoader.java:234)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:180)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:440)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:943)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:843)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
ERROR [Finalizer] (SolrIndexWriter.java:242) - SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
INFO [33039485@qtp-1012673-2] (DirectUpdateHandler2.java:165) - [sarq] REMOVING ALL DOCUMENTS FROM INDEX
INFO [33039485@qtp-1012673-2] (LogUpdateProcessorFactory.java:171) - {} 0 5
ERROR [33039485@qtp-1012673-2] (SolrException.java:139) - java.io.IOException: Cannot delete .\solr\sarq\data\index\lucene-d12b45df2c6d6ae2efebf4cb75b8da25-write.lock
    [stack trace identical to the first error above]
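
From the finalizer warning it looks like an IndexWriter is being left open,
which is why Windows cannot delete the write.lock file. The proper fix is to
close the writer in the test teardown, but as a workaround the stale lock
could be force-released first. A minimal, untested sketch, assuming the
Lucene 2.9-era static helpers (the index path is the one from the log):

    import java.io.File;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class ForceUnlock {
        public static void main(String[] args) throws Exception {
            // Open the index directory reported in the log
            Directory dir = FSDirectory.open(new File("solr/sarq/data/index"));
            try {
                // If a previous writer died without close(), release its lock
                if (IndexWriter.isLocked(dir)) {
                    IndexWriter.unlock(dir);
                }
            } finally {
                dir.close();
            }
        }
    }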
I can take a look at it, but first I need to understand the integration
point.
Thanks,
Anuj
On Thu, Mar 17, 2011 at 1:34 PM, Paolo Castagna <[email protected]> wrote:
> About the tests failing: strange... I don't see any failures:
> Tests run: 41, Failures: 0, Errors: 0, Skipped: 0
> Share the details of your failures; I might have a look (but not today).
>
> If you are keen, you can look at EARQ as well, which is not just about
> ElasticSearch. It was done to experiment with a refactoring that made it
> easier to plug in different indexes... and indeed EARQ has Lucene, Solr
> and ElasticSearch in it:
> https://github.com/castagna/EARQ
>
> Paolo
>
> Anuj Kumar wrote:
>
>> Sure, I will let you know in case I have any queries. The tests were
>> failing when I built SARQ on my machine, but I will look into that later.
>> As you mentioned, it is really good to understand the integration with
>> LARQ as a reference, so I am doing that.
>>
>> Thanks for the info.
>>
>> - Anuj
>>
>> On Thu, Mar 17, 2011 at 1:14 PM, Paolo Castagna
>> <[email protected]> wrote:
>>
>>
>>
>> Anuj Kumar wrote:
>>
>> Thanks Paolo. I am looking into LARQ and also SARQ.
>>
>>
>> Be warned: SARQ is just an experiment (and currently unsupported).
>> However, if you prefer to use Solr, share with us your use case and
>> your reasons, and let me know if you have problems with it.
>>
>> SARQ might be a little behind with respect to removals from the index,
>> but you can look at what LARQ does and port the same approach into SARQ.
>>
>> Paolo
>>
>>
>>
>> On Thu, Mar 17, 2011 at 12:18 AM, Paolo Castagna
>> <[email protected]> wrote:
>>
>>
>> Anuj Kumar wrote:
>>
>> Hi Andy,
>>
>> I have loaded a few N-Triples files into TDB in offline mode using
>> tdbloader. Loading as well as querying is fast, but if I try to use a
>> regex it becomes very slow, taking a few minutes. On my 32-bit machine
>> it takes more than 10 minutes (expected, due to limited memory, ~1.5GB)
>> and on my 64-bit machine (8GB) it takes around 5 minutes.
>>
>> The query is pretty exhaustive; correct me if the slowness is due to
>> the filter:
>>
>> SELECT ?abstract
>> WHERE {
>>   ?resource <http://www.w3.org/2000/01/rdf-schema#label> ?l .
>>   FILTER regex(?l, "Futurama", "i") .
>>   ?resource <http://dbpedia.org/ontology/abstract> ?abstract
>> }
>>
>> I have loaded a few abstracts from the DBpedia dump and I am trying to
>> get the abstracts from the labels. This is very slow. If I remove the
>> FILTER and give the exact label, it is fast (presumably because of TDB
>> indexing).
>>
>> What is the right way to do such a regex or text search over the graph?
>> I have seen suggestions to use Lucene, and I also saw the LARQ
>> initiative. Is that the right way to go?
>>
>> Yes, using LARQ (which is included in ARQ) will greatly speed up your
>> query. The LARQ documentation is here:
>> http://jena.sourceforge.net/ARQ/lucene-arq.html
>> You will need to build the Lucene index first, though.
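>>
>> For example, a minimal sketch based on the LARQ documentation (untested
>> here, assuming the com.hp.hpl.jena.query.larq API; "model" is your
>> TDB-backed model):
>>
>>   import com.hp.hpl.jena.query.larq.IndexBuilderString;
>>   import com.hp.hpl.jena.query.larq.IndexLARQ;
>>   import com.hp.hpl.jena.query.larq.LARQ;
>>
>>   // Index all string literals in the model
>>   IndexBuilderString larqBuilder = new IndexBuilderString();
>>   larqBuilder.indexStatements(model.listStatements());
>>   larqBuilder.closeWriter();
>>   IndexLARQ index = larqBuilder.getIndex();
>>   // Make the index available to query execution
>>   LARQ.setDefaultIndex(index);
>>
>> and then replace the FILTER regex with pf:textMatch:
>>
>>   PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
>>   SELECT ?abstract WHERE {
>>     ?l pf:textMatch 'Futurama' .
>>     ?resource <http://www.w3.org/2000/01/rdf-schema#label> ?l .
>>     ?resource <http://dbpedia.org/ontology/abstract> ?abstract
>>   }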
>>
>> Paolo
>>
>>
>>
>> Thanks,
>> Anuj
>>
>> On Tue, Mar 15, 2011 at 5:09 PM, Andy Seaborne
>> <[email protected]> wrote:
>>
>> Just so you know: The TDB bulkloader can load all the data offline -
>> it's faster than using Fuseki for data loading online.
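>>
>> For example (the database location and file name are illustrative):
>>
>>   tdbloader --loc=/path/to/tdb-db dbpedia-dump.nt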
>>
>> Andy
>>
>>
>> On 15/03/11 11:22, Anuj Kumar wrote:
>>
>> Hi Andy,
>>
>> Thanks for the info. I have loaded a few GBs using the Fuseki server,
>> but I didn't try RiotReader or the Java APIs for TDB. I will try that.
>> Thanks for the response.
>>
>> Regards,
>> Anuj
>>
>> On Tue, Mar 15, 2011 at 4:12 PM, Andy Seaborne
>> <[email protected]> wrote:
>>
>>
>> 1/ Have you considered reading the DBpedia data into TDB? This would
>> keep the triples on-disk (and have cached in-memory versions of a
>> subset).
>>
>> 2/ A file can be read sequentially by using the parser directly (see
>> RiotReader and pass in a Sink<Triple> that processes the stream of
>> triples).
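>>
>> A rough sketch of 2/ (untested, and do check the exact RiotReader
>> entry point in your version of RIOT; the file name is illustrative):
>>
>>   import org.openjena.atlas.lib.Sink;
>>   import org.openjena.riot.RiotReader;
>>   import com.hp.hpl.jena.graph.Triple;
>>
>>   Sink<Triple> sink = new Sink<Triple>() {
>>       public void send(Triple t) {
>>           // process one triple at a time; nothing is held in memory
>>       }
>>       public void flush() {}
>>       public void close() {}
>>   };
>>   RiotReader.parseTriples("dbpedia-dump.nt", sink);
>>   sink.close();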
>>
>> Andy
>>
>>
>> On 14/03/11 18:42, Anuj Kumar wrote:
>>
>> Hi All,
>>
>> I am new to Jena and trying to explore it to work with a large number
>> of N-Triples. The requirement is to read a large number of N-Triples;
>> for example, an .nt file from the DBpedia dump that may run into GBs.
>> I have to read these triples, pick specific ones, and link them to the
>> resources of another set of triples. The goal is to link some of the
>> entities based on the Linked Data concept. Once the mapping is done, I
>> have to query the model from that point onwards. I don't want to work
>> by loading both the source and target datasets in memory.
>>
>> To achieve this, I have first created a file model maker and then a
>> named model for the specific dataset being mapped. Now I need to read
>> the triples and add the mapping to this new model. What would be the
>> right approach?
>>
>> One way is to load the model using FileManager (sketched below),
>> iterate through the statements, map them accordingly to the named
>> model (i.e. our mapped model), and close it at the end. This will
>> work, but it will load all of the triples in memory. Is this the right
>> way to proceed, or is there a way to read the model sequentially at
>> the time of mapping?
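>>
>> To make that concrete, here is a sketch of the FileManager approach I
>> mean (the file name is illustrative, and mappedModel is the named
>> model created above):
>>
>>   import com.hp.hpl.jena.rdf.model.Model;
>>   import com.hp.hpl.jena.rdf.model.StmtIterator;
>>   import com.hp.hpl.jena.util.FileManager;
>>
>>   // Loads the whole file into memory -- this is what I want to avoid
>>   Model source = FileManager.get().loadModel("dbpedia-dump.nt");
>>   StmtIterator it = source.listStatements();
>>   while (it.hasNext()) {
>>       // pick the statements of interest and copy them across
>>       mappedModel.add(it.nextStatement());
>>   }
>>   source.close();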
>>
>> I am just trying to understand the efficient way to map a large set of
>> N-Triples. I need your suggestions.
>>
>> Thanks,
>> Anuj
>>
>>
>>
>>
>>
>>
>>