Hi Sarven,
Good, and thanks for letting me know it's working fine for you.
Maybe an improvement, so that people do not have to remove the ja:textIndex,
is to put a try {} catch {} around the bit in AssemblerLARQ which
constructs an IndexWriter and fall back to a read-only
IndexLARQ if something goes wrong:
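    if ( dataset != null ) {
        // Only one IndexWriter can hold the Lucene write.lock at a time,
        // so this is the call that fails if another writer already exists.
        IndexWriter indexWriter = IndexWriterFactory.create(directory);
        IndexBuilderModel larqBuilder = new IndexBuilderString(indexWriter) ;
        // Register the builder as a listener on the default graph and on
        // every named graph, so changes to them get indexed.
        dataset.getDefaultModel().register(larqBuilder);
        for ( Iterator<String> iter = dataset.listNames() ; iter.hasNext() ; ) {
            String g = iter.next() ;
            dataset.getNamedModel(g).register(larqBuilder) ;
        }
        indexReader = IndexReader.open(indexWriter, true);
        indexLARQ = new IndexLARQ(indexWriter) ;
    } else {
    ...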
http://svn.apache.org/repos/asf/incubator/jena/Jena2/LARQ/trunk/src/main/java/org/apache/jena/larq/assembler/AssemblerLARQ.java
(I think I'll be doing this).
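A minimal sketch of that fallback idea in plain Java. To keep it self-contained it uses hypothetical stand-in types (TextIndex, IndexOpener, openWithFallback) instead of the real Lucene and LARQ classes. The only fact carried over is that Lucene's LockObtainFailedException is a subclass of IOException, so catching IOException would cover the write.lock timeout seen in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

class FallbackSketch {

    /** Hypothetical handle on a text index, either writable or read-only. */
    static final class TextIndex {
        final boolean writable;
        TextIndex(boolean writable) { this.writable = writable; }
    }

    /** Hypothetical opener; openWriter() fails if another writer holds the lock. */
    interface IndexOpener {
        TextIndex openWriter() throws IOException;
        TextIndex openReadOnly() throws IOException;
    }

    /** Simulates an index directory whose write.lock is already held. */
    static final IndexOpener LOCKED = new IndexOpener() {
        @Override public TextIndex openWriter() throws IOException {
            throw new IOException("Lock obtain timed out: write.lock");
        }
        @Override public TextIndex openReadOnly() { return new TextIndex(false); }
    };

    /** Simulates an index directory nobody else is writing to. */
    static final IndexOpener FREE = new IndexOpener() {
        @Override public TextIndex openWriter() { return new TextIndex(true); }
        @Override public TextIndex openReadOnly() { return new TextIndex(false); }
    };

    /**
     * Try to open a writable index first; if that throws, log a warning and
     * fall back to a read-only index instead of failing the whole assembly.
     */
    static TextIndex openWithFallback(IndexOpener opener) {
        try {
            return opener.openWriter();
        } catch (IOException writeFailure) {
            System.err.println("WARN: cannot open index for writing ("
                    + writeFailure.getMessage() + "), falling back to read-only");
            try {
                return opener.openReadOnly();
            } catch (IOException readFailure) {
                readFailure.addSuppressed(writeFailure);
                throw new UncheckedIOException(readFailure);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(openWithFallback(LOCKED).writable); // false
        System.out.println(openWithFallback(FREE).writable);   // true
    }
}
```

Falling back keeps the dataset queryable, but updates made through it would presumably not be indexed while in read-only mode, so the warning needs to be visible.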
Another option would be to somehow ignore or remove the --larq
option and require an assembler file with a ja:textIndex property
in it.
If you have other ideas/suggestions, please, share them with us.
The other observation is that the patch for Fuseki only changes
the Fuseki pom.xml file, to include LARQ as a dependency. All the other
necessary changes have been committed to ARQ and TDB and are available
in the current SNAPSHOTs.
Other projects could start using the new LARQ as a separate module
in exactly the same way Fuseki does. In my opinion, this would
generate useful feedback and help identify any remaining bugs (in
particular in relation to deletes and/or avoiding duplicates).
Paolo
Sarven Capadisli wrote:
Super awesome Paolo! It worked as you've described.
I just ran a quick query:
PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
SELECT ?doc
WHERE {
?lit pf:textMatch '+text' .
?doc ?p ?lit .
}
and got results!
Thanks again for tracking this down. I'll keep you informed if anything
else comes up wrt this.
-Sarven
On Thu, 2011-06-16 at 14:03 +0100, Paolo Castagna wrote:
Hi Sarven,
I think I have identified the problem.
With Lucene we can have only one IndexWriter at a time.
When we run larq.larqbuilder and we specify --desc=tdb.ttl
we need to make sure tdb.ttl does not have a ja:textIndex
property in it.
This is because larqbuilder creates one Lucene IndexWriter
and then it calls the DataSourceAssembler which is trying
to create another Lucene IndexWriter if ja:textIndex is
there.
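The "one IndexWriter at a time" rule can be illustrated with plain java.nio file locks, which is similar in spirit to the NativeFSLock named in the error messages in this thread. This is an analogy, not Lucene's code: Lucene's IndexWriter waits up to a write-lock timeout before reporting "Lock obtain timed out", whereas this sketch makes a single FileChannel.tryLock() attempt. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class WriteLockSketch {

    /** Try once to take an exclusive lock; return null if it is already held. */
    static FileLock tryAcquire(FileChannel channel) throws IOException {
        try {
            return channel.tryLock(); // null if another process holds the lock
        } catch (OverlappingFileLockException heldByThisJvm) {
            return null;              // already locked through another channel in this JVM
        }
    }

    /**
     * Simulate two "writers" opening the same write.lock file, the way
     * larqbuilder and then the DataSourceAssembler each try to create an
     * IndexWriter. Returns {firstGotLock, secondGotLock}.
     */
    static boolean[] simulateTwoWriters() {
        try {
            Path dir = Files.createTempDirectory("lock-sketch");
            Path lockFile = dir.resolve("write.lock");
            // Closing the channels at the end of try-with-resources releases any locks.
            try (FileChannel first = FileChannel.open(lockFile,
                         StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileChannel second = FileChannel.open(lockFile,
                         StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                FileLock a = tryAcquire(first);  // role of larqbuilder's writer
                FileLock b = tryAcquire(second); // role of the assembler's writer
                return new boolean[] { a != null, b != null };
            } finally {
                Files.deleteIfExists(lockFile);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] got = simulateTwoWriters();
        System.out.println("first writer got lock:  " + got[0]);
        System.out.println("second writer got lock: " + got[1]);
    }
}
```

The first writer gets the lock and the second does not, which mirrors why larqbuilder must be the only component creating an IndexWriter during bulk indexing.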
Also, since you still have "null" in your
error message, I am not 100% sure you are using the
latest ARQ SNAPSHOT. To be absolutely sure, could you
try deleting it from your .m2 Maven repository?
However, I am experiencing another problem with Fuseki
which I do not understand:
cd /tmp
svn co http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
cd /tmp/fuseki
mvn test
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running org.openjena.fuseki.TS_Fuseki
INFO [qtp1548452562-20] (SPARQL_ServletBase.java:118) - [1] POST http://localhost:3535/dataset/update
INFO [qtp1548452562-20] (SPARQL_ServletBase.java:153) - [1] 204 No Content
INFO [qtp1548452562-21] (SPARQL_ServletBase.java:118) - [2] GET http://localhost:3535/dataset/data?default=
INFO [qtp1548452562-21] (SPARQL_ServletBase.java:153) - [2] 200 OK
INFO [qtp1548452562-22] (SPARQL_ServletBase.java:118) - [3] GET http://localhost:3535/dataset/data?graph=http://graph/1
INFO [qtp1548452562-22] (SPARQL_ServletBase.java:155) - [3] 404 No such graph: <http://graph/1>
INFO [qtp1548452562-23] (SPARQL_ServletBase.java:118) - [4] POST http://localhost:3535/dataset/update
INFO [qtp1548452562-23] (SPARQL_ServletBase.java:153) - [4] 204 No Content
INFO [qtp1548452562-24] (SPARQL_ServletBase.java:118) - [5] GET http://localhost:3535/dataset/data?graph=http://graph/1
INFO [qtp1548452562-24] (SPARQL_ServletBase.java:155) - [5] 404 No such graph: <http://graph/1>
INFO [qtp1548452562-19] (SPARQL_ServletBase.java:118) - [6] POST http://localhost:3535/dataset/update
INFO [qtp1548452562-19] (SPARQL_ServletBase.java:153) - [6] 204 No Content
It stops here, forever.
To retry the patch for LARQ in Fuseki, do:
cd /tmp
svn co http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
cd /tmp/fuseki
wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
patch -p0 < JENA-63_Fuseki_r1136050.patch
mvn -DskipTests=true package
Then you should be able to index a dataset using:
java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar larq.larqbuilder \
    --allow-duplicates --larq=/tmp/lucene --desc=/path/to/your/assembler.ttl
But, please, make sure you do not have any ja:textIndex in your assembler.ttl
when you do the initial bulk indexing.
You should put the ja:textIndex back in for normal operations.
It's not ideal, but at least now I understand the cause of the problem
and there is a workaround.
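If you want to catch a leftover ja:textIndex before starting a long bulk-indexing run, a small pre-flight check could help. This is a hypothetical helper, not part of Fuseki or LARQ, and it is deliberately crude: it looks for the literal string ja:textIndex outside whole-line comments and does not parse Turtle, so a different prefix or a full IRI would slip through.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class TextIndexPreflight {

    /**
     * Crude, line-based check: true if any line that is not a whole-line
     * comment mentions "ja:textIndex". Hypothetical helper; not a Turtle parser.
     */
    static boolean mentionsTextIndex(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("#")) {
                continue; // ignore commented-out properties
            }
            if (trimmed.contains("ja:textIndex")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: TextIndexPreflight /path/to/assembler.ttl");
            System.exit(2);
        }
        List<String> lines;
        try {
            lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (mentionsTextIndex(lines)) {
            System.err.println("ja:textIndex found: remove it before bulk indexing with larqbuilder");
            System.exit(1);
        }
        System.out.println("OK: no ja:textIndex found");
    }
}
```

Run it against the same file you pass to --desc, and only start larqbuilder if it reports OK.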
Let me know how it goes.
Paolo
Sarven Capadisli wrote:
Thanks a lot, Paolo. Just some feedback on these steps:
My pom.xml contains http://pastebin.com/5HAJBL95
$ java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar larq.larqbuilder \
    --allow-duplicates --larq=/usr/lib/fuseki/lucene-index/ \
    --desc=/usr/lib/fuseki/tdb2.ttl
11:56:34 WARN DataSourceAssembler :: Unable to initialize LARQ using org.apache.jena.larq.assembler.AssemblerLARQ: null
11:56:34 WARN DataSourceAssembler :: Unable to initialize LARQ using com.hp.hpl.jena.query.larq.AssemblerLARQ: null
-Sarven
On Thu, 2011-06-16 at 10:45 +0100, Paolo Castagna wrote:
Hi Sarven,
first of all, thanks for your email.
This is about an open issue (an improvement) whose aim is to add the new LARQ
(i.e. the one packaged as a separate module) to Fuseki and make it as easy as
possible for people to use.
See: https://issues.apache.org/jira/browse/JENA-63
The issue is still open and there are problems.
I cleaned up the attachments on JENA-63 and uploaded a new patch for Fuseki.
This is how you can apply the patch to Fuseki:
cd /tmp
svn co http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
cd /tmp/fuseki
wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
patch -p0 < JENA-63_Fuseki_r1136050.patch
mvn package
Then you should be able to index a dataset using:
java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar larq.larqbuilder \
    --allow-duplicates --larq=/tmp/lucene --desc=/path/to/your/assembler.ttl
However, there is a problem (I improved the error message):
10:32:35 WARN DataSourceAssembler :: Unable to initialize LARQ using org.apache.jena.larq.assembler.AssemblerLARQ: Lock obtain timed out: NativeFSLock@/tmp/lucene/write.lock
This is the stack trace:
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.hp.hpl.jena.sparql.core.assembler.DataSourceAssembler.createTextIndex(DataSourceAssembler.java:115)
    at com.hp.hpl.jena.sparql.core.assembler.DataSourceAssembler.createTextIndex(DataSourceAssembler.java:97)
    at com.hp.hpl.jena.sparql.core.assembler.DatasetAssembler.open(DatasetAssembler.java:22)
    at com.hp.hpl.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.openBySpecificType(AssemblerGroup.java:118)
    at com.hp.hpl.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.open(AssemblerGroup.java:105)
    at com.hp.hpl.jena.assembler.assemblers.AssemblerGroup$ExpandingAssemblerGroup.open(AssemblerGroup.java:69)
    at com.hp.hpl.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:37)
    at com.hp.hpl.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:34)
    at com.hp.hpl.jena.sparql.core.assembler.AssemblerUtils.build(AssemblerUtils.java:88)
    at arq.cmdline.ModAssembler.create(ModAssembler.java:55)
    at arq.cmdline.ModDatasetAssembler.createDataset(ModDatasetAssembler.java:31)
    at arq.cmdline.ModDataset.getDataset(ModDataset.java:22)
    at larq.larqbuilder.exec(larqbuilder.java:84)
    at arq.cmdline.CmdMain.mainMethod(CmdMain.java:85)
    at arq.cmdline.CmdMain.mainRun(CmdMain.java:47)
    at arq.cmdline.CmdMain.mainRun(CmdMain.java:34)
    at larq.larqbuilder.main(larqbuilder.java:50)
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/tmp/lucene/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1097)
    at org.apache.jena.larq.IndexWriterFactory.create(IndexWriterFactory.java:36)
    at org.apache.jena.larq.assembler.AssemblerLARQ.make(AssemblerLARQ.java:85)
    ... 21 more
It seems to me that Lucene is failing to acquire the write.lock,
as if the initialization code were called twice.
I have not yet identified the cause and I am investigating.
Apologies, and please be patient until we make progress and close JENA-63.
Paolo
Sarven Capadisli wrote:
Hi,
I'd like to get Fuseki and LARQ running. Below is where I'm at. Any help
would be great:
I use https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk
and it sits at /usr/lib/fuseki
I have http://ftp.heanet.ie/mirrors/www.apache.org/dist//lucene/java/3.2.0/lucene-3.2.0.tgz
at /usr/lib/lucene/
I've applied https://issues.apache.org/jira/secure/attachment/12478735/JENA-63_Fuseki_r8810.patch
?doc ?property ?object
My /usr/lib/fuseki/pom.xml is http://pastebin.com/Cpaz75ai
My /usr/lib/fuseki/tdb2.ttl is http://pastebin.com/SXv5LWEn
When I run
$ java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar larq.larqbuilder \
    --allow-duplicates --larq=/usr/lib/lucene/index/ \
    --desc=/usr/lib/fuseki/tdb2.ttl
I get http://pastebin.com/JQPqsPtH
-Sarven