Re: Fuseki + Larq : Lucene indexing

Jérôme Tue, 13 Sep 2011 08:04:42 -0700

On 13/09/11 16:59, Paolo Castagna wrote:

Jérôme wrote:
On 12/09/11 17:52, Paolo Castagna wrote:
Jérôme wrote:
On 12/09/11 16:13, Paolo Castagna wrote:
Jérôme wrote:
On 12/09/11 15:18, Paolo Castagna wrote:
Jérôme wrote:
On 12/09/11 12:24, Paolo Castagna wrote:
Hi Jérôme,
you are lucky, I have exactly the same need as you and I've done something about it recently.
Unfortunately, the new LARQ (as a separate module) still has not made it into Fuseki on trunk.
We have an open JIRA for it which you can watch|vote|contribute to:
https://issues.apache.org/jira/browse/JENA-63
In the meantime, if you want to use LARQ with Fuseki this is what you need to do:
cd /tmp
svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
cd /tmp/fuseki
wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
patch -p0 < JENA-63_Fuseki_r1136050.patch
mvn package
Now, you can simply use the Fuseki config.ttl file as explained here:
http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
and use the ja:textIndex property on a dataset to specify a non-existing directory.
Is it possible to have a Fuseki configuration example with a ja:textIndex property? I am trying to
add it to the book service (books.ttl), with no results...
Use tdbloader to load some RDF data into /tmp/tdb, then change <#dataset>
on the example config.ttl file you have in Fuseki:
http://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/config.ttl
I've never used the TDB loader. How does it work? Is there any online documentation?
Fortunately, TDB is included in the Fuseki uber jar (since it includes the Fuseki binaries as well as all the jar dependencies, including TDB). So, in this
case, it's quite useful for end users.

Here is what I do:

cd /tmp/fuseki
java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader --loc=/tmp/tdb books.ttl
Thank you! It's ok for that!
Good.

So, are you able to query your RDF data using the pf:textMatch property
function? For example:

PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
DESCRIBE ?doc {
 ?title pf:textMatch 'potter' .
 ?doc dc:title ?title .
} LIMIT 10
I thought so, but it doesn't work...
There is no error, but my result set is empty.
Plain SPARQL queries work, but LARQ ones do not.

How can I be sure that my document is properly indexed?

I ran:
java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader --loc=/tmp/tdb books.ttl
My final config.ttl file contains:
<#dataset> rdf:type      tdb:DatasetTDB ;
    tdb:location "/tmp/tdb" ;
    ja:textIndex "/tmp/lucene" ;
    # Query timeout on this dataset (milliseconds)
    ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000" ] ;
##      tdb:unionDefaultGraph true ;
Can you try to delete the /tmp/lucene directory and restart Fuseki?

OK... well played, you're right! The LARQ query works now!
Thank you very much.

Jérôme

Let me know,
Paolo
I run Fuseki with this command line:
./fuseki-server --config=config.ttl
Now I would like to make modifications to the LARQ module.
LARQ is open source and you are free and welcome to do so if you want/need:
cd /tmp
svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/LARQ/trunk/ larq
cd /tmp/larq
... make your changes ...
mvn install
With mvn install, Maven will install the LARQ artifacts in your local Maven repository in your home directory.
However, it would be good if you could share what your modifications are, why you need them and your use case. Your changes might be useful to others.
If your changes do not get contributed back, you will need to maintain them and they will represent a cost for you. Every time we release a new version of LARQ with features you might want, you will need to re-apply your changes.
Yes, I know.
So, I encourage you to share them and maybe open a new JIRA issue (with a patch attached to it). Not all changes are general and useful enough to get
committed, but let's see.
Ok - it could be a good solution for everybody.
I've downloaded and built it. How can I recompile my Fuseki Maven project using my own LARQ jar?
Once you have published your modified version of LARQ in your local Maven
repository, it is available to other projects on your machine.

This is how you recompile Fuseki using your modified LARQ jar:

cd /tmp
svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
cd /tmp/fuseki
wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
patch -p0 < JENA-63_Fuseki_r1136050.patch
mvn package
Make sure the LARQ versions in your LARQ pom.xml and Fuseki pom.xml correspond.
Once again, if you get your changes adopted and committed to trunk you would
not need to do all this.

Is it explain() the feature you desperately need?
Not really. In the end, explain() is not what I'm expecting.
Could you share more on why you need it and what is your use case?
The aim is to query (from an HTML form) an RDF graph where nodes are documents (text documents).
We would like to obtain the matching documents (available from the Lucene hits), plus a list of matching strings and a list of offsets.
The offsets/positions will be sent to another module that needs them to perform its tasks.
Example query (a LARQ query in this case; queries could also be SPARQL only):
Query: w*

doc1: when I was a child I was a Jedi

Expected result:
Doc id: 1
Matching strings: when, was, was
Offsets: 0-3; 7-9; 21-23
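
For illustration only, the expected result above can be reproduced without Lucene or LARQ. The sketch below is an assumption-laden stand-in, not LARQ code: the class name PrefixOffsets and the method findPrefixMatches are hypothetical, it treats the query "w*" as "every word starting with w" using a plain regular expression, and it reports inclusive character offsets as in the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of LARQ or Lucene.
public class PrefixOffsets {

    /** One matched word with inclusive start/end character offsets. */
    public static final class Match {
        public final String text;
        public final int start;
        public final int end;

        Match(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + " " + start + "-" + end;
        }
    }

    /** Finds every word in doc that starts with prefix (case-insensitive). */
    public static List<Match> findPrefixMatches(String prefix, String doc) {
        // \b anchors the prefix at a word start; \w* consumes the rest of the word.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(prefix) + "\\w*",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(doc);
        List<Match> matches = new ArrayList<>();
        while (m.find()) {
            // Matcher.end() is exclusive, so subtract 1 for an inclusive offset.
            matches.add(new Match(m.group(), m.start(), m.end() - 1));
        }
        return matches;
    }

    public static void main(String[] args) {
        String doc = "when I was a child I was a Jedi";
        for (Match match : findPrefixMatches("w", doc)) {
            System.out.println(match);
        }
    }
}
```

A real implementation inside LARQ would presumably take the positions from the Lucene analysis of the indexed text rather than re-scanning it with a regular expression; this only pins down what the output should look like.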


Do you think it would be interesting to share it?

Jérôme
explain() can be expensive and it is Lucene-specific; it could cause
problems if in the future we want to support/move/change and use Solr
and/or ElasticSearch: https://issues.apache.org/jira/browse/JENA-17.

Paolo
Thanks.
This will load the data in books.ttl and build the TDB indexes in /tmp/tdb.
You can also use the -h option for help:

java -cp target/fuseki-0.2.1-SNAPSHOT-sys.jar tdb.tdbloader -h
tdbloader [--desc DATASET | -loc DIR] FILE ...
  Location
      --loc=DIR              Location (a directory)
      --tdb=                 Assembler description file
  Symbol definition
      --set                  Set a configuration symbol to a value
      --strict               Operate in strict SPARQL mode (no extensions of any kind)
      --graph=IRI            Act on a named graph
      --desc=                Assembler description file
  General
      -v   --verbose         Verbose
      -q   --quiet           Run with minimal output
      --debug                Output information for debugging
      --help
      --version              Version information


Paolo
Thanks
[...]

<#dataset> rdf:type      tdb:DatasetTDB ;
    tdb:location "/tmp/tdb" ;
    ja:textIndex "/tmp/lucene" ;
    .
If the /tmp/lucene directory does not exist, LARQ will index what you have in
/tmp/tdb creating the appropriate Lucene indexes.
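
Since LARQ only builds the index when the directory is missing, a stale or empty index directory has to be removed before restarting Fuseki. A minimal sketch of that clean-up step follows, assuming nothing else is using the directory; the class name ResetTextIndex and the method deleteIfPresent are hypothetical, and a plain recursive delete from the shell achieves the same thing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Fuseki or LARQ.
public class ResetTextIndex {

    /** Recursively deletes dir if it exists; returns true if it was removed. */
    public static boolean deleteIfPresent(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            ordered = paths.sorted(Comparator.reverseOrder())
                           .collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ResetTextIndex <index-directory>");
            return;
        }
        Path indexDir = Paths.get(args[0]);
        boolean removed = deleteIfPresent(indexDir);
        System.out.println(removed
                ? "Removed " + indexDir + "; restart Fuseki to rebuild the index."
                : indexDir + " does not exist; nothing to remove.");
    }
}
```

Run it with the ja:textIndex path (here /tmp/lucene) while Fuseki is stopped, then start the server again.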


Paolo
Thanks
When you point LARQ at a non-existing directory, it will perform the indexing for you.
This is particularly useful when you have multiple datasets configured in Fuseki.
WARNING: it might take a while to index large datasets, so be patient.
See also: http://markmail.org/thread/tmptip55ru5wxrrj

LARQ snapshots are here:
https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/
and I can quickly fix/improve things if you have problems or good suggestions.
I hope this helps, let me know how it goes.

Paolo

Jérôme wrote:
Hi,

I'm trying to use LARQ with my Fuseki server.
I would like to programmatically index documents (with Lucene) when the
server starts.

Something like this:

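// Build a Lucene index over the string literals added to the model
Model model = ModelFactory.createDefaultModel();
IndexBuilderString larqBuilder = new IndexBuilderString();
model.register(larqBuilder);          // index statements as they are added
FileManager.get().readModel(model, "Data/books.ttl");
larqBuilder.closeWriter();            // finish indexing and close the index writer
model.unregister(larqBuilder);
IndexLARQ index = larqBuilder.getIndex();
LARQ.setDefaultIndex(index);          // default index used by pf:textMatch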

Is it possible? In which class would it be best to do this?

Thanks

Jerome
