Re: Fuseki + Larq : Lucene indexing

Paolo Castagna Mon, 12 Sep 2011 06:31:21 -0700

Andy Seaborne wrote:

On 12/09/11 11:24, Paolo Castagna wrote:
Hi Jérôme,
you are lucky: I have exactly the same need as you, and I have done something about it recently. Unfortunately, the new LARQ (as a separate module) has still not made it into Fuseki on trunk.
We have an open JIRA for it which you can watch|vote|contribute to:
https://issues.apache.org/jira/browse/JENA-63
Should we change the title of JENA-63? It's not about Fuseki, which just supplies the SPARQL protocol and routes requests to the right dataset. It's the dataset that must do the LARQ coordination: initial indexing, then incremental indexing later, across restarts.


Hi Andy,
I am not sure the title of JENA-63 is going to make much difference.

Users (at Talis as well) want to easily have SPARQL endpoints, and they also
want to easily run free-text searches on those SPARQL endpoints.
Fuseki currently provides a very good user experience in terms of quickly
getting a SPARQL endpoint up; however, it does not include free-text search
capabilities.

The patch in JENA-63 does not contain any code change to Fuseki source code,
it only adds the LARQ jar (and transitively Lucene v3.1.0) to its dependencies.
All the other necessary code changes have been done already elsewhere (i.e.
ARQ and LARQ).

What would be a more appropriate title?

The overall goal is to make it as easy as possible for users to perform free-text
searches over their RDF data if they want to. Notice: this feature is not
"standard" and it is not enabled by default.

Once LARQ is properly released, do you see problems in adding it (and Lucene
v3.1.0) to the Fuseki dependencies?

LARQ is ~46KB.
Lucene v3.1.0 is (unfortunately) much bigger: ~1.2MB.

Is the size of Lucene's jar a concern?

Paolo

It is possible to get Fuseki to automatically run initialization code - the configuration file supports ja:loadClass (a bit misnamed - it loads and runs a static) but I don't think that is anything other than a stop-gap.
    Andy
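
To illustrate the general "load a class and run a static" idea mentioned above, here is a minimal sketch in plain Java. It does not use Jena or Fuseki and is not how ja:loadClass is implemented; it only shows the underlying JVM mechanism, assuming the hook is a static initializer. The names LoadClassSketch, IndexOnStartup and RUNS are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LoadClassSketch {
    /** Counts how often the hypothetical startup hook has run. */
    static final AtomicInteger RUNS = new AtomicInteger();

    /** Hypothetical hook: the static block stands in for index-building code. */
    static final class IndexOnStartup {
        static {
            RUNS.incrementAndGet();
        }
    }

    /** Loads a class by name and forces its static initializer to run. */
    static void loadClass(String className) {
        try {
            Class.forName(className, true, LoadClassSketch.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // Taking the class literal's name does not initialize the class.
        String name = IndexOnStartup.class.getName();
        loadClass(name);
        loadClass(name); // second load is a no-op: initializers run once per class
        System.out.println("initializer runs: " + RUNS.get());
    }
}
```

Because the JVM initializes a class only once, a hook of this kind runs once per server process, which fits one-off start-up work but not incremental re-indexing.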
In the meantime, if you want to use LARQ with Fuseki this is what you need to do:
cd /tmp
svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/fuseki
cd /tmp/fuseki
wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
patch -p0 < JENA-63_Fuseki_r1136050.patch
mvn package

Now, you can simply use the Fuseki config.ttl file as explained here:
http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
and use the ja:textIndex property on a dataset to specify a non-existent directory.
When you point LARQ at a non-existent directory, it will perform the indexing for you. This is particularly useful when you have multiple datasets configured in Fuseki.
WARNING: it might take a while to index large datasets, so be patient.
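
The "index only when the directory does not exist yet" behaviour described above can be sketched in plain Java as follows. This is not LARQ's code: it only mirrors the decision, with directory creation standing in for the (potentially slow) indexing pass. The names TextIndexDirSketch, needsInitialIndexing and openOrBuild are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class TextIndexDirSketch {
    /** True when the index location is absent, so a full initial indexing pass is needed. */
    static boolean needsInitialIndexing(Path indexDir) {
        return Files.notExists(indexDir);
    }

    /**
     * Returns true if the index had to be built, false if an existing one was reused.
     * Creating the directory is a placeholder for the real indexing work.
     */
    static boolean openOrBuild(Path indexDir) {
        if (!needsInitialIndexing(indexDir)) {
            return false; // directory already there: reuse it, no re-indexing
        }
        try {
            Files.createDirectories(indexDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Files.createTempDirectory("fuseki-sketch").resolve("lucene-index");
        System.out.println("first start built index:  " + openOrBuild(indexDir));
        System.out.println("second start built index: " + openOrBuild(indexDir));
    }
}
```

On a restart the directory already exists, so the sketch skips the build step. With several datasets, each one would get its own directory and be checked independently.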

See also: http://markmail.org/thread/tmptip55ru5wxrrj

LARQ snapshots are here:
https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/
and I can quickly fix/improve things if you have problems or good suggestions.
I hope this helps, let me know how it goes.

Paolo

Jérôme wrote:
Hi,

I'm trying to use LARQ with my Fuseki server.

I would like to programmatically index documents (with Lucene) when the
server starts.

Something like this:

Model model = ModelFactory.createDefaultModel();
IndexBuilderString larqBuilder = new IndexBuilderString();
// Index statements as they are added to the model
model.register(larqBuilder);
FileManager.get().readModel(model, "Data/books.ttl");
// Finish indexing and detach the listener
larqBuilder.closeWriter();
model.unregister(larqBuilder);
// Make the index available to LARQ queries
IndexLARQ index = larqBuilder.getIndex();
LARQ.setDefaultIndex(index);

Is it possible? In which class would it be best to do this?

Thanks

Jerome
