Andy Seaborne wrote:


On 12/09/11 11:24, Paolo Castagna wrote:
Hi Jérôme,
you are lucky: I have exactly the same need as you, and I've done something about it recently. Unfortunately, the new LARQ (as a separate module) has still not made it into Fuseki on trunk.

We have an open JIRA for it which you can watch|vote|contribute to:
https://issues.apache.org/jira/browse/JENA-63

Should we change the title of JENA-63? It's not about Fuseki, which just supplies the SPARQL protocol and routes requests to the right dataset. It's the dataset that must do the LARQ coordination - initial indexing, then incremental updates later, across restarts.

Hi Andy,
I am not sure the title of JENA-63 is going to make much difference.

Users (at Talis as well) want to easily set up SPARQL endpoints, and they also
want to easily run free-text searches on those SPARQL endpoints.
Fuseki currently provides a very good user experience for quickly setting up
a SPARQL endpoint; however, it does not include free-text search
capabilities.

The patch in JENA-63 does not contain any change to the Fuseki source code;
it only adds the LARQ jar (and, transitively, Lucene v3.1.0) to its dependencies.
All the other necessary code changes have been done already elsewhere (i.e.
ARQ and LARQ).

What would be a more appropriate title?

The overall goal is to make it as easy as possible for users to perform free-text
searches over their RDF data if they want to. Notice: this feature is not
"standard" and it is not enabled by default.

Once LARQ is properly released, do you see problems in adding it (and Lucene
v3.1.0) to the Fuseki dependencies?

LARQ is ~46KB.
Lucene v3.1.0 is (unfortunately) much bigger: ~1.2MB.

Is the size of Lucene's jar a concern?

Paolo


It is possible to get Fuseki to automatically run initialization code - the configuration file supports ja:loadClass (a bit misnamed - it loads and runs a static), but I don't think that is anything other than a stop-gap.
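
To illustrate what "loads and runs a static" could mean in practice, here is a minimal sketch using only the JDK. It assumes the hook is a public static no-argument method named init(); that name, and the classes LoadClassSketch and IndexOnStartup, are hypothetical and are not taken from Fuseki or from this thread. It shows the general load-and-invoke mechanism, not Fuseki's actual implementation.

```java
import java.lang.reflect.Method;

public class LoadClassSketch {

    // Hypothetical hook class. In a real setup, init() is where the
    // LARQ index would be built and registered.
    public static class IndexOnStartup {
        public static boolean ran = false;

        public static void init() {
            ran = true;
            System.out.println("startup hook ran");
        }
    }

    // Load a class by name and call its public static init() method.
    // Returns false when the class has no such method.
    public static boolean loadAndInit(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Method init = cls.getMethod("init");
            init.invoke(null); // null receiver because the method is static
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not run init() on " + className, e);
        }
    }

    public static void main(String[] args) {
        loadAndInit(IndexOnStartup.class.getName());
    }
}
```

With something along these lines, the indexing code from the original question would live in the hook method, and the class name would be what the configuration file refers to.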

    Andy


In the meantime, if you want to use LARQ with Fuseki this is what you need to do:

cd /tmp
svn co https://svn.apache.org/repos/asf/incubator/jena/Jena2/Fuseki/trunk/ fuseki
cd /tmp/fuseki
wget https://issues.apache.org/jira/secure/attachment/12482758/JENA-63_Fuseki_r1136050.patch
patch -p0 < JENA-63_Fuseki_r1136050.patch
mvn package

Now, you can simply use the Fuseki config.ttl file as explained here:
http://openjena.org/wiki/Fuseki#Fuseki_Configuration_File
and use the ja:textIndex property on a dataset to specify a non-existent directory.

When you point LARQ at a non-existent directory, it will perform the indexing for you. This is particularly useful when you have multiple datasets configured in Fuseki.
WARNING: it might take a while to index large datasets, so be patient.
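
Since the initial indexing is triggered by the directory not existing, it can be handy to check the path before starting the server. The following is only a sketch with the plain JDK: the class IndexDirCheck and the default path /tmp/lucene-index are made up for illustration, and it assumes that an existing directory is treated as an already built index, which this thread does not state explicitly.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class IndexDirCheck {

    // True when the directory is absent, i.e. the case in which
    // the initial indexing described above would be performed.
    public static boolean needsInitialIndexing(Path indexDir) {
        return Files.notExists(indexDir);
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the value used for ja:textIndex instead.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp/lucene-index");
        if (needsInitialIndexing(dir)) {
            System.out.println(dir + " does not exist: expect indexing on startup");
        } else {
            System.out.println(dir + " already exists: no initial indexing expected");
        }
    }
}
```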

See also: http://markmail.org/thread/tmptip55ru5wxrrj

LARQ snapshots are here:
https://repository.apache.org/content/repositories/snapshots/org/apache/jena/larq/0.2.2-incubating-SNAPSHOT/
I can quickly fix or improve things if you have problems or good suggestions.

I hope this helps, let me know how it goes.

Paolo

Jérôme wrote:
Hi,

I'm trying to use LARQ with my Fuseki server.

I would like to programmatically index documents (with Lucene) when the
server starts.

Something like this:

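// Build a Lucene index over string literals while the model is loaded.
Model model = ModelFactory.createDefaultModel();
IndexBuilderString larqBuilder = new IndexBuilderString();
model.register(larqBuilder);      // index statements as they are added
FileManager.get().readModel(model, "Data/books.ttl");
larqBuilder.closeWriter();        // finish indexing
model.unregister(larqBuilder);
IndexLARQ index = larqBuilder.getIndex();
LARQ.setDefaultIndex(index);      // make the index available to queries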

Is this possible? In which class would it be best to do this?

Thanks

Jerome



