Hi all,

I would like to propose a new feature for jena-text: the ability to store the original literals in the Lucene index so that they can be retrieved quickly. I have suggested this before, but at that point it would have been difficult to implement. With the recent jena-text work by Alexis Miara and myself, I think it should now be feasible with relatively little effort.

It would work like this:

1. Configure jena-text to store literals using a new text:storeValues property (the default would be off):

<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:langField "lang" ;
    text:storeValues true ;
[...]


2. Add some data, for example this triple:

:myresource rdfs:label "My resource"@en .


3. Query with a third variable in the subject list to receive the stored literal:

PREFIX text: <http://jena.apache.org/text#>
SELECT * {
  (?s ?score ?literal) text:query "resource" .
}

In the query result, ?literal would be bound to "My resource"@en.
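To illustrate the intended behaviour of steps 1-3, here is a toy in-memory model in Java. It is a sketch under stated assumptions, not Lucene and not the jena-text API: the class and method names are hypothetical, the storeValues flag stands in for text:storeValues, and the score is an arbitrary token-overlap fraction rather than a real Lucene score. The point is only that ?literal is bound when values are stored and left unbound otherwise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of the proposed feature; hypothetical names, NOT Lucene and NOT jena-text.
class StoredValueIndexSketch {

    // One indexed "document": subject URI, the literal's lexical form, its language tag.
    record Doc(String uri, String text, String lang) {}

    // One result row for (?s ?score ?literal); literal == null means ?literal is unbound.
    record Hit(String subject, double score, String literal) {}

    private final boolean storeValues; // hypothetical stand-in for text:storeValues
    private final List<Doc> docs = new ArrayList<>();

    StoredValueIndexSketch(boolean storeValues) {
        this.storeValues = storeValues;
    }

    void add(String uri, String text, String lang) {
        docs.add(new Doc(uri, text, lang));
    }

    // Very crude tokenizer: lowercase and split on non-word characters.
    private static List<String> tokens(String s) {
        return Arrays.asList(s.toLowerCase(Locale.ROOT).split("\\W+"));
    }

    List<Hit> query(String q) {
        Set<String> queryTerms = new HashSet<>(tokens(q));
        List<Hit> hits = new ArrayList<>();
        for (Doc d : docs) {
            List<String> docTerms = tokens(d.text());
            long matched = docTerms.stream().filter(queryTerms::contains).count();
            if (matched == 0) {
                continue;
            }
            // Arbitrary toy score: fraction of document tokens that match the query.
            double score = (double) matched / docTerms.size();
            String literal = null;
            if (storeValues) {
                // Rebuild the literal from the stored value (no escaping in this toy).
                literal = "\"" + d.text() + "\"" + (d.lang() == null ? "" : "@" + d.lang());
            }
            hits.add(new Hit(d.uri(), score, literal));
        }
        return hits;
    }

    public static void main(String[] args) {
        StoredValueIndexSketch idx = new StoredValueIndexSketch(true);
        idx.add("http://example.org/myresource", "My resource", "en");
        for (Hit h : idx.query("resource")) {
            // prints: http://example.org/myresource 0.5 "My resource"@en
            System.out.println(h.subject() + " " + h.score() + " " + h.literal());
        }
    }
}
```

Constructing the index with storeValues set to false returns the same subject and score, but with a null literal, which corresponds to ?literal remaining unbound.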


In practice, the literal value would be stored using Lucene's ability to store the original field value alongside the indexed terms (TextField.TYPE_STORED), similar to how LARQ worked. If the langField setting is in use, the language tag would be read from the language field and attached to the returned literal. If not, the returned literals would have no language tag (in the above example, the value would be just "My resource").
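A minimal Java sketch of how a returned literal could be rebuilt from what the Lucene document holds, assuming the stored field value and (when langField is configured) the language field value are available as plain strings. The class and method names are hypothetical; passing null for the language models the case where langField is not configured. Only quotes and backslashes are escaped, N-Triples style.

```java
// Hypothetical helper; names are illustrative and not part of jena-text.
final class StoredLiteralSketch {

    private StoredLiteralSketch() {}

    // Escape backslashes first, then double quotes, as N-Triples requires.
    // Other control characters are ignored in this sketch.
    static String escape(String lexical) {
        return lexical.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /**
     * Rebuild a literal from stored index values.
     *
     * @param storedValue the original value stored with the text field
     * @param storedLang  the language field value, or null if langField is not configured
     */
    static String toLiteral(String storedValue, String storedLang) {
        String base = "\"" + escape(storedValue) + "\"";
        if (storedLang != null && !storedLang.isEmpty()) {
            return base + "@" + storedLang;
        }
        return base; // no language field: the tag cannot be recovered
    }

    public static void main(String[] args) {
        System.out.println(toLiteral("My resource", "en"));  // "My resource"@en
        System.out.println(toLiteral("My resource", null));  // "My resource"
    }
}
```

The design consequence is that the language tag survives only if it was written to the index in the first place, which is why the langField setting matters here.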


The benefit is that there would be no need to look up the original matching value in the RDF data after the text query. This would simplify, and probably speed up, many of the SPARQL queries I use in the Skosmos application.

I already have some preliminary code and tests for this, but they are not yet ready for public review. I will make a pull request once I have something to show.

-Osma



--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi
