GitHub user osma opened a pull request:
https://github.com/apache/jena/pull/81
jena-text stored literals: initial functionality and tests for Lucene
This PR implements a feature where it's possible to store the original
literal values in the jena-text Lucene index and to access them when querying
the index. It works like this:
1) Configure jena-text to store literals (default is off) using the new
`text:storeValues` setting. Note that you also need the `text:langField`
setting in the entity map for language tags and datatypes to be handled
correctly.
```
<#indexLucene> a text:TextIndexLucene ;
    #text:directory <file:Lucene> ;
    text:directory "mem" ;
    text:storeValues true ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:langField "lang" ;
    [...]
```
2) Add some data, say this triple:
```
:myresource rdfs:label "My resource"@en .
```
3) Query like this:
```
PREFIX text: <http://jena.apache.org/text#>

SELECT * {
    (?s ?score ?literal) text:query "resource" .
}
```
In the query result, `?literal` will be bound to `"My resource"@en`.
It also works with typed literals (as requested by @ehedgehog). The
datatype will be stored in the langField using a special prefix (currently
`^^`) which ensures that it cannot be interpreted as a language tag.
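To illustrate why a single field can carry either a language tag or a datatype, here is a minimal Python sketch of the idea. It is not the code in this PR: the function names are hypothetical, and storing the full datatype URI after the prefix is an assumption. The only things taken from the description above are the `^^` prefix and the reuse of the langField. It relies on the fact that language tags consist only of ASCII letters, digits and hyphens, so a stored value starting with `^^` can never be mistaken for one.

```python
from typing import Optional, Tuple

# Prefix named in the PR description; everything else here is illustrative.
DATATYPE_PREFIX = "^^"


def encode_lang_field(lang: Optional[str] = None,
                      datatype: Optional[str] = None) -> str:
    """Return the single string stored in the lang field for one literal.

    Assumption: a literal has either a language tag or a datatype, and
    the datatype is stored as its full URI after the prefix.
    """
    if datatype is not None:
        return DATATYPE_PREFIX + datatype
    return lang or ""


def decode_lang_field(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a stored lang-field value back into (lang, datatype)."""
    if value.startswith(DATATYPE_PREFIX):
        # Cannot be a language tag: "^" is not a legal language-tag character.
        return None, value[len(DATATYPE_PREFIX):]
    return (value or None), None
```

With this scheme, `"My resource"@en` would store `en` in the field, while a typed literal would store `^^` followed by its datatype, and the two cases can be told apart on read without a second field.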
There are unit tests for all the basic cases (simple, non-default property,
language tags, datatypes).
I had to change the TextIndex API slightly again, to pass the queried
property from TextQueryPF to TextIndexLucene/TextIndexSolr so that they know
which field to look up values from. Since the API was already changed
recently to return TextHit objects instead of Nodes, I wouldn't expect
another change to hurt.
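The reason the index needs to know the queried property can be shown with a toy in-memory model. This is a Python sketch under stated assumptions, not the jena-text API: `ToyTextIndex`, its methods, and the shape of `TextHit` here are hypothetical, the constant score is a placeholder, and matching is a plain substring test rather than a Lucene query. The point it demonstrates is that each property is indexed under its own field, so the stored value can only be read back if the query says which field to read.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TextHit:
    """Toy stand-in for a query hit: subject, score, optional stored literal."""
    node: str
    score: float
    literal: Optional[str]


class ToyTextIndex:
    """Hypothetical in-memory index with one document per indexed triple."""

    def __init__(self, store_values: bool) -> None:
        self.store_values = store_values
        self.docs: List[Dict[str, str]] = []

    def add(self, uri: str, field: str, text: str) -> None:
        # Each property is indexed under its own field name.
        self.docs.append({"uri": uri, field: text})

    def query(self, field: str, term: str) -> List[TextHit]:
        """Search one field; the field name tells us where the literal lives."""
        hits = []
        for doc in self.docs:
            text = doc.get(field)
            if text is not None and term.lower() in text.lower():
                # Return the stored value only when storing is switched on.
                literal = text if self.store_values else None
                hits.append(TextHit(doc["uri"], 1.0, literal))  # placeholder score
        return hits
```

In this model, querying the `label` field for "resource" returns a hit carrying "My resource", while the same index built with `store_values=False` returns the hit with no literal attached.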
I've also done a basic implementation for Solr. It doesn't handle language
tags or datatypes (TextIndexSolr didn't have support for langField...), but
it should be able to return at least the lexical value. I haven't been able
to test it, because the jena-text/Solr combination lacks documentation and
TextIndexSolr may have suffered some bitrot: the last time I tried, I
couldn't get it working at all.
I can write the documentation for this after it has been merged. Now that I
have committer access I could merge this myself, but I'd like to get a couple
of +1's before doing that.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/osma/jena jena-text-literal
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/jena/pull/81.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #81
----
commit 1592c33f21e5337ecfa74706f5a675e6c57f9967
Author: Osma Suominen <[email protected]>
Date: 2015-06-26T06:53:10Z
jena-text stored literals: initial functionality and tests for Lucene
----