Re: jena-text limit by named graph (and language?)

Andy Seaborne Mon, 09 Dec 2013 03:58:08 -0800

Looks very good.

I've applied the patch jena-text-graph-with-tests.diff to the code baseand committed it to svn trunk.

I added the patch to the JIRA for the record. BTW - you can add filesdirectly. JIRA is not restricted to project committers.


One matter arising:
jena.textindexer

If I do:
rm -rf Lucene/* ; rm -f DB/* ; tdbloader --loc DB D.trig
textindexer --desc config-tdb-text.ttl

I get:
-------------
value cannot be null
-------------

Something has eaten the stacktrace.

I can't find that message in the code base so it might be coming from ajava library.


No indexing started.

D.trig:
---------------
prefix : <http://example/>
prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#>

:s rdfs:label "This is :s" .
---------------

Commenting out
    text:graphField       "graph" ;
and then no error so it looks to be related to that field.

More inline:

On 09/12/13 10:36, Osma Suominen wrote:

Hi Andy!

08.12.2013 23:09, Andy Seaborne kirjoitti:

That there was material under src/test/java at all.  Not all patches
have tests :-(


Right. That was a bit premature though, as in that version I only
tweaked existing tests so they wouldn't break.

However, the attached patch also includes real unit tests for the
graph-aware text index. I grouped the new test code into abstract
classes following the same pattern as existing tests (it took a while to
wrap my brain around the layers of abstract classes!):

* AbstractTestDatasetWithGraphTextIndex (subclass of
AbstractTestDatasetWithTextIndex) contains four working (and one
non-working, see below) unit tests that test the graph-specific
functionality.

* AbstractTestDatasetWithLuceneGraphTextIndex (ugh! subclass of above)
contains the in-memory TDB+Lucene setup required to run the tests.

* Finally, TestDatasetWithLuceneGraphTextIndex is a concrete class
subclassing the above, making sure the tests are actually run.

Thanks to the subclassing, all the existing text index tests (from
AbstractTestDatasetWithTextIndex) are also successfully executed on the
graph-aware index.

In this version I also reworked the constructor changes in
EntityDefinition. Now all the old constructors are preserved (3 old + 2
new) so there is no urgent need to adjust old tests, example code or
documentation.


Great!

1/ Blank node graphs - how about using the pseudo URI _:label rather
than use g.getBlankNodeLabel()?


So you mean
   String graph = (g.isURI() ) ? g.getURI() : "_:" +
g.getBlankNodeLabel() ;
instead of the current
   String graph = (g.isURI() ) ? g.getURI() : g.getBlankNodeLabel() ;
or did I misunderstand?


yes - that should do it if it isn't in the code anywhere else as well.


I implemented it this way. However, I couldn't find a way to really test
bnode-labeled graphs. The last unit test I wrote in
AbstractTestDatasetWithGraphTextIndex tries to parse a TriG snippet and
test with that, but executing the test fails with
"com.hp.hpl.jena.sparql.ARQInternalErrorException:
QueryIterGraphInner.buildIterator".


JENA-609

(OK - it's not big thing but I don't want to loose track of anything -get the text indexing improvements done then go back and fix incidentaldiscoveries)

I gave up and commented out the test

@Ignore is a lillte better - code compiles, and it gets noted in theJUnit output (e.g. Eclipse) if anyone is looking.

method. I may well have made some stupid beginner's mistake here, I'm
not very familiar with using the Jena Java API.

This is what happens when features get retro fitted :-( The code canhave assumptions


SPARQL Update does not accept bNodes either.


(Sidenote: I think the current jena-text index code also won't
gracefully handle resources with a bnode subject. The
getBlankNodeLabel() result gets stored in the index and is then
considered a URI at query time, though it isn't really and probably
won't match the original triples in the store.)

Upload a TriG file with bnode-labeled graphs.


I tried this from the Fuseki web UI (using attached TriG file) and got
this:

10:10:22 INFO  [2] POST http://localhost:3030/ds/upload
10:10:22 INFO  [2] Upload: Filename: blanknodegraphs.trig,
Content-Type=application/octet-stream, Charset=null => TriG
10:10:22 WARN  Only triples or default graph data expected : named graph
data ignored
10:10:22 INFO  [2] Upload: Graph: default (2 triple(s))
10:10:22 INFO  [2] 200 OK (40 ms)

So only the two default graph triples were actually stored.

Something else to fix. You're doing nothing wrong here. Fuseki needsto be quad sensitive. Generally, reading quads in a graph context getsyou the default graph and everything else ignored. Convenient - butit's bitten here.


(I have a nasty feeling this used to work and now doesn't.  Hmm.)

I've raised JENA-607.

The java code is behind the curve - Graph/Node level works, the
Dataset/Resource API does not allow the creation of bNode labeled graphs.


As mentioned above I tried to parse a TriG snippet via Java code in the
unit test. The parsing seemed to work (at least there was no error) but
querying failed.

2/ Did I get it right that the default graph is
Quad.defaultGraphNodeGenerated?  Maybe


In my tests the default graph seems to have the URI
"urn:x-arq:DefaultGraph", so it's probably this one from Quad.java:
     public static final Node defaultGraphIRI        =
NodeFactory.createURI("urn:x-arq:DefaultGraph") ;

Since it's just another URI to the index, indexing works fine here as
well. Though it would make sense to add a unit test for indexing the
default graph just in case.


Oops, sorry, you were right. It's actually defaultGraphNodeGenerated and
it is now handled correctly both at indexing and query time. And there's
a unit test to make sure :)

(I also tried enabling TDB unionDefaultGraph mode but that broke some of
the existing jena-text tests...)

How much of the documentation needs to change?  Just another section?


Basically it's just another section for the text-query.html page. Also
the the Configuration by Code section currently shows how to use
EntityDefinition, it needs to be updated with the new constructor
argument.

Where is the documentation kept? Do you take documentation patches as
well or what is the preferred way of contributing to the docs?


It's in SVN

http://svn.apache.org/repos/asf/jena/site/trunk/content/documentation/query/text-query.mdtext


OK, I'll look at documenting this next, if the code looks good to you.


Yes it does.

Code committed.


-Osma


        Andy

Re: jena-text limit by named graph (and language?)

Reply via email to