Re: How to query a text index from the CLI tools? (WAS: Question about indexing in text search)

Bruno P. Kinoshita Sun, 22 Jul 2018 03:38:12 -0700

I think it requires Fuseki or code. Here are my two cents.

The last time I had issues with Jena Text, I first simplified the 
assembler configuration as much as I could, down to the minimum that 
should still work and create the Lucene index.

Then I would confirm that the Lucene index was created successfully. This 
confirms only that the file permissions are OK and that my configuration is 
at least partially OK too.
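
A quick way to script that check is sketched below. It is only a rough 
sanity test, assuming the index lives in a plain directory on disk: Lucene 
writes a commit file named segments_N into the index directory, so finding 
one suggests the index was at least created. The function name and the path 
in the usage note are hypothetical, and this does not replace opening the 
index in Luke.

```python
import os
import re

# Lucene commit files are named "segments_N", where N is a generation
# counter written in base 36 (digits and lowercase letters).
SEGMENTS_RE = re.compile(r"^segments_[0-9a-z]+$")


def looks_like_lucene_index(index_dir):
    """Return True if index_dir is a readable directory holding a Lucene commit file.

    This only checks that something was written and can be read; it says
    nothing about which fields were indexed or how they were analyzed.
    """
    if not os.path.isdir(index_dir) or not os.access(index_dir, os.R_OK):
        return False
    return any(SEGMENTS_RE.match(name) for name in os.listdir(index_dir))
```

For example, looks_like_lucene_index("/path/to/lucene-index") would be called 
with whatever directory the assembler file points the text index at (the path 
here is a placeholder).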

Next I would use Fuseki to upload data to my dataset. To validate this 
step, I used Luke (a Lucene index GUI) to peek inside the Lucene index and 
confirm my fields were indexed as expected. If this fails, it means that 
the dataset and Lucene index locations are OK, but there is a problem with 
some analyzer, filter, etc. Normally I would find the error in the logs, or 
attach a debugger and see what was going on in the code. Probably the easiest 
approach for most users is to i) look at the logs, ii) change the 
configuration a little, and then try again.

Finally, after I tested that I could see the fields in Luke, and maybe even 
query the index in Luke, I would try my query in Fuseki.
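
For that last step, here is a minimal sketch of sending the text query from 
this thread to a Fuseki endpoint over the standard SPARQL protocol, using 
only the Python standard library. The endpoint URL (http://localhost:3030/ds/query) 
and the dataset name "ds" are assumptions for illustration; use whatever 
service name your Fuseki configuration exposes for the text dataset. The 
request is built but not sent, so nothing here depends on a running server.

```python
import urllib.parse
import urllib.request

# The text query from this thread: text:query takes its arguments as a
# SPARQL list (property, query string).
TEXT_QUERY = """\
PREFIX text: <http://jena.apache.org/text#>
SELECT ?s ?o
WHERE {
  ?s text:query (<http://dbpedia.org/property/first> "David") ;
     <http://dbpedia.org/property/first> ?o
}
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol POST request for a SELECT query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


# Hypothetical endpoint: "ds" is a placeholder dataset name.
request = build_sparql_request("http://localhost:3030/ds/query", TEXT_QUERY)

# To actually run it against a live server, something like:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The point of going through the endpoint is that Fuseki builds the dataset 
from the assembler file, so the query runs against the text dataset rather 
than the bare TDB dataset.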


I think the key for users troubleshooting a Jena Text configuration is having 
the minimum configuration possible, and peeking inside the Lucene index to make 
sure it's working as expected at each step of the configure -> load data -> 
create index -> query process.

Hope that helps,
Bruno


________________________________
From: Rob Vesse <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Friday, 20 July 2018 9:29 PM
Subject: FW: How to query a text index from the CLI tools? (WAS: Question about 
indexing in text search)



Text Indexing folks,


I am running into the edge of my knowledge on this users thread.  Is there a 
command line tool that can query a text index based on the corresponding 
assembler file or is this something that requires using Fuseki/code?


Rob


On 19/07/2018, 18:00, "Alysson Gomes" <[email protected]> wrote:

    I did, but as you warned, the sparql tool threw an exception:

    org.apache.jena.sparql.ARQException: Found two matches: var ?root ->
    http://localhost/jena_example/#text_dataset,
    file:///home/alysson/Documents/PUC-Rio/TestJena/tdb-citation-data-en-fuseki-index/index.ttl#dataset

    On Thu, 19 Jul 2018 at 13:13, Rob Vesse <[email protected]> wrote:

    > No I can't, the documentation is doing the right thing. A text dataset is
    > fundamentally a wrapper around another dataset so any text indexing config
    > will always require at least two datasets in the configuration file.
    >
    > Did you try using the sparql tool instead as I suggested?
    >
    > Rob
    >
    > On 19/07/2018, 15:25, "Alysson Gomes" <[email protected]> wrote:
    >
    >     Can you send an example of a configuration file with only one
    >     dataset that contains the index? I am basing mine on the examples
    >     in the documentation (it is very similar to the configuration that
    >     I'm using).
    >
    >     On Thu, 19 Jul 2018 at 10:04, Rob Vesse <[email protected]> wrote:
    >
    >     > Thanks, so your problem was as I suspected
    >     >
    >     > You are using tdbquery, which does not understand text indexes when
    >     > used the way you are using it.  By using --loc you are only querying
    >     > your base dataset, this does not include your text index so you don't
    >     > get any results.
    >     >
    >     > I would try using the base sparql tool instead passing in your
    >     > configuration file i.e.
    >     >
    >     > sparql --desc=index.ttl --query=queries.rq
    >     >
    >     > I am not 100% sure this will work because there are two datasets defined
    >     > in your config file (the base dataset and the text indexed dataset) and I
    >     > am not sure which one the sparql tool will pick by default
    >     >
    >     > Rob
    >     >
    >     > On 19/07/2018, 13:47, "Alysson Gomes" <[email protected]> wrote:
    >     >
    >     >     I'm using the tdbquery script from Jena's bin directory:
    >     >
    >     >     tdbquery --loc=/home/alysson/Documents/PUC-Rio/TestJena/tdb-citation-data-en --query=queries.rq
    >     >
    >     >     File queries.rq:
    >     >
    >     >     prefix text: <http://jena.apache.org/text#>
    >     >     select ?s ?o
    >     >     where {
    >     >         ?s text:query( <http://dbpedia.org/property/first> "David") ;
    >     >            <http://dbpedia.org/property/first> ?o
    >     >     }
    >     >
    >     >     On Thu, 19 Jul 2018 at 05:56, Rob Vesse <[email protected]> wrote:
    >     >
    >     >     > You still didn’t state how you execute the query, you included commands
    >     >     > for creating the database and index but not the command/code that actually
    >     >     > makes the query
    >     >     >
    >     >     > Please show exactly how you are submitting your query
    >     >     >
    >     >     > Rob
    >     >     >
    >     >     > From: Alysson Gomes <[email protected]>
    >     >     > Reply-To: <[email protected]>
    >     >     > Date: Wednesday, 18 July 2018 at 20:15
    >     >     > To: <[email protected]>
    >     >     > Subject: Re: Question about indexing in text search
    >     >     >
    >     >     > I am using the following commands:
    >     >     >
    >     >     > Loading the dataset:
    >     >     >
    >     >     > $JENAROOT/bin/tdbloader -loc=/home/alysson/Documents/PUC-Rio/TestJena/tdb2-citation-data-en tdb_citation.ttl
    >     >     >
    >     >     > Creating the index:
    >     >     >
    >     >     > java -cp /home/alysson/MEGA/Computação/ApacheJena/apache-jena-fuseki-3.8.0/fuseki-server.jar jena.textindexer --desc=index.ttl
    >     >     >
    >     >     > While the command above is running, the following output appears:
    >     >     >
    >     >     > After the index is created, I execute the query:
    >     >     >
    >     >     > prefix text: <http://jena.apache.org/text#>
    >     >     > select ?s ?o
    >     >     > where{
    >     >     >     ?s text:query( <http://dbpedia.org/property/first> "David") ;
    >     >     >     <http://dbpedia.org/property/first> ?o
    >     >     > }
    >     >     >
    >     >     > These are all the commands that I'm using.
    >     >     >
    >     >     > On Wed, 18 Jul 2018 at 13:13, Rob Vesse <[email protected]> wrote:
    >     >     >
    >     >     > There is nothing obviously wrong with your configuration.  You still
    >     >     > haven’t shown the code that you are using with this configuration to make
    >     >     > your query.
    >     >     >
    >     >     > My guess would be that perhaps your code is loading in the base dataset
    >     >     > without the indexing support i.e. you may be querying the base dataset
    >     >     > rather than the text dataset, but without having seen your code that’s only
    >     >     > a guess.
    >     >     >
    >     >     > Rob
    >     >     >
    >     >     > From: Alysson Gomes <[email protected]>
    >     >     > Reply-To: <[email protected]>
    >     >     > Date: Wednesday, 18 July 2018 at 14:55
    >     >     > To: <[email protected]>
    >     >     > Subject: Re: Question about indexing in text search
    >     >     >
    >     >     > Hi Rob!
    >     >     >
    >     >     > I attached the file with the text index configuration (file index.ttl),
    >     >     > but to make it easier, here it is as an image:
    >     >     >
    >     >     > [image not preserved]
    >     >     >
    >     >     > I'm using the same queries as in the previous mail. If something is
    >     >     > wrong, please suggest a solution.
    >     >     >
    >     >     > On Wed, 18 Jul 2018 at 10:12, Rob Vesse <[email protected]> wrote:
    >     >     >
    >     >     > This is a misunderstanding, not a bug.  Property functions use the SPARQL
    >     >     > collection syntax i.e. ( <http://dbpedia.org/property/first> "David") to
    >     >     > pass arguments to the function which is given as the predicate, in this
    >     >     > case text:query. The rdf:first/rdf:rest you see in the logs is simply the
    >     >     > expansion of that into triple patterns which later gets extracted out into
    >     >     > the actual property function call.  The fact that those happen to be
    >     >     > similar to the property you are trying to search on is purely
    >     >     > coincidental.
    >     >     >
    >     >     > If your query is not working as expected then the actual problem is
    >     >     > elsewhere, likely in the configuration of your text index.  So you would
    >     >     > need to share that configuration and show how you actually execute your
    >     >     > query if you want further help with this.
    >     >     >
    >     >     > Regards,
    >     >     >
    >     >     > Rob
    >     >     >
    >     >     > From: Alysson Gomes <[email protected]>
    >     >     > Reply-To: <[email protected]>
    >     >     > Date: Wednesday, 18 July 2018 at 13:42
    >     >     > To: "[email protected]" <[email protected]>
    >     >     > Subject: Question about indexing in text search
    >     >     >
    >     >     > Hello, my name is Alysson. I am a master's student at the Pontifical
    >     >     > Catholic University of Rio de Janeiro and I am having problems with
    >     >     > indexing in text search.
    >     >     >
    >     >     > Attachment 1 contains the assembler that I'm using to index the
    >     >     > triples that contain the predicate <http://dbpedia.org/property/first>.
    >     >     >
    >     >     > My goal is to reproduce query [1] using an index, but the problem is
    >     >     > that when I execute query [2], the URI used by the query processor is
    >     >     > different from the URI that I am using in the predicate, as shown in
    >     >     > the image below:
    >     >     >
    >     >     > [image not preserved]
    >     >     >
    >     >     > As shown in the image above, the query processor uses the URI
    >     >     > <http://www.w3.org/1999/02/22-rdf-syntax-ns>, producing an incorrect
    >     >     > result.
    >     >     >
    >     >     > I want to know if it is possible to change this or if I am doing
    >     >     > something wrong.
    >     >     >
    >     >     > Thank you in advance for the help.
    >     >     >
    >     >     > [1]: Query
    >     >     >
    >     >     > SELECT ?s ?o
    >     >     > WHERE {
    >     >     >     ?s <http://dbpedia.org/property/first> ?o
    >     >     >     filter regex(?o, "David", "i")
    >     >     > }
    >     >     >
    >     >     > [2]: Query
    >     >     >
    >     >     > PREFIX text: <http://jena.apache.org/text#>
    >     >     > SELECT ?s ?o
    >     >     > WHERE {
    >     >     >     ?s text:query( <http://dbpedia.org/property/first> "David") ;
    >     >     >     <http://dbpedia.org/property/first> ?o
    >     >     > }