FW: How to query a text index from the CLI tools? (WAS: Question about indexing in text search)

Rob Vesse Fri, 20 Jul 2018 02:29:24 -0700

Text Indexing folks,

I am running into the edge of my knowledge on this users thread.  Is there a 
command line tool that can query a text index based on the corresponding 
assembler file or is this something that requires using Fuseki/code?
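
For readers new to the thread, a text-index assembler file typically has 
the shape sketched below. This is only an illustrative sketch following the 
pattern in the Jena text query documentation, not the poster's actual 
index.ttl (that was an attachment and is not reproduced here). The two 
directory paths and the field names "uri" and "text" are placeholders 
assumed for illustration. It shows why such a file always describes two 
datasets: the text dataset wraps the base TDB dataset.

```turtle
@prefix :     <http://localhost/jena_example/#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:  <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:   <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .

# Register the TDB and text-query vocabularies with the assembler.
[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB   rdfs:subClassOf ja:Model .

[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset     rdfs:subClassOf ja:RDFDataset .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .

# Dataset 1: the text dataset, a wrapper that combines data and index.
:text_dataset rdf:type text:TextDataset ;
    text:dataset <#dataset> ;
    text:index   <#indexLucene> .

# Dataset 2: the base TDB dataset (placeholder location).
<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "/path/to/tdb-directory" .

# The Lucene index (placeholder directory) and its entity map.
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:/path/to/lucene-directory> ;
    text:entityMap <#entMap> .

<#entMap> a text:EntityMap ;
    text:entityField  "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ;
          text:predicate <http://dbpedia.org/property/first> ]
    ) .
```

Both :text_dataset and <#dataset> are typed as subclasses of ja:RDFDataset, 
which is consistent with the "Found two matches" error quoted below.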


Rob

On 19/07/2018, 18:00, "Alysson Gomes" <[email protected]> wrote:

    I did, but as you warned, the sparql tool threw an exception:
    org.apache.jena.sparql.ARQException: Found two matches: var ?root ->
    http://localhost/jena_example/#text_dataset,
    file:///home/alysson/Documents/PUC-Rio/TestJena/tdb-citation-data-en-fuseki-index/index.ttl#dataset
    
    
    On Thu, 19 Jul 2018 at 13:13, Rob Vesse <[email protected]> wrote:
    
    > No I can't, the documentation is doing the right thing. A text dataset is
    > fundamentally a wrapper around another dataset so any text indexing config
    > will always require at least two datasets in the configuration file.
    >
    > Did you try using the sparql tool instead as I suggested?
    >
    > Rob
    >
    > On 19/07/2018, 15:25, "Alysson Gomes" <[email protected]> wrote:
    >
    >     Can you send an example of a configuration file with only one
    >     dataset that contains the index? I based mine on the examples in
    >     the documentation (it is very similar to the configuration that
    >     I'm using).
    >
    >     On Thu, 19 Jul 2018 at 10:04, Rob Vesse <[email protected]> wrote:
    >
    >     > Thanks, so your problem was as I suspected
    >     >
    >     > You are using tdbquery, which does not understand text indexes
    >     > when invoked the way you invoke it.  By using --loc you are only
    >     > querying your base dataset; this does not include your text
    >     > index, so you don't get any results.
    >     >
    >     > I would try using the base sparql tool instead passing in your
    >     > configuration file i.e.
    >     >
    >     > sparql --desc=index.ttl --query=queries.rq
    >     >
    >     > I am not 100% sure this will work because there are two datasets
    >     > defined in your config file (the base dataset and the text
    >     > indexed dataset) and I am not sure which one the sparql tool will
    >     > pick by default
    >     >
    >     > Rob
    >     >
    >     >
    >     >
    >     >
    >     > On 19/07/2018, 13:47, "Alysson Gomes" <[email protected]> wrote:
    >     >
    >     >     I'm using tdbquery from Jena's bin directory:
    >     >     tdbquery --loc=/home/alysson/Documents/PUC-Rio/TestJena/tdb-citation-data-en --query=queries.rq
    >     >
    >     >     File queries.rq:
    >     >     prefix text: <http://jena.apache.org/text#>
    >     >     select ?s ?o
    >     >     where {
    >     >         ?s text:query ( <http://dbpedia.org/property/first> "David" ) ;
    >     >            <http://dbpedia.org/property/first> ?o
    >     >     }
    >     >
    >     >     On Thu, 19 Jul 2018 at 05:56, Rob Vesse <[email protected]> wrote:
    >     >
    >     >     > You still didn't state how you execute the query; you
    >     >     > included commands for creating the database and index but
    >     >     > not the command/code that actually makes the query
    >     >     >
    >     >     >
    >     >     >
    >     >     > Please show exactly how you are submitting your query
    >     >     >
    >     >     >
    >     >     >
    >     >     > Rob
    >     >     >
    >     >     >
    >     >     >
    >     >     > From: Alysson Gomes <[email protected]>
    >     >     > Reply-To: <[email protected]>
    >     >     > Date: Wednesday, 18 July 2018 at 20:15
    >     >     > To: <[email protected]>
    >     >     > Subject: Re: Question about indexing in text search
    >     >     >
    >     >     >
    >     >     >
    >     >     > I am using the following commands:
    >     >     >
    >     >     >
    >     >     >
    >     >     > Loading dataset:
    >     >     >
    >     >     > $JENAROOT/bin/tdbloader -loc=/home/alysson/Documents/PUC-Rio/TestJena/tdb2-citation-data-en tdb_citation.ttl
    >     >     >
    >     >     >
    >     >     >
    >     >     > Create index:
    >     >     >
    >     >     > java -cp /home/alysson/MEGA/Computação/ApacheJena/apache-jena-fuseki-3.8.0/fuseki-server.jar jena.textindexer --desc=index.ttl
    >     >     >
    >     >     >
    >     >     >
    >     >     > While the command above is running, the following output
    >     >     > appears:
    >     >     >
    >     >     >
    >     >     >
    >     >     > After the creation of the index, I execute the query:
    >     >     >
    >     >     >
    >     >     >
    >     >     > prefix text: <http://jena.apache.org/text#>
    >     >     >
    >     >     > select ?s ?o
    >     >     >
    >     >     > where{
    >     >     >
    >     >     >     ?s text:query( <http://dbpedia.org/property/first> "David") ;
    >     >     >
    >     >     >     <http://dbpedia.org/property/first> ?o
    >     >     >
    >     >     > }
    >     >     >
    >     >     >
    >     >     >
    >     >     > These are all commands that I'm using.
    >     >     >
    >     >     >
    >     >     >
    >     >     > On Wed, 18 Jul 2018 at 13:13, Rob Vesse <[email protected]> wrote:
    >     >     >
    >     >     > There is nothing obviously wrong with your configuration.
    >     >     > You still haven't shown the code that you are using with
    >     >     > this configuration to make your query.
    >     >     >
    >     >     >
    >     >     >
    >     >     > My guess would be that perhaps your code is loading in the
    >     >     > base dataset without the indexing support i.e. you may be
    >     >     > querying the base dataset rather than the text dataset, but
    >     >     > without having seen your code that's only a guess.
    >     >     >
    >     >     >
    >     >     >
    >     >     > Rob
    >     >     >
    >     >     >
    >     >     >
    >     >     > From: Alysson Gomes <[email protected]>
    >     >     > Reply-To: <[email protected]>
    >     >     > Date: Wednesday, 18 July 2018 at 14:55
    >     >     > To: <[email protected]>
    >     >     > Subject: Re: Question about indexing in text search
    >     >     >
    >     >     >
    >     >     >
    >     >     > Hi Rob!
    >     >     >
    >     >     > I attached the file with the text index configuration
    >     >     > (index.ttl), but to make things easier, here is an image of
    >     >     > it:
    >     >     >
    >     >     >
    >     >     >
    >     >     > [Image not preserved in the archive]
    >     >     >
    >     >     > I'm using the same queries as in the previous mail. If
    >     >     > something is wrong, please point me to a solution.
    >     >     >
    >     >     >
    >     >     >
    >     >     > On Wed, 18 Jul 2018 at 10:12, Rob Vesse <[email protected]> wrote:
    >     >     >
    >     >     > This is a misunderstanding, not a bug.  Property functions
    >     >     > use the SPARQL collection syntax i.e.
    >     >     > ( <http://dbpedia.org/property/first> "David") to pass
    >     >     > arguments to the function which is given as the predicate,
    >     >     > in this case text:query. The rdf:first/rdf:rest you see in
    >     >     > the logs is simply the expansion of that into triple
    >     >     > patterns which later gets extracted out into the actual
    >     >     > property function call.  The fact that those happen to be
    >     >     > similar to the property you are trying to search on is
    >     >     > purely coincidental.
    >     >     >
    >     >     >
    >     >     >
    >     >     > If your query is not working as expected then the actual
    >     >     > problem is elsewhere, likely in the configuration of your
    >     >     > text index.  So you would need to share that configuration
    >     >     > and show how you actually execute your query if you want
    >     >     > further help with this.
    >     >     >
    >     >     >
    >     >     >
    >     >     > Regards,
    >     >     >
    >     >     >
    >     >     > Rob
    >     >     >
    >     >     >
    >     >     >
    >     >     > From: Alysson Gomes <[email protected]>
    >     >     > Reply-To: <[email protected]>
    >     >     > Date: Wednesday, 18 July 2018 at 13:42
    >     >     > To: "[email protected]" <[email protected]>
    >     >     > Subject: Question about indexing in text search
    >     >     >
    >     >     > Hello, my name is Alysson. I am a master's student at the
    >     >     > Pontifical Catholic University of Rio de Janeiro and am
    >     >     > having problems with indexing in text search.
    >     >     >
    >     >     > Attachment 1 contains the assembler file that I'm using to
    >     >     > index the triples that contain the predicate
    >     >     > <http://dbpedia.org/property/first>.
    >     >     >
    >     >     > My goal is to reproduce query [1] using an index, but the
    >     >     > problem is that when I execute query [2] the URI used by
    >     >     > the query processor is different from the URI that I am
    >     >     > using in the predicate, as shown in the image below:
    >     >     >
    >     >     >
    >     >     > [Image not preserved in the archive]
    >     >     >
    >     >     > As shown in the image above, the query processor uses the
    >     >     > URI <http://www.w3.org/1999/02/22-rdf-syntax-ns>, producing
    >     >     > an incorrect result.
    >     >     >
    >     >     > I want to know if it is possible to change this or if I am
    >     >     > doing something wrong.
    >     >     >
    >     >     > Thank you in advance for the help.
    >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >     > [1]: Query
    >     >     >
    >     >     > SELECT ?s ?o
    >     >     >
    >     >     > WHERE {
    >     >     >
    >     >     > ?s <http://dbpedia.org/property/first> ?o
    >     >     >
    >     >     > filter regex(?o, "David", "i")
    >     >     >
    >     >     > }
    >     >     >
    >     >     >
    >     >     >
    >     >     > [2]: Query
    >     >     >
    >     >     > PREFIX text: <http://jena.apache.org/text#>
    >     >     >
    >     >     > SELECT ?s ?o
    >     >     >
    >     >     > WHERE {
    >     >     >
    >     >     > ?s text:query( <http://dbpedia.org/property/first> "David") ;
    >     >     >
    >     >     > <http://dbpedia.org/property/first> ?o
    >     >     >
    >     >     > }
    >     >     >
    >     >     >
    >     >
    >     >
    >     >
    >     >
    >     >
    >     >
    >
    >
    >
    >
    >
    >
