Re: [basex-talk] rest vs. restxq - strange difference

Christian Grün Tue, 19 May 2015 03:17:47 -0700

Hi Lars,

I think I can confirm the observed behavior: in certain circumstances,
the index properties (stemming etc.) won't be applied to the optimized
full-text query when using RESTXQ.


I'll check out how this can be fixed.
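
Until then, one thing that might be worth trying (an untested sketch,
not a confirmed fix) is to state the match options explicitly in the
query instead of relying on the properties stored with the index. The
database and element names are taken from your script; the language
code 'no' and the sample word are assumptions on my side. Options that
do not match the index may keep the index from being used, so the query
could be slower:

```xquery
(: Sketch only: spell out the full-text match options in the query.
   'hyphen-data' and the <full> element come from the script in this
   thread; the language code 'no' is an assumption. :)
declare variable $word external := 'mannen';

db:open('hyphen-data')/hyphens/hyphens[
  full contains text { $word }
    using stemming
    using language 'no'
    using diacritics sensitive
]
```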

Thanks,
Christian


On Mon, May 18, 2015 at 6:46 PM, Lars Johnsen <yoon...@gmail.com> wrote:
> A last update, which may illuminate a little. After reindexing the database
> using Norwegian (snowball), stemming, and keeping diacritics, RESTXQ
> processes neither the special characters (treats them as closest ASCII), nor
> inflected forms.
>
> The words "mannen" (=the man, definite) and "spaserer" (=walks, present
> tense) result in no output, while with the bare stems "mann" and "spaser"
> the full result is displayed. REST, in contrast, behaves as expected.
>
>
> Cheers
> Lars
>
> 2015-05-18 15:28 GMT+02:00 Lars Johnsen <yoon...@gmail.com>:
>>
>> As an update, after rebuilding database with
>>
>> text index,
>> full text index (no language, no stemming, keep diacritics)
>>
>> restarting server:
>> BaseX 8.1.1 [Server]
>> Server was started (port: 29084)
>> [main] INFO org.eclipse.jetty.server.AbstractConnector - Started
>> SelectChannelConnector@0.0.0.0:8984
>> HTTP Server was started (port: 8984)
>>
>> RESTXQ: Norwegian characters are converted when using the full text index;
>> changing to the text index takes forever.
>> REST: Full-text works as expected, and text index works as expected (same
>> as running in GUI for both).
>>
>> It looks as if the index structure is treated differently.
>>
>>
>> 2015-05-18 15:07 GMT+02:00 Lars Johnsen <yoon...@gmail.com>:
>>>
>>> The full text query is blisteringly fast for both, the text index query
>>> is fast only for REST queries and seems not to be used with queries in
>>> RESTXQ. I am rebuilding the whole database now to see how it goes, and will
>>> restart everything for a new assessment.
>>>
>>>
>>>
>>> 2015-05-18 15:00 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:
>>>>
>>>> > However, when using text index instead of full text the results are
>>>> > the same for both, except that RESTXQ takes almost forever
>>>>
>>>> What about the original query: Has it been slow as well, or do you
>>>> think this is a new problem?
>>>>
>>>>
>>>> > 2015-05-18 14:28 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:
>>>> >>
>>>> >> It could be that your URL is decoded in a wrong way. What happens if
>>>> >> you run the following function with REST and RESTXQ and "føre" as
>>>> >> word?
>>>> >>
>>>> >>   declare
>>>> >>     %rest:path("/test/encoding/{$word}")
>>>> >>   function page:test-encoding($word) {
>>>> >>     string-to-codepoints($word)
>>>> >>   };
>>>> >>
>>>> >> Thanks,
>>>> >> Christian
>>>> >>
>>>> >>
>>>> >> string-to-codepoints()
>>>> >> > REST output (2 first lines):
>>>> >> >    føre
>>>> >> >    fø - re 219
>>>> >> >
>>>> >> > RESTXQ
>>>> >> >    føre
>>>> >> >    fo - re 123
>>>> >> >
>>>> >> > The first word quoted is "føre" in both cases and is what the
>>>> >> > scripts see, so the full text is given the same in both cases.
>>>> >> > Could it be that within RESTXQ the full text index is treated
>>>> >> > differently?
>>>> >> >
>>>> >> > I will work closer on a self contained example, but thought this
>>>> >> > might point to something.
>>>> >> >
>>>> >> > Cheers
>>>> >> > Lars
>>>> >> >
>>>> >> >
>>>> >> > 2015-05-18 13:44 GMT+02:00 Lars Johnsen <yoon...@gmail.com>:
>>>> >> >>
>>>> >> >> Hi Christian - and thanks for fast response. Latest version 8.1.1
>>>> >> >> is in use (same behaviour as previous). Let me see if I can make
>>>> >> >> a self contained example.
>>>> >> >>
>>>> >> >> best,
>>>> >> >> Lars
>>>> >> >>
>>>> >> >> 2015-05-18 13:40 GMT+02:00 Christian Grün
>>>> >> >> <christian.gr...@gmail.com>:
>>>> >> >>>
>>>> >> >>> Hi Lars,
>>>> >> >>>
>>>> >> >>> hm, that's difficult to tell. All I can say is that this sounds
>>>> >> >>> unusual, so I'm coming up with my standard questions: Do you
>>>> >> >>> think you
>>>> >> >>> could build us a little example that allows us to reproduce the
>>>> >> >>> problem? Have you tried the latest version of BaseX?
>>>> >> >>>
>>>> >> >>> Best,
>>>> >> >>> Christian
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> On Mon, May 18, 2015 at 1:35 PM, Lars Johnsen <yoon...@gmail.com>
>>>> >> >>> wrote:
>>>> >> >>> >
>>>> >> >>> > I am running a web script in two identical versions (identical
>>>> >> >>> > as in "cut and paste"), one via RESTXQ and one via REST. The
>>>> >> >>> > response is different, and I wondered what may be the trouble.
>>>> >> >>> >
>>>> >> >>> > For example the output (the URLs only work locally) for
>>>> >> >>> >     http://ljohnsen:8984/hyphens/mellom
>>>> >> >>> > is the same as
>>>> >> >>> >      http://ljohnsen:8984/rest?run=hyphen-show.xq&word=mellom
>>>> >> >>> >
>>>> >> >>> > which is a set of hyphenation data:
>>>> >> >>> >     mellom
>>>> >> >>> >     mel - lom 17005
>>>> >> >>> >     Mel - lom 144
>>>> >> >>> >     mel - lom. 50
>>>> >> >>> >
>>>> >> >>> > but if "mellom" is exchanged with "nasjonalbiblioteket" only
>>>> >> >>> > the REST version shows any result, which then is the same as I
>>>> >> >>> > get experimenting in the GUI.
>>>> >> >>> >
>>>> >> >>> > The actual script is added below; it runs in both versions
>>>> >> >>> > (identical apart from the rest and restxq interfaces). It uses
>>>> >> >>> > full text search, but results differ when run under the
>>>> >> >>> > REST-regime.
>>>> >> >>> >
>>>> >> >>> > All the best
>>>> >> >>> > Lars G Johnsen
>>>> >> >>> > National Library of Norway
>>>> >> >>> >
>>>> >> >>> > module namespace page = 'http://basex.org/modules/web-page';
>>>> >> >>> >
>>>> >> >>> > declare
>>>> >> >>> >   %rest:path("/hyphens/{$word}")
>>>> >> >>> >   %output:method("html")
>>>> >> >>> >
>>>> >> >>> > function page:show-hyphens($word) {
>>>> >> >>> >    let $db := db:open('hyphen-data')
>>>> >> >>> >      let $hyphens :=
>>>> >> >>> >       for $hyp in $db/hyphens/hyphens[full contains text {$word}]
>>>> >> >>> >       group by $first := $hyp/first, $second := $hyp/second
>>>> >> >>> >       let $count := count($hyp)
>>>> >> >>> >       order by xs:int($count) descending
>>>> >> >>> >       return element p {
>>>> >> >>> >         attribute freq {$count},
>>>> >> >>> >         $first, " - ", $second, $count
>>>> >> >>> >       }
>>>> >> >>> >
>>>> >> >>> >      let $total := sum($hyphens//@freq)
>>>> >> >>> >      let $div := element div {
>>>> >> >>> >        element p {$word},
>>>> >> >>> >        for $hyp in $hyphens
>>>> >> >>> >        return element div {
>>>> >> >>> >           attribute class {"hyph"},
>>>> >> >>> >           attribute style {"font-size:",
>>>> >> >>> >             1 + round(xs:int($hyp//@freq/data()) div $total, 1) || "em"},
>>>> >> >>> >           $hyp
>>>> >> >>> >
>>>> >> >>> >          }
>>>> >> >>> >      }
>>>> >> >>> >      return
>>>> >> >>> >      <html encoding="UTF-8">
>>>> >> >>> >     <head>
>>>> >> >>> >         <meta http-equiv="Content-Type" content="text/html"
>>>> >> >>> >               charset="UTF-8" />
>>>> >> >>> >         <title>Orddelinger</title>
>>>> >> >>> >     </head>
>>>> >> >>> >     <body>{$div}
>>>> >> >>> >     </body>
>>>> >> >>> >     </html>
>>>> >> >>> >
>>>> >> >>> > };
>>>> >> >>
>>>> >> >>
>>>> >> >
>>>> >
>>>> >
>>>
>>>
>>
>
