Hi Christian,
Thank you again for all your help. Unfortunately, my documents are
multi-language and use many different diacritics, so my users expect
athgabáil, athgabail, and athgabāil to match as the same word. They also want
wildcard searching to work in the same way. This seems to mean that BaseX's
full-text index would have to be extended or restructured in some way so that
"using diacritics insensitive" and "using wildcards" work at the same time.
I cannot think at the moment how to break the two into separate searches as
you suggest. Maybe it will come to me later today.
At the moment the query looks like this and it does not use the full text
index:
declare variable $term as xs:string external := 'athgab.*';
declare variable $col as xs:string external := 'edil';
<results>{
  subsequence(
    ft:mark(
      for $x in collection($col)//entry
      where $x//text() contains text { $term }
        using wildcards using diacritics insensitive
      order by fn:lower-case(
        fn:replace(($x//orth[1]/text())[1], '\p{P}|\d+', '')
      ) collation "?lang=ga"
      return $x
    ), 1, 5000
  )
}</results>
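For reference, the folding my users expect can be sketched in plain XQuery,
outside the full-text engine. This is only an illustration: local:fold is a
made-up helper name, and it assumes the processor supports the optional NFD
form in fn:normalize-unicode.

```xquery
(: Hypothetical helper, not part of BaseX: decompose each character,
   drop the combining marks, and lower-case the result. :)
declare function local:fold($s as xs:string) as xs:string {
  fn:lower-case(
    fn:replace(fn:normalize-unicode($s, 'NFD'), '\p{M}', '')
  )
};

(: The three spellings from this message. :)
for $w in ('athgabáil', 'athgabail', 'athgabāil')
return local:fold($w)
```

If NFD is supported, all three spellings should fold to athgabail, which is
the behaviour I would like the index to give me together with wildcards.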
If anyone has any suggestions, I would be grateful.
All the best,
Chris
On Thu, Aug 14, 2014 at 10:35 PM, Christian Grün <[email protected]>
wrote:
> Hi Chris,
>
> as you already noted, the full-text index will only be utilized with the
> options that you chose when creating the index. If you want to do more
> fine-grained searches, it’s usually advisable to choose the most general
> options for creating the index (case insensitive, diacritics insensitive,
> etc.) and then refine the results in a second step. This can, for example,
> look as follows:
>
> declare function local:search($db, $terms) {
> for $result in db:open($db)//*[text() contains text { $terms }]
> return $result[text() contains text { $terms } using case sensitive]
> };
> local:search('factbook', ('German', 'English'))
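>
> Applied to diacritics, the same two-step idea might look like the sketch
> below. It runs on inline sample data rather than on a database, so it only
> illustrates the filtering logic and says nothing about whether an index is
> used. The function name local:search-entries, its $exact flag, and the
> sample entries are made up for illustration.
>
> ```xquery
> declare function local:search-entries(
>   $entries as element(entry)*,
>   $term    as xs:string,
>   $exact   as xs:boolean
> ) as element(entry)* {
>   (: Step 1: broad match that ignores diacritics. :)
>   let $broad := $entries[.//text() contains text { $term }
>                   using wildcards using diacritics insensitive]
>   (: Step 2: optionally keep only hits that also match with diacritics. :)
>   return
>     if ($exact)
>     then $broad[.//text() contains text { $term }
>            using wildcards using diacritics sensitive]
>     else $broad
> };
>
> let $sample :=
>   <entries>
>     <entry><orth>athgabáil</orth></entry>
>     <entry><orth>athgabail</orth></entry>
>     <entry><orth>athgabāil</orth></entry>
>   </entries>
> return (
>   local:search-entries($sample/entry, 'athgab.*', false()),
>   local:search-entries($sample/entry, 'athgabáil', true())
> )
> ```
>
> The first call is the broad, diacritics-insensitive wildcard search; the
> second narrows the hits to one exact spelling.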
>
> Hope this helps,
> Christian
>
>
>
> On Thu, Aug 14, 2014 at 10:54 PM, Chris Yocum <[email protected]> wrote:
> > Hi Christian,
> >
> > Apologies for bringing this back up but if I use "using diacritics
> > insensitive" in the full text search, it seems to turn full text
> > searching off. I have diacritics true on the database. I am just
> > surprised to see diacritics causing the full text searching to be
> > turned off.
> >
> > All the best,
> > Chris
> >
> > On Wed, Aug 13, 2014 at 01:18:26PM +0200, Christian Grün wrote:
> >> Hi Chris,
> >>
> >> there are various caches involved when evaluating queries, but I can't
> >> see for the given query where a cache may be utilized. However, your
> >> query may be evaluated faster if you simplify the nested where clause:
> >>
> >> <results>{
> >>   subsequence(
> >>     ft:mark(
> >>       for $x in collection($col)//entry
> >>       where $x//text() contains text { $term } using wildcards
> >>       order by fn:lower-case(
> >>         fn:replace(($x//orth[1]/text())[1], '\p{P}|\d+', '')
> >>       ) collation "?lang=ga"
> >>       return $x
> >>     ), 1, 5000
> >>   )
> >> }</results>
> >>
> >> You could as well use a predicate with position(), it may be evaluated
> >> faster than subsequence (I'm not sure, though, because most time will
> >> probably be spent for ordering all results):
> >>
> >> <results>{
> >>   ft:mark(
> >>     for $x in collection($col)//entry
> >>     where $x//text() contains text { $term } using wildcards
> >>     order by fn:lower-case(
> >>       fn:replace(($x//orth[1]/text())[1], '\p{P}|\d+', '')
> >>     ) collation "?lang=ga"
> >>     return $x
> >>   )[position() = 1 to 5000]
> >> }</results>
> >>
> >> Could you please open the InfoView in the GUI, execute the query again
> >> and check if the full-text index is applied?
> >>
> >> Christian
> >>
> >>
> >>
> >> On Wed, Aug 13, 2014 at 12:02 PM, Christopher Yocum <[email protected]> wrote:
> >> > declare variable $term as xs:string external; declare variable $col as
> >> > xs:string external; <results>{subsequence(ft:mark(for $x in
> >> > collection($col)//entry where $x//text()[. contains text {$term} using
> >> > wildcards] order by fn:lower-case(fn:replace(($x//orth[1]/text())[1],
> >> > '\\p{P}|\\d+','')) collation \"?lang=ga\" return $x), 1, 5000)}</results>
>