The following XQuery, run in the GUI (pulled from GitHub and built from
source a few minutes ago),
ft:tokens('testdata'),
ft:search('testdata', 'r.ḥ', map {'wildcards': true()})/..,
'----------',
collection('testdata')//*[text() contains text 'r.ḥ' using wildcards]
yields
<entry count="4">rwḥ</entry>
----------
with collection('testdata')
<_>
<gram xmlns="http://www.tei-c.org/ns/1.0" type="root"
xml:lang="ar-aeb-x-vicav">rwḥ</gram>
<gram xmlns="http://www.tei-c.org/ns/1.0" type="root"
xml:lang="ar-aeb-x-vicav">rwḥ</gram>
<gram xmlns="http://www.tei-c.org/ns/1.0" type="root"
xml:lang="ar-aeb-x-vicav">rwḥ</gram>
<gram xmlns="http://www.tei-c.org/ns/1.0" type="root"
xml:lang="ar-aeb-x-tunis-vicav">rwḥ</gram>
</_>
But the gh1800() test, changed as follows,
final String text = "999 aa 1111 rwḥ";
[...]
query("ft:search('" + NAME + "', 'r.ḥ', " + options + ")", text);
works.
On 06.02.2020 at 13:45, Christian Grün wrote:
I just tried to use the gh1800 test to replicate my problem and it does
not show there. It fails using the GUI.
I need your help: What does not show there? What fails, what happens?
On 06.02.2020 at 13:35, Christian Grün wrote:
Hi Omar,
Yes, that seems to solve the problem partly. Using wildcards now yields the
same result as no wildcards.
Glad to hear.
But if there is a complex Unicode character in the search string, "." for one
character loses its meaning.
…
Would you like a PR for the test gh1800 using complex Unicode characters?
A little test case would be helpful indeed. It seems to be a different issue:
• The first expression is evaluated without the full-text index.
The reason is that the full-text index algorithms are limited to basic
regular expressions; not all of them can be answered by an index (and
'r.{1,1}' is currently not detected as being identical to `r.`). If I
remember correctly, the index will not be accessed either if a pattern
starts with `.*` (this pattern would lead to a full index scan).
• The second expression is rewritten for index access. I tried to
build a little command script (test.bxs), but it doesn’t seem to
reflect the case you encountered:
set ftindex true
create db test <xml>rwḥ</xml>
xquery /*[text() contains text 'r.{1,1}ḥ' using wildcards]
xquery /*[text() contains text 'r.ḥ' using wildcards]
close
Could you extend this example script a little, such that it
demonstrates what goes wrong?
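
As a possible starting point, here is a sketch of the same script with query info switched on, so that the output indicates for each query whether it was rewritten for index access. It assumes that the QUERYINFO option is available in the build being tested and that lines starting with `#` are treated as comments in command scripts. The database name and content are unchanged from the script above.

```
# Create the full-text index when the database is built
set ftindex true
# Assumption: prints optimization details, including index rewritings
set queryinfo true
create db test <xml>rwḥ</xml>
# Evaluated without the index, as described above
xquery /*[text() contains text 'r.{1,1}ḥ' using wildcards]
# Rewritten for index access, as described above
xquery /*[text() contains text 'r.ḥ' using wildcards]
close
```

If the two queries return different results here, the difference would point to the index-based evaluation rather than to the wildcard syntax itself.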
Thanks in advance,
Christian