Hi Ron,

You can add an extra element (or attribute) to the content when importing or modifying it. (Or you can use another document in another database if you like: you can create and later find such an index document by giving it the same db:path as the original document.)

In this extra database, document, element and/or attribute, you can recreate the original text, except that you normalize the characters with diacritical marks to a canonical decomposition form and then strip away the diacritical marks like this:

replace(normalize-unicode($input, 'NFKD'), '\p{Mn}', '')

The full updating statement is beyond my cursory XQuery capabilities – I’d probably do it in XSLT. Also I don’t know how to trigger an event that would cause an update of the auxiliary fields when the underlying data changes.
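
For what it's worth, a rough sketch of such an update could look like the following. It is untested against your data and rests on several assumptions: the attribute name plain and the function name local:strip-diacritics are made up for illustration, the path is taken from your query fragment, and the attribute is written into the CTGov database with XQuery Update.

```xquery
(: Hypothetical helper: decompose characters (NFKD), then drop the combining marks. :)
declare function local:strip-diacritics($input as xs:string?) as xs:string {
  replace(normalize-unicode($input, 'NFKD'), '\p{Mn}', '')
};

(: Assumption: each intervention_name gets an auxiliary attribute "plain"
   holding the diacritics-free text; the attribute name is illustrative only. :)
for $name in db:open('CTGov')/clinical_study/intervention/intervention_name
return insert node attribute plain { local:strip-diacritics($name) } into $name
```

With this helper, local:strip-diacritics('Äpfel') yields 'Apfel'. The statement would have to be re-run (after deleting or replacing the old attributes) whenever the underlying names change, and the search strings need the same normalization before they are compared against the auxiliary attribute.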

Gerrit


On 03.08.2018 14:39, Ron Katriel wrote:
Christian,

Adding the "diacritics sensitive" option slows execution by a factor of 3. My script (fragment below), which joins two large databases, namely CT.gov <http://clinicaltrials.gov> and DrugBank, takes 2 hours without the diacritics sensitive constraint but 6 hours with it. Given the combinatorics involved, I am wondering if there is a better way to do this in BaseX.

Thanks,
Ron


for $drug in db:open('DrugBank')/drugbank/drug
  let $drug_name := $drug/name/text()
  let $drug_synonyms := functx:value-union(normalize-space(lower-case($drug/name)), local:drug-synonyms($drug_name))
  for $synonym_name in $drug_synonyms
  ...
  for $study in db:open('CTGov')/clinical_study[intervention/intervention_name contains text { $synonym_name } using case insensitive using diacritics sensitive]
  ...


Ron Katriel, Ph.D. | Principal Data Scientist | Medidata Solutions <http://www.mdsol.com/>
350 Hudson Street, 7th Floor, New York, NY 10014
rkatr...@mdsol.com <mailto:tbro...@mdsol.com> | direct: +1 201 337 3622 | mobile: +1 201 675 5598 | main: +1 212 918 1800

On August 1, 2018 at 12:41:26 PM, Ron Katriel (rkatr...@mdsol.com <mailto:rkatr...@mdsol.com>) wrote:

Thanks, Christian. Strange, prior to contacting you and on a hunch, I tried adding the missing “using” keyword but still got the syntax error. Anyway, everything is good now!

Best,
Ron

On August 1, 2018 at 3:57:51 AM, Christian Grün (christian.gr...@gmail.com <mailto:christian.gr...@gmail.com>) wrote:

I have fixed the example in the doc.
Best, Christian


On Wed, Aug 1, 2018 at 5:08 AM Ron Katriel <rkatr...@mdsol.com <mailto:rkatr...@mdsol.com>> wrote:
>
> Hi,
>
> The following from your website (docs.basex.org/wiki/Full-Text <http://docs.basex.org/wiki/Full-Text>) appears to be syntactically incorrect
>
> "'Äpfel' will not be found..." contains text "Apfel" diacritics sensitive
>
> In the BaseX GUI the keyword diacritics is underlined in red and the following error is reported:
>
> Unexpected end of query: 'diacritic sens...'.
>
> This happens in version 8.6.4 and also the latest (9.0.2).
>
> Thanks,
> Ron
>
>
> Ron Katriel, Ph.D. | Principal Data Scientist | Medidata Solutions
>
> 350 Hudson Street, 7th Floor, New York, NY 10014
>
> rkatr...@mdsol.com <mailto:rkatr...@mdsol.com> | direct: +1 201 337 3622 | mobile: +1 201 675 5598 | main: +1 212 918 1800
>
>
