Christian,

Thanks for sharing that. I assumed all along that this happens automatically. Anyway, I ran my query (for one drug, to save time) and see the following in the Info view:

- apply text index for "Lenalidomide"

I believe the slow execution may be due to a combinatorial issue: the cross product of 280,000 clinical trials and ~10,000 drugs in DrugBank (not counting synonyms). I am considering an algorithmic solution that involves storing the DrugBank information in a hash table (map) and looking it up while iterating through the CT.gov <http://clinicaltrials.gov> trials.

Best,
Ron

On August 3, 2018 at 5:49:30 PM, Christian Grün ([email protected]) wrote:

Our documentation should help you here: http://docs.basex.org/wiki/Indexes

Ron Katriel <[email protected]> wrote on Fri, Aug 3, 2018, 23:20:

> Hi Christian,
>
> Yes, I created a full-text index when the databases were loaded (see the commands below). I also verified that FTINDEX is true for both databases (in the GUI under Database > Open & Manage).
>
> How do I ensure that my query is rewritten for index access?
>
> Thanks,
> Ron
>
>
> SET FTINDEX true; SET TOKENINDEX true; CREATE DB CTGov "/Data Sets/ct.gov/xml"
> SET FTINDEX true; SET TOKENINDEX true; SET STRIPNS true; CREATE DB DrugBank "/Data Sets/DrugBank/drugbank.xml"
>
> On August 3, 2018 at 4:12:43 PM, Christian Grün ([email protected]) wrote:
>
> Hi Ron,
>
> Did you a) create a full-text index for your data and b) ensure that your query is rewritten for index access?
>
> Best,
> Christian
>
>
> On Fri, Aug 3, 2018 at 2:39 PM Ron Katriel <[email protected]> wrote:
> >
> > Christian,
> >
> > Adding diacritics sensitive slows execution by a factor of 3. My script (fragment below), which joins two large databases, namely CT.gov and DrugBank, takes 2 hours without the diacritics sensitive constraint but 6 hours with it. Given the combinatorics involved, I am wondering if there is a better way to do this in BaseX.
> >
> > Thanks,
> > Ron
> >
> >
> > for $drug in db:open('DrugBank')/drugbank/drug
> > let $drug_name := $drug/name/text()
> > let $drug_synonyms := functx:value-union(normalize-space(lower-case($drug/name)), local:drug-synonyms($drug_name))
> > for $synonym_name in $drug_synonyms
> > ...
> > for $study in db:open('CTGov')/clinical_study[intervention/intervention_name contains text { $synonym_name } using case insensitive using diacritics sensitive]
> > ...
> >
> >
> > Ron Katriel, Ph.D. | Principal Data Scientist | Medidata Solutions
> > 350 Hudson Street, 7th Floor, New York, NY 10014
> > [email protected] | direct: +1 201 337 3622 | mobile: +1 201 675 5598 | main: +1 212 918 1800
> >
> > On August 1, 2018 at 12:41:26 PM, Ron Katriel ([email protected]) wrote:
> >
> > Thanks, Christian. Strange, prior to contacting you and on a hunch, I tried adding the missing “using” keyword but still got the syntax error. Anyway, everything is good now!
> >
> > Best,
> > Ron
> >
> > On August 1, 2018 at 3:57:51 AM, Christian Grün ([email protected]) wrote:
> >
> > I have fixed the example in the doc.
> > Best, Christian
> >
> >
> > On Wed, Aug 1, 2018 at 5:08 AM Ron Katriel <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > The following from your website (docs.basex.org/wiki/Full-Text) appears to be syntactically incorrect:
> > >
> > > "'Äpfel' will not be found..." contains text "Apfel" diacritics sensitive
> > >
> > > In the BaseX GUI the keyword diacritics is underlined in red and the following error is reported:
> > >
> > > Unexpected end of query: 'diacritic sens...'.
> > >
> > > This happens in version 8.6.4 and also the latest (9.0.2).
> > >
> > > Thanks,
> > > Ron
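
The hash-table idea from the top message could look roughly like the sketch below. It is a minimal illustration, not Ron's actual solution: it builds an XQuery 3.1 map keyed on the normalized drug name in one pass over DrugBank, then makes one pass over the trials and looks each intervention name up in the map. It assumes exact matching on the whole normalized intervention name, which is stricter than the token-based `contains text` in the original query, and it leaves out synonyms (the `local:drug-synonyms` helper is not shown in the thread). The `<match>` output element and the use of `base-uri()` to identify a trial are hypothetical choices for illustration.

```xquery
(: Sketch only: exact lookup on normalized names, not full-text matching. :)

(: One pass over DrugBank: normalized drug name -> original drug name. :)
let $drugs := map:merge(
  (
    for $drug in db:open('DrugBank')/drugbank/drug
    let $key := normalize-space(lower-case($drug/name))
    return map:entry($key, $drug/name/string())
  ),
  (: If two drugs normalize to the same key, keep the first one. :)
  map { 'duplicates': 'use-first' }
)

(: One pass over the trials: look up each intervention name in the map. :)
for $study in db:open('CTGov')/clinical_study
for $name in $study/intervention/intervention_name
let $hit := $drugs(normalize-space(lower-case($name)))
where exists($hit)
return <match trial="{ base-uri($study) }" drug="{ $hit }"/>
```

With this shape the work grows with the number of drugs plus the number of intervention names rather than with their product. Intervention names that contain a drug name alongside other words (a dose or a formulation, for example) would not match here; handling those would need tokenizing each name before the lookup.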

