Hi,

I performed join operations between many files and a dictionary. The files contain tokenized texts, consisting of word forms plus fine-grained POS tags. Here is an example file:
https://raw.githubusercontent.com/gcelano/POStaggedAncientGreekXML/master/texts/tlg0001.tlg001.perseus-grc2.xml

The dictionary, which contains word forms plus fine-grained POS tags plus lemmas, can be found here:

https://github.com/gcelano/LemmatizedAncientGreekXML/tree/master/uniqueTokens/values

I created a database for the dictionary and wrote a query (here simplified) like the following:

    for $t in $s/t   (: the tokens in the file containing the tokens :)
    let $match := $lemm//d[./p = $t/@o and ./f = $t/text()]
                     (: $lemm//d are the single entries in the dictionary :)
    return $match

This query is slow, as if the processor could not use the database indexes for ./p and ./f. The situation does not improve with ./p/text() and ./f/text(), which I would assume to be equivalent to the former because of atomization.

By contrast, if the information contained in ./p and ./f is merged and stored in a single attribute (see @v in the dictionary files), and this attribute is compared against the corresponding values in the text (after concatenating them properly), the join is very fast, i.e., BaseX uses the index on the attribute values.

Does anyone know why? I have been able to get my results via the fast comparison described above, but I would like to understand the cause of the problem, if possible.

Thanks.

Best,
Giuseppe

Universität Leipzig
Institute of Computer Science, Digital Humanities
Augustusplatz 10
04109 Leipzig
Deutschland
E-mail: cel...@informatik.uni-leipzig.de
E-mail: giuseppegacel...@gmail.com
Web site 1: http://www.dh.uni-leipzig.de/wo/team/
Web site 2: https://sites.google.com/site/giuseppegacelano/
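P.S. For completeness, this is a minimal sketch of the fast, attribute-based variant of the join described above. The separator "|" and the exact way @o and the token text are concatenated are assumptions on my part; they have to match however @v was built in the dictionary files:

```xquery
(: Fast variant: build a single concatenated key per token and compare it
   against the @v attribute, which lets BaseX apply its attribute index.
   The separator '|' is an assumed convention, not necessarily the one
   actually used in the dictionary files. :)
for $t in $s/t
let $key := concat($t/@o, '|', $t/text())
return $lemm//d[@v = $key]
```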