Hi,

This may be a question about XQuery Full Text, or only about common usage (or misusage?) of XPath; in either case I hope it's on topic. Please tell me if not.
In BaseX [A]:

  let $test :=
    <test>
      <p>The apple <em>never</em> falls far from the tree.</p>
      <p><!-- comment -->Apples and trees.</p>
      <p>Trees and <!-- comment --> apples.</p>
      <p><fruit>Apple</fruit> trees.</p>
    </test>
  return $test/*[text() contains text ('apple' ftand 'tree')
                   using stemming using language 'en']

This returns

  <p> <!-- comment --> Apples and trees.</p>

As an experienced XPath user, this is what I expect, assuming "contains text" allows a sequence of nodes as its first argument (and returns true if any of them satisfies the test). Only the second 'p' element has a child text node whose value contains both "apple" and "tree".

Of course the problem in the others is the mixed content: in the first, an element node 'em' intervenes, while in the third, a comment intervenes, so both these cases contain text nodes with either "apple" or "tree", but not both. In the case of the fourth 'p', there is no text node child containing "apple" at all, only a grandchild.

Assuming I want all four back, I can write either:

[B]
  return $test/*[string() contains text ('apple' ftand 'tree')
                   using stemming using language 'en']

or

[C]
  return $test/*[. contains text ('apple' ftand 'tree')
                   using stemming using language 'en']

In the case of [B], the string() function returns the element's string value, flattening its structure. [C] passes the element itself to the "contains text" operation, which happily has the same effect.

I have several related questions about this:

1. Unless I learn better, I'm going to prefer [B] or [C], because in my world, mixed content is common; is there any reason (performance or otherwise) to prefer [A] in cases where I know it will be robust? Is there any reason to prefer [B] or prefer [C]?

2. I see examples like [A] offered frequently in the XQuery literature, of "text()" being used apparently to refer to an element's string (text) value, not to its text node children. And I see this usage in running code.
I can only imagine that those who write it are simply not aware that mixed content will complicate their queries like this; maybe they have just never thought about it, or they don't know what text() actually does. In any case, the error is pernicious, since nothing tells you the query you gave isn't the one you intended -- it even works, until the day it doesn't, and the cases where it gives correct but unwanted results may be rare.

But maybe I'm wrong and they just know something about XQuery, XQuery FT, or their tools, that I don't. What do the experts say?

Cheers, Wendell

--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^

_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
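
P.S. For anyone who wants to see why [A] behaves as it does, here is a minimal diagnostic sketch. It reuses the same $test as above and simply lists, for each 'p', the child text nodes that text() selects alongside the element's string value. The wrapper element 't' and the attribute names 'text-nodes' and 'string-value' are my own invention for illustration, not anything defined by BaseX or XQuery FT.

```xquery
(: Diagnostic sketch: show what text() selects for each p,
   next to the string value that string() or "." would supply. :)
let $test :=
  <test>
    <p>The apple <em>never</em> falls far from the tree.</p>
    <p><!-- comment -->Apples and trees.</p>
    <p>Trees and <!-- comment --> apples.</p>
    <p><fruit>Apple</fruit> trees.</p>
  </test>
for $p in $test/p
return
  (: 'text-nodes', 'string-value' and <t> are illustrative names only :)
  <p text-nodes="{ count($p/text()) }" string-value="{ string($p) }">{
    for $t in $p/text()
    return <t>{ string($t) }</t>
  }</p>
```

This should report two child text nodes for the first and third 'p' and one each for the second and fourth, and only in the second does a single text node hold both words, which is consistent with the result of [A].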