Thank you very much, Mary, for the explanations. I will keep your suggestion in mind.
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mary Holstege
Sent: Wednesday, August 26, 2015 9:15 AM
To: MarkLogic Developer Discussion
Cc: Yang, Yun
Subject: Re: [MarkLogic Dev General] Word Query - Excluded element Question

On Wed, 26 Aug 2015 00:49:17 -0700, David Ennis <[email protected]> wrote:
> I was hoping someone would have a better answer before I replied, but
> here is my response. Hopefully others will clarify / build on it.
>
> I do not think this will make a difference. As I understand it,
> excluded and included elements work by traversing the tree while the
> word indexes are created, and including or excluding parts of the tree
> during that indexing step. Even if that statement is inaccurate in some
> way, the behavior is still tied to the analysis of the trees (even the
> tree structure in ML is a type of internal index, like a term list, but
> with element A pointing to parent B rather than a term pointing to a
> fragment).
>
> So, with the includes and excludes all tied to word queries and to the
> way the tree was indexed, I don't see how any range index would help
> here.
>
> Perhaps someone will debunk this understanding and/or suggest some
> magic combination of other approaches. Perhaps there is another way:
> creating a field with an XPath expression for the elements to include
> and using a tuned field-word-query on that, or something similar.

You are correct: the excludes are processed at index time, while we are walking the tree. At query time we are just looking up word keys, and if the element was excluded there will be no word keys in the index for the words in that element, so there will be no match. Adding a range index on that attribute will only create more work at indexing time and will do nothing at query time.
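To make the index-time versus query-time split concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not MarkLogic's actual implementation: the exclusion rule format, `EXCLUDES`, `index_document`, and `word_query` are all hypothetical names. Exclusions (including an attribute/value condition) are applied while walking the tree, and the query side is a plain key lookup that never consults the exclusion rules, which is why a range index on the attribute would have nothing to contribute.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical exclusion rules: (element name, attribute name, attribute value).
# An attribute name of None means "exclude this element unconditionally".
EXCLUDES = [("note", "type", "internal")]


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def is_excluded(elem, rules):
    for tag, attr, value in rules:
        if elem.tag == tag and (attr is None or elem.get(attr) == value):
            return True
    return False


def index_document(doc_id, xml_text, rules, index):
    """Index time: walk the tree and emit word keys only for text that is
    not inside an excluded element (or any of its descendants)."""
    root = ET.fromstring(xml_text)

    def add(text):
        for word in tokenize(text):
            index[word].add(doc_id)

    def walk(elem, parent_skipping):
        skipping = parent_skipping or is_excluded(elem, rules)
        if not skipping and elem.text:
            add(elem.text)
        for child in elem:
            walk(child, skipping)
            # A child's tail text belongs to this element, not to the child.
            if not skipping and child.tail:
                add(child.tail)

    walk(root, False)


def word_query(index, word):
    """Query time: a plain key lookup; the exclusion rules are never consulted."""
    return sorted(index.get(word.lower(), set()))


index = defaultdict(set)
index_document("d1", '<doc><title>Budget</title><note type="internal">secret merger</note></doc>', EXCLUDES, index)
index_document("d2", '<doc><title>Merger news</title><note type="public">merger approved</note></doc>', EXCLUDES, index)

print(word_query(index, "merger"))
print(word_query(index, "secret"))
```

In this model "merger" matches only d2, because the words in d1's excluded note never produced word keys, and "secret" matches nothing at all.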
There is no intrinsic issue with having a lot of documents with exclusions. The overhead of applying them at indexing time is small; in fact, if large chunks of documents are being excluded, it could be a net performance gain (as well as saving space in the index). At query time there is zero overhead, and again a net savings, because there are fewer matches to consider.

The danger with excludes on the word query field is that they disable certain optimizations that rely on word positions, so you can get false positives on element queries of various kinds. If you aren't using positions anyway, it won't matter. If you want to make sure we can still do those optimizations (in the most recent releases of 7 and 8), you also need to define the excluded elements as phrase-arounds. However, you can't put an attribute/value condition on a phrase-around, so you can't do that in your case.

One possibility is to use a named field with the exclusions, run everything as field-word queries instead of word queries, and remove the exclusions from the word query field. That will cost you time and space at indexing time, however.

//Mary

_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
