Thank you very much, Mary, for the explanations. I will keep your suggestion in
mind.

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Mary Holstege
Sent: Wednesday, August 26, 2015 9:15 AM
To: MarkLogic Developer Discussion
Cc: Yang, Yun
Subject: Re: [MarkLogic Dev General] Word Query - Excluded element Question

On Wed, 26 Aug 2015 00:49:17 -0700, David Ennis <[email protected]>
wrote:

> I was hoping someone would have a better answer before I replied, but 
> here is my response.  Hopefully others will clarify  / build on it.
>
>
> I do not think this will make a difference.  The reason is that, as I 
> understand it, excluded and included elements work by traversing the 
> tree while creating the word indexes and including or excluding parts 
> of the tree in that indexing step.  Even if this statement is 
> inaccurate in some way, it is still related to the analysis of the 
> trees (even the tree structure in ML is a type of internal index like 
> a term list, but with element A pointing to parent B rather than a 
> term pointing to a fragment).
>
> So, with the includes and excludes all related to the word queries and 
> the way the tree was indexed, I don't see how any range indexes will 
> help this.
>
> Perhaps someone will debunk this understanding and/or suggest some 
> magic combination of other approaches. Perhaps there is another way of 
> creating a field with an xpath expression of the ones to include and 
> using a tuned field-word-query on that, or something similar.
>

You are correct: the excludes are processed at index time while we are  
walking the tree. At query time, we are just looking up word keys, and if  
the element was excluded there will be no word keys for words in that  
element in the index, so there will be no match. Adding a range index on  
that attribute will only create more work at indexing time and will do  
nothing at query time.  There is no intrinsic issue with there being a lot  
of documents with the exclusions -- the overhead of applying them at  
indexing is small. In fact, if large chunks of documents are being  
excluded, this could be a net performance enhancement (as well as saving  
space in the index), and at query time there is zero overhead -- again, a  
net savings because there are fewer matches to consider.
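To make the "index time vs. query time" point concrete, here is a minimal toy model in Python. It is not MarkLogic's implementation or API: `build_word_index`, `word_query`, and the `is_excluded` predicate are hypothetical names, and real word indexes also track positions, fragments, stemming, and more. It only shows that once an excluded subtree is skipped during the tree walk, a query-time lookup has nothing to find, and that a range index would not change that.

```python
# Toy model (hypothetical, not MarkLogic's implementation): exclusions are
# applied while walking the tree at index time; querying is a plain lookup.
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_word_index(docs, is_excluded):
    """Map each lowercased word to the set of doc ids containing it,
    skipping the text of any element for which is_excluded(elem) is true."""
    index = defaultdict(set)

    def walk(elem, doc_id):
        if is_excluded(elem):
            return  # excluded subtree: no word keys are ever created for it
        for word in (elem.text or "").split():
            index[word.lower()].add(doc_id)
        for child in elem:
            walk(child, doc_id)
            # Tail text belongs to the parent, so it is indexed even
            # when the child element itself was excluded.
            for word in (child.tail or "").split():
                index[word.lower()].add(doc_id)

    for doc_id, xml in docs.items():
        walk(ET.fromstring(xml), doc_id)
    return index


def word_query(index, word):
    # Query time: just look up the word key; the tree is never consulted.
    return sorted(index.get(word.lower(), set()))


# Usage: exclude <note> elements whose status attribute is "draft"
# (an attribute/value condition, as in the original question).
docs = {
    "d1": "<doc><title>apple</title><note status='draft'>banana</note></doc>",
    "d2": "<doc><note status='final'>banana</note></doc>",
}
is_excluded = lambda e: e.tag == "note" and e.get("status") == "draft"
index = build_word_index(docs, is_excluded)
print(word_query(index, "banana"))  # ['d2']
```

Because the draft note in `d1` never produced a key for "banana", `d1` cannot match at query time, with no extra query-time work.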

The danger with excludes on the word query field is that it disables  
certain optimizations that rely on word positions so you get false  
positives on element queries of various kinds. If you aren't using  
positions anyway, it won't matter. If you want to make sure we can still do  
those optimizations (in the most recent releases of 7 and 8), you also need to  
define the excluded elements as phrase-arounds. However, you can't  
put an attribute/value condition on a phrase-around, so you can't do that  
for your case.
One possibility is to use a named field with the exclusions and do  
everything as field-word queries instead of word queries and remove the  
exclusions from the word field. That will cost you time and space at  
indexing time, however.
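The trade-off in the named-field approach can be sketched with the same kind of toy model (again hypothetical, not MarkLogic's API): the word field is built without exclusions, a separate named field is built with them, and field-word queries consult the named field. Every document is walked twice and the postings are stored twice, which is the extra indexing time and space mentioned above. `index_words`, `word_field`, and `named_field` are illustrative names only.

```python
# Toy model (hypothetical): one unfiltered word field plus one named field
# that applies the exclusions, both built at index time.
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_words(elem, doc_id, index, is_excluded=lambda e: False):
    """Add word -> doc_id postings for elem's subtree, skipping excluded elements."""
    if is_excluded(elem):
        return
    for word in (elem.text or "").split():
        index[word.lower()].add(doc_id)
    for child in elem:
        index_words(child, doc_id, index, is_excluded)
        for word in (child.tail or "").split():  # tail text belongs to elem
            index[word.lower()].add(doc_id)


def postings(index):
    # Rough proxy for index size: total number of (word, doc) entries.
    return sum(len(ids) for ids in index.values())


docs = {
    "d1": "<doc><title>apple</title><note status='draft'>banana</note></doc>",
    "d2": "<doc><note status='final'>banana</note></doc>",
}
is_excluded = lambda e: e.tag == "note" and e.get("status") == "draft"

word_field = defaultdict(set)   # no exclusions: used by plain word queries
named_field = defaultdict(set)  # with exclusions: used by field-word queries
for doc_id, xml in docs.items():
    root = ET.fromstring(xml)
    index_words(root, doc_id, word_field)
    index_words(root, doc_id, named_field, is_excluded)

print(sorted(word_field["banana"]), sorted(named_field["banana"]))
print(postings(word_field) + postings(named_field))
```

Plain word queries still see every word (so position-based optimizations are not disabled by exclusions on the word field), while the named field gives the filtered view, at the cost of a second set of postings.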

//Mary

_______________________________________________
General mailing list
[email protected]
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general