Mike,

Scratch that, I think I got it working. Thanks.
-Will

On Nov 20, 2013, at 3:43 PM, Will Thompson <[email protected]> wrote:

> Geert,
>
> I set <facet-option>fragment-frequency</facet-option>, just in case, but as
> far as I can tell it is the default (6.0-4).
>
> Mike,
>
> I tried both and-ing the element-query and putting the whole query as a child
> of element-query, but the results are the same. It seems like what’s
> happening is that the element constraint just enforces that the result match
> within a <doc>, which I am guessing is still true when matching a descendant
> <doc> of <chapter>.
>
> -Will
>
>
> On Nov 20, 2013, at 3:30 PM, Michael Blakeley <[email protected]> wrote:
>
>> Ideally you'd pass the same searchable expression to the lexicon function
>> and it would figure out how to resolve it. And that might be the key to a
>> workaround.
>>
>> As I understand it the unfiltered part of cts:search combines terms from the
>> searchable expression with terms from the supplied query. So you could try
>> to do that yourself: for example //doc is roughly equivalent to
>> cts:element-query(xs:QName('doc'), cts:and-query(())). Call
>> cts:element-values with cts:and-query of that new query and your user query.
>>
>> I'm not sure if that will be 100% effective in every situation, but it's
>> worth a try.
>>
>> -- Mike
>>
>> On 20 Nov 2013, at 13:22 , Will Thompson <[email protected]> wrote:
>>
>>> Thanks for this example, Mike. xdmp:plan is much easier to understand in
>>> ML7.
>>>
>>> Now that result counts are correct, it’s more obvious that the Search API
>>> facet counts are often off by a few, always overcounting compared to the
>>> total returned after the search is executed with the related constraint.
>>>
>>> The problem seems to be that while cts:search is able to estimate result
>>> counts within only the fragments defined in the searchable expression,
>>> cts:element-values()/cts:frequency() does not.
>>> Therefore any ancestor document <chapter> of our fragment root <doc> will
>>> be added in with the facet estimate, while they are excluded from the
>>> search estimate.
>>>
>>> Is there a workaround, or is this just a pathological condition of using
>>> fragment roots?
>>>
>>>
>>> -Will
>>>
>>>
>>>
>>> On Nov 19, 2013, at 5:15 PM, Michael Blakeley <[email protected]> wrote:
>>>
>>>> That makes sense. For SEO purposes here's an example of how xdmp:plan
>>>> might help debug that sort of thing. The extra output in ML7 makes it
>>>> clear that with fast-phrase and without word-positions, only two-word
>>>> terms are checked.
>>>>
>>>> It is also possible to figure this out from the ML6 plans, but I think the
>>>> new annotations make it easier to understand.
>>>>
>>>> -- Mike
>>>>
>>>> xdmp:plan(
>>>>   cts:search(doc(), cts:word-query('dog cat rat')))
>>>>
>>>> (: fast-phrase, no word-positions :)
>>>> <qry:query-plan xmlns:qry="http://marklogic.com/cts/query">
>>>>   <qry:info-trace>xdmp:eval("xdmp:plan(&#13;&#10; cts:search(doc(),
>>>>   cts:word-query('dog cat ...", (), <options
>>>>   xmlns="xdmp:eval"><database>14758162542116138691</database><modules>17366211626271...</options>)</qry:info-trace>
>>>>   <qry:info-trace>Analyzing path for search: fn:doc()</qry:info-trace>
>>>>   <qry:info-trace>Step 1 is searchable: fn:doc()</qry:info-trace>
>>>>   <qry:info-trace>Path is fully searchable.</qry:info-trace>
>>>>   <qry:info-trace>Gathering constraints.</qry:info-trace>
>>>>   <qry:word-trace text="dog cat">
>>>>     <qry:key>2096356216808567173</qry:key>
>>>>   </qry:word-trace>
>>>>   <qry:word-trace text="cat rat">
>>>>     <qry:key>12758927055138826609</qry:key>
>>>>   </qry:word-trace>
>>>>   <qry:info-trace>Search query contributed 2 constraints:
>>>>   cts:word-query("dog cat rat", ("lang=en"), 1)</qry:info-trace>
>>>>   <qry:partial-plan>
>>>>     <qry:term-query weight="1">
>>>>       <qry:key>2096356216808567173</qry:key>
>>>>       <qry:annotation>pair(word("dog"),word("cat"))</qry:annotation>
>>>>     </qry:term-query>
>>>>   </qry:partial-plan>
>>>>   <qry:partial-plan>
>>>>     <qry:term-query weight="1">
>>>>       <qry:key>12758927055138826609</qry:key>
>>>>       <qry:annotation>pair(word("cat"),word("rat"))</qry:annotation>
>>>>     </qry:term-query>
>>>>   </qry:partial-plan>
>>>>   <qry:info-trace>Executing search.</qry:info-trace>
>>>>   <qry:final-plan>
>>>>     <qry:and-query>
>>>>       <qry:term-query weight="1">
>>>>         <qry:key>2096356216808567173</qry:key>
>>>>         <qry:annotation>pair(word("dog"),word("cat"))</qry:annotation>
>>>>       </qry:term-query>
>>>>       <qry:term-query weight="1">
>>>>         <qry:key>12758927055138826609</qry:key>
>>>>         <qry:annotation>pair(word("cat"),word("rat"))</qry:annotation>
>>>>       </qry:term-query>
>>>>     </qry:and-query>
>>>>   </qry:final-plan>
>>>>   <qry:info-trace>Selected 0 fragments to filter</qry:info-trace>
>>>>   <qry:result estimate="0"/>
>>>> </qry:query-plan>
>>>>
>>>> (: word-positions :)
>>>> <qry:query-plan xmlns:qry="http://marklogic.com/cts/query">
>>>>   <qry:info-trace>xdmp:eval("xdmp:plan(&#13;&#10; cts:search(doc(),
>>>>   cts:word-query('dog cat ...", (), <options
>>>>   xmlns="xdmp:eval"><database>18400529833056734238</database><root>/Users/mblakele/S...</options>)</qry:info-trace>
>>>>   <qry:info-trace>Analyzing path for search: fn:doc()</qry:info-trace>
>>>>   <qry:info-trace>Step 1 is searchable: fn:doc()</qry:info-trace>
>>>>   <qry:info-trace>Path is fully searchable.</qry:info-trace>
>>>>   <qry:info-trace>Gathering constraints.</qry:info-trace>
>>>>   <qry:word-trace text="dog">
>>>>     <qry:key>5166487143365525844</qry:key>
>>>>   </qry:word-trace>
>>>>   <qry:word-trace text="cat">
>>>>     <qry:key>12545744176132597186</qry:key>
>>>>   </qry:word-trace>
>>>>   <qry:word-trace text="rat">
>>>>     <qry:key>12285550591485045727</qry:key>
>>>>   </qry:word-trace>
>>>>   <qry:info-trace>Search query contributed 1 constraint:
>>>>   cts:word-query("dog cat rat", ("lang=en"), 1)</qry:info-trace>
>>>>   <qry:partial-plan>
>>>>     <qry:word-query weight="1" min-occurs="1" max-occurs="4294967295">
>>>>       <qry:KP pos="0">
>>>>         <qry:key>5166487143365525844</qry:key>
>>>>         <qry:annotation>word("dog")</qry:annotation>
>>>>       </qry:KP>
>>>>       <qry:KP pos="1">
>>>>         <qry:key>12545744176132597186</qry:key>
>>>>         <qry:annotation>word("cat")</qry:annotation>
>>>>       </qry:KP>
>>>>       <qry:KP pos="2">
>>>>         <qry:key>12285550591485045727</qry:key>
>>>>         <qry:annotation>word("rat")</qry:annotation>
>>>>       </qry:KP>
>>>>     </qry:word-query>
>>>>   </qry:partial-plan>
>>>>   <qry:info-trace>Executing search.</qry:info-trace>
>>>>   <qry:final-plan>
>>>>     <qry:and-query>
>>>>       <qry:word-query weight="1" min-occurs="1" max-occurs="4294967295">
>>>>         <qry:KP pos="0">
>>>>           <qry:key>5166487143365525844</qry:key>
>>>>           <qry:annotation>word("dog")</qry:annotation>
>>>>         </qry:KP>
>>>>         <qry:KP pos="1">
>>>>           <qry:key>12545744176132597186</qry:key>
>>>>           <qry:annotation>word("cat")</qry:annotation>
>>>>         </qry:KP>
>>>>         <qry:KP pos="2">
>>>>           <qry:key>12285550591485045727</qry:key>
>>>>           <qry:annotation>word("rat")</qry:annotation>
>>>>         </qry:KP>
>>>>       </qry:word-query>
>>>>     </qry:and-query>
>>>>   </qry:final-plan>
>>>>   <qry:info-trace>Selected 0 fragments to filter</qry:info-trace>
>>>>   <qry:result estimate="0"/>
>>>> </qry:query-plan>
>>>>
>>>> On 19 Nov 2013, at 15:05 , Will Thompson <[email protected]>
>>>> wrote:
>>>>
>>>>> I narrowed down the problem to 3+ word phrases. With that hunch, I
>>>>> enabled word positions, and after reindexing the estimates are now
>>>>> correct.
>>>>>
>>>>> I was thinking, incorrectly, that estimates would still be accurate with
>>>>> only fast phrase searches (and not word positions) enabled. But now that
>>>>> I look back at how that works, it’s clear that would only be true of
>>>>> 2-word phrases.
>>>>>
>>>>> -Will
>>>>>
>>>>>
>>>>> On Nov 19, 2013, at 3:23 PM, Michael Blakeley <[email protected]> wrote:
>>>>>
>>>>>> Which release is this? Is the problem limited to a particular word? If
>>>>>> so, what words?
>>>>>>
>>>>>> Have you tried a query trace or xdmp:plan yet?
>>>>>> If you can run that with ML7 that is even more useful.
>>>>>>
>>>>>> -- Mike
>>>>>>
>>>>>> On 19 Nov 2013, at 12:43 , Will Thompson <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I’m trying to determine why some search result estimates are
>>>>>>> overcounted. Documents generally look like:
>>>>>>>
>>>>>>> <chapter>
>>>>>>>   <subchapter>
>>>>>>>     <doc>
>>>>>>>       <section>
>>>>>>>
>>>>>>> Fragment root is set on <doc> (and no ancestors or descendants of
>>>>>>> <doc>). count(//doc) = xdmp:estimate(//doc) => true. The searchable
>>>>>>> expression is xdmp:directory(('dir1', 'dir2', …), 'infinity')//doc. The
>>>>>>> word query specification explicitly includes <doc> and excludes
>>>>>>> document root.
>>>>>>>
>>>>>>> The documentation suggests that to prevent overcounting we just need to
>>>>>>> ensure that 1) searchable expressions always select a fragment, and
>>>>>>> 2) there are no predicates applied to the searchable expression. Are
>>>>>>> there any other conditions that may cause overcounting of a simple word
>>>>>>> query?
>>>>>>>
>>>>>>> -Will
>>>>>>> _______________________________________________
>>>>>>> General mailing list
>>>>>>> [email protected]
>>>>>>> http://developer.marklogic.com/mailman/listinfo/general
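
For anyone who lands on this thread with the same symptom: one way to check whether unfiltered estimates are overcounting is to compare xdmp:estimate (which resolves the search from the indexes only) with a count of the filtered search, and to look at xdmp:plan for the same expression. This is a minimal sketch, assuming hypothetical directory URIs /dir1/ and /dir2/ and the example phrase from Mike's plan output; substitute your own. A filtered count over a large result set can be expensive, so it is best run against a small test set.

```xquery
xquery version "1.0-ml";

(: Hypothetical directory URIs and phrase; substitute your own. :)
let $dirs := ("/dir1/", "/dir2/")
let $query := cts:word-query("dog cat rat")

(: Unfiltered: xdmp:estimate answers from the indexes alone. :)
let $estimate :=
  xdmp:estimate(cts:search(xdmp:directory($dirs, "infinity")//doc, $query))

(: Filtered: each candidate fragment is checked against its content,
   so index false positives are dropped from this count. :)
let $filtered :=
  fn:count(cts:search(xdmp:directory($dirs, "infinity")//doc, $query, "filtered"))

return (
  $estimate,
  $filtered,
  (: The plan shows which index terms were used, for example
     pair(...) terms versus positional word terms. :)
  xdmp:plan(cts:search(xdmp:directory($dirs, "infinity")//doc, $query))
)
```

If the estimate is higher than the filtered count and the plan only shows pair(...) terms for a phrase of three or more words, that matches the fast-phrase-without-word-positions case Mike's plan output illustrates.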
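
Will's fix for the phrase overcount was to enable word positions and reindex. The thread does not say how the setting was changed. One way is through the Admin API. This is a minimal sketch, assuming a database named "Documents" (hypothetical) and a user with admin privileges. Changing an index setting like this should trigger reindexing if the reindexer is enabled for that database.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a hypothetical database name. :)
let $config := admin:get-configuration()
let $db-id := xdmp:database("Documents")
(: Enable word positions so phrases of three or more words can be
   resolved positionally rather than only as overlapping two-word pairs. :)
let $config := admin:database-set-word-positions($config, $db-id, fn:true())
return admin:save-configuration($config)
```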
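
A sketch of Mike's suggested workaround for the facet overcount: express the searchable expression as a query, with //doc becoming cts:element-query(xs:QName('doc'), cts:and-query(())), and AND it with the user query before calling cts:element-values. It assumes a hypothetical facet element named category that has an element range index, hypothetical directory URIs /dir1/ and /dir2/, and uses cts:directory-query as a stand-in for the xdmp:directory part of the searchable expression. That last piece is an addition here, not something from the thread. As Mike said, this may not be 100% effective in every situation.

```xquery
xquery version "1.0-ml";

(: Hypothetical inputs: directory URIs, the user's query, and a facet
   element "category" assumed to have an element range index. :)
let $dirs := ("/dir1/", "/dir2/")
let $user-query := cts:word-query("dog cat rat")

(: Query stand-in for xdmp:directory($dirs, "infinity")//doc.
   The element-query part follows Mike's suggestion; the
   directory-query part is an assumption added for this sketch. :)
let $searchable-query := cts:and-query((
  cts:directory-query($dirs, "infinity"),
  cts:element-query(xs:QName("doc"), cts:and-query(()))
))
let $combined := cts:and-query(($searchable-query, $user-query))

(: Facet values, counted per fragment and constrained by the combined query. :)
let $facets :=
  for $value in cts:element-values(
    xs:QName("category"), (), "fragment-frequency", $combined)
  return <facet value="{$value}" count="{cts:frequency($value)}"/>

(: Search estimate for the same user query, for comparison. :)
let $estimate :=
  xdmp:estimate(cts:search(xdmp:directory($dirs, "infinity")//doc, $user-query))

return ($estimate, $facets)
```

Since Will first reported no change from and-ing the element-query and only later got it working, it is worth checking each facet count against the total of the corresponding constrained search rather than assuming the counts now match.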
