Hi Will,

Did you look at the facet options? There are fragment-frequency and item-frequency. It sounds like it uses item-frequency (by default?), whereas you would prefer fragment-frequency.
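As a minimal sketch of what that could look like in the Search API options: the constraint name and the `category` element below are hypothetical, so substitute whatever range index actually backs your facet. The only part taken from the suggestion above is the facet-option value.

```xml
<options xmlns="http://marklogic.com/appservices/search">
  <!-- Hypothetical constraint: "category" is an assumed element range index -->
  <constraint name="category">
    <range type="xs:string" facet="true">
      <element ns="" name="category"/>
      <!-- Ask the lexicon call to count fragments that contain each value,
           rather than individual occurrences of the value -->
      <facet-option>fragment-frequency</facet-option>
    </range>
  </constraint>
</options>
```

Whether this removes the overcount depends on how the facet values line up with your fragment roots, so it is worth comparing the facet counts against xdmp:estimate after the change.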
http://docs.marklogic.com/search:search#facet-option

Kind regards,
Geert

> -----Original message-----
> From: [email protected] [mailto:general-[email protected]] On behalf of Will Thompson
> Sent: Wednesday, 20 November 2013 22:22
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] xdmp:estimate overcounting
>
> Thanks for this example, Mike. xdmp:plan is much easier to understand in ML7.
>
> Now that result counts are correct, it's more obvious that the Search API facet counts are often off by a few, always overcounting compared to the total returned after the search is executed with the related constraint.
>
> The problem seems to be that while cts:search is able to estimate result counts within only the fragments defined in the searchable expression, cts:element-values()/cts:frequency() does not. Therefore any ancestor document <chapter> of our fragment root <doc> will be added in with the facet estimate, while they are excluded from the search estimate.
>
> Is there a workaround, or is this just a pathological condition of using fragment roots?
>
> -Will
>
> On Nov 19, 2013, at 5:15 PM, Michael Blakeley <[email protected]> wrote:
>
> > That makes sense. For SEO purposes here's an example of how xdmp:plan might help debug that sort of thing. The extra output in ML7 makes it clear that with fast-phrase and without word-positions, only two-word terms are checked.
> >
> > It is also possible to figure this out from the ML6 plans, but I think the new annotations make it easier to understand.
> >
> > -- Mike
> >
> > xdmp:plan(
> >   cts:search(doc(), cts:word-query('dog cat rat')))
> >
> > (: fast-phrase, no word-positions :)
> > <qry:query-plan xmlns:qry="http://marklogic.com/cts/query">
> >   <qry:info-trace>xdmp:eval("xdmp:plan(&#13;&#10; cts:search(doc(), cts:word-query('dog cat ...", (), <options xmlns="xdmp:eval"><database>14758162542116138691</database><modules>17366211626271...</options>)</qry:info-trace>
> >   <qry:info-trace>Analyzing path for search: fn:doc()</qry:info-trace>
> >   <qry:info-trace>Step 1 is searchable: fn:doc()</qry:info-trace>
> >   <qry:info-trace>Path is fully searchable.</qry:info-trace>
> >   <qry:info-trace>Gathering constraints.</qry:info-trace>
> >   <qry:word-trace text="dog cat">
> >     <qry:key>2096356216808567173</qry:key>
> >   </qry:word-trace>
> >   <qry:word-trace text="cat rat">
> >     <qry:key>12758927055138826609</qry:key>
> >   </qry:word-trace>
> >   <qry:info-trace>Search query contributed 2 constraints: cts:word-query("dog cat rat", ("lang=en"), 1)</qry:info-trace>
> >   <qry:partial-plan>
> >     <qry:term-query weight="1">
> >       <qry:key>2096356216808567173</qry:key>
> >       <qry:annotation>pair(word("dog"),word("cat"))</qry:annotation>
> >     </qry:term-query>
> >   </qry:partial-plan>
> >   <qry:partial-plan>
> >     <qry:term-query weight="1">
> >       <qry:key>12758927055138826609</qry:key>
> >       <qry:annotation>pair(word("cat"),word("rat"))</qry:annotation>
> >     </qry:term-query>
> >   </qry:partial-plan>
> >   <qry:info-trace>Executing search.</qry:info-trace>
> >   <qry:final-plan>
> >     <qry:and-query>
> >       <qry:term-query weight="1">
> >         <qry:key>2096356216808567173</qry:key>
> >         <qry:annotation>pair(word("dog"),word("cat"))</qry:annotation>
> >       </qry:term-query>
> >       <qry:term-query weight="1">
> >         <qry:key>12758927055138826609</qry:key>
> >         <qry:annotation>pair(word("cat"),word("rat"))</qry:annotation>
> >       </qry:term-query>
> >     </qry:and-query>
> >   </qry:final-plan>
> >   <qry:info-trace>Selected 0 fragments to filter</qry:info-trace>
> >   <qry:result estimate="0"/>
> > </qry:query-plan>
> >
> > (: word-positions :)
> > <qry:query-plan xmlns:qry="http://marklogic.com/cts/query">
> >   <qry:info-trace>xdmp:eval("xdmp:plan(&#13;&#10; cts:search(doc(), cts:word-query('dog cat ...", (), <options xmlns="xdmp:eval"><database>18400529833056734238</database><root>/Users/mblakele/S...</options>)</qry:info-trace>
> >   <qry:info-trace>Analyzing path for search: fn:doc()</qry:info-trace>
> >   <qry:info-trace>Step 1 is searchable: fn:doc()</qry:info-trace>
> >   <qry:info-trace>Path is fully searchable.</qry:info-trace>
> >   <qry:info-trace>Gathering constraints.</qry:info-trace>
> >   <qry:word-trace text="dog">
> >     <qry:key>5166487143365525844</qry:key>
> >   </qry:word-trace>
> >   <qry:word-trace text="cat">
> >     <qry:key>12545744176132597186</qry:key>
> >   </qry:word-trace>
> >   <qry:word-trace text="rat">
> >     <qry:key>12285550591485045727</qry:key>
> >   </qry:word-trace>
> >   <qry:info-trace>Search query contributed 1 constraint: cts:word-query("dog cat rat", ("lang=en"), 1)</qry:info-trace>
> >   <qry:partial-plan>
> >     <qry:word-query weight="1" min-occurs="1" max-occurs="4294967295">
> >       <qry:KP pos="0">
> >         <qry:key>5166487143365525844</qry:key>
> >         <qry:annotation>word("dog")</qry:annotation>
> >       </qry:KP>
> >       <qry:KP pos="1">
> >         <qry:key>12545744176132597186</qry:key>
> >         <qry:annotation>word("cat")</qry:annotation>
> >       </qry:KP>
> >       <qry:KP pos="2">
> >         <qry:key>12285550591485045727</qry:key>
> >         <qry:annotation>word("rat")</qry:annotation>
> >       </qry:KP>
> >     </qry:word-query>
> >   </qry:partial-plan>
> >   <qry:info-trace>Executing search.</qry:info-trace>
> >   <qry:final-plan>
> >     <qry:and-query>
> >       <qry:word-query weight="1" min-occurs="1" max-occurs="4294967295">
> >         <qry:KP pos="0">
> >           <qry:key>5166487143365525844</qry:key>
> >           <qry:annotation>word("dog")</qry:annotation>
> >         </qry:KP>
> >         <qry:KP pos="1">
> >           <qry:key>12545744176132597186</qry:key>
> >           <qry:annotation>word("cat")</qry:annotation>
> >         </qry:KP>
> >         <qry:KP pos="2">
> >           <qry:key>12285550591485045727</qry:key>
> >           <qry:annotation>word("rat")</qry:annotation>
> >         </qry:KP>
> >       </qry:word-query>
> >     </qry:and-query>
> >   </qry:final-plan>
> >   <qry:info-trace>Selected 0 fragments to filter</qry:info-trace>
> >   <qry:result estimate="0"/>
> > </qry:query-plan>
> >
> > On 19 Nov 2013, at 15:05 , Will Thompson <[email protected]> wrote:
> >
> >> I narrowed down the problem to 3+ word phrases. With that hunch, I enabled word positions, and after reindexing the estimates are now correct.
> >>
> >> I was thinking, incorrectly, that estimates would still be accurate with only fast phrase searches (and not word positions) enabled. But now that I look back at how that works, it's clear that would only be true of 2-word phrases.
> >>
> >> -Will
> >>
> >> On Nov 19, 2013, at 3:23 PM, Michael Blakeley <[email protected]> wrote:
> >>
> >>> Which release is this? Is the problem limited to a particular word? If so, what words?
> >>>
> >>> Have you tried a query trace or xdmp:plan yet? If you can run that with ML7 that is even more useful.
> >>>
> >>> -- Mike
> >>>
> >>> On 19 Nov 2013, at 12:43 , Will Thompson <[email protected]> wrote:
> >>>
> >>>> I'm trying to determine why some search result estimates are overcounted. Documents generally look like:
> >>>>
> >>>> <chapter>
> >>>>   <subchapter>
> >>>>     <doc>
> >>>>       <section>
> >>>>
> >>>> Fragment root is set on <doc> (and no ancestors or descendants of <doc>). count(//doc) = xdmp:estimate(//doc) => true. The searchable expression is xdmp:directory(('dir1', 'dir2', .), 'infinity')//doc. The word query specification explicitly includes <doc> and excludes document root.
> >>>>
> >>>> The documentation suggests to prevent overcounting we just ensure that 1) searchable expressions always select a fragment, and 2) there are no predicates applied to the searchable expression.
> >>>> Are there any other conditions that may cause overcounting of a simple word query?
> >>>>
> >>>> -Will
> >>>> _______________________________________________
> >>>> General mailing list
> >>>> [email protected]
> >>>> http://developer.marklogic.com/mailman/listinfo/general
