Hi Will,

Did you look at facet-options? You have fragment-frequency and
item-frequency. It sounds like it uses item-frequency (by default?) where
you would prefer fragment-frequency.

http://docs.marklogic.com/search:search#facet-option
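
For reference, a minimal, untested sketch of where that option goes in the
Search API options. The constraint name and the "category" element are made
up for illustration (I don't know your actual facet element), and it assumes
a string element range index already exists on that element. Whether it
changes your counts is something you would have to verify:

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical facet on a <category> element; replace the name, type and
   collation with whatever your own range index uses. :)
search:search("dog",
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="category">
      <range type="xs:string" collation="http://marklogic.com/collation/">
        <element ns="" name="category"/>
        <!-- count fragments containing the value, not occurrences -->
        <facet-option>fragment-frequency</facet-option>
      </range>
    </constraint>
  </options>)
```

Swapping in item-frequency and comparing the two results should show which
of the two your current facet counts correspond to.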

Kind regards,
Geert

> -----Original Message-----
> From: [email protected] [mailto:general-
> [email protected]] On Behalf Of Will Thompson
> Sent: Wednesday, 20 November 2013 22:22
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] xdmp:estimate overcounting
>
> Thanks for this example, Mike. xdmp:plan is much easier to understand in
> ML7.
>
> Now that result counts are correct, it's more obvious that the Search API
> facet counts are often off by a few, always overcounting compared to the
> total returned after the search is executed with the related constraint.
>
> The problem seems to be that while cts:search is able to estimate result
> counts within only the fragments defined in the searchable expression,
> cts:element-values()/cts:frequency() cannot. Therefore any ancestor
> document <chapter> of our fragment root <doc> will be added in with the
> facet estimate, while it is excluded from the search estimate.
>
> Is there a workaround, or is this just a pathological condition of using
> fragment roots?
>
>
> -Will
>
>
>
> On Nov 19, 2013, at 5:15 PM, Michael Blakeley <[email protected]> wrote:
>
> > That makes sense. For SEO purposes here's an example of how xdmp:plan
> > might help debug that sort of thing. The extra output in ML7 makes it
> > clear that with fast-phrase and without word-positions, only two-word
> > terms are checked.
> >
> > It is also possible to figure this out from the ML6 plans, but I think
> > the new annotations make it easier to understand.
> >
> > -- Mike
> >
> > xdmp:plan(
> >  cts:search(doc(), cts:word-query('dog cat rat')))
> >
> > (: fast-phrase, no word-positions :)
> > <qry:query-plan xmlns:qry="http://marklogic.com/cts/query">
> >  <qry:info-trace>xdmp:eval("xdmp:plan(&amp;#13;&amp;#10; cts:search(doc(), cts:word-query('dog cat ...", (), &lt;options xmlns="xdmp:eval"&gt;&lt;database&gt;14758162542116138691&lt;/database&gt;&lt;modules&gt;17366211626271...&lt;/options&gt;)</qry:info-trace>
> >  <qry:info-trace>Analyzing path for search: fn:doc()</qry:info-trace>
> >  <qry:info-trace>Step 1 is searchable: fn:doc()</qry:info-trace>
> >  <qry:info-trace>Path is fully searchable.</qry:info-trace>
> >  <qry:info-trace>Gathering constraints.</qry:info-trace>
> >  <qry:word-trace text="dog cat">
> >    <qry:key>2096356216808567173</qry:key>
> >  </qry:word-trace>
> >  <qry:word-trace text="cat rat">
> >    <qry:key>12758927055138826609</qry:key>
> >  </qry:word-trace>
> >  <qry:info-trace>Search query contributed 2 constraints: cts:word-query("dog cat rat", ("lang=en"), 1)</qry:info-trace>
> >  <qry:partial-plan>
> >    <qry:term-query weight="1">
> >      <qry:key>2096356216808567173</qry:key>
> >      <qry:annotation>pair(word("dog"),word("cat"))</qry:annotation>
> >    </qry:term-query>
> >  </qry:partial-plan>
> >  <qry:partial-plan>
> >    <qry:term-query weight="1">
> >      <qry:key>12758927055138826609</qry:key>
> >      <qry:annotation>pair(word("cat"),word("rat"))</qry:annotation>
> >    </qry:term-query>
> >  </qry:partial-plan>
> >  <qry:info-trace>Executing search.</qry:info-trace>
> >  <qry:final-plan>
> >    <qry:and-query>
> >      <qry:term-query weight="1">
> >        <qry:key>2096356216808567173</qry:key>
> >        <qry:annotation>pair(word("dog"),word("cat"))</qry:annotation>
> >      </qry:term-query>
> >      <qry:term-query weight="1">
> >        <qry:key>12758927055138826609</qry:key>
> >        <qry:annotation>pair(word("cat"),word("rat"))</qry:annotation>
> >      </qry:term-query>
> >    </qry:and-query>
> >  </qry:final-plan>
> >  <qry:info-trace>Selected 0 fragments to filter</qry:info-trace>
> >  <qry:result estimate="0"/>
> > </qry:query-plan>
> >
> > (: word-positions :)
> > <qry:query-plan xmlns:qry="http://marklogic.com/cts/query">
> >  <qry:info-trace>xdmp:eval("xdmp:plan(&amp;#13;&amp;#10; cts:search(doc(), cts:word-query('dog cat ...", (), &lt;options xmlns="xdmp:eval"&gt;&lt;database&gt;18400529833056734238&lt;/database&gt;&lt;root&gt;/Users/mblakele/S...&lt;/options&gt;)</qry:info-trace>
> >  <qry:info-trace>Analyzing path for search: fn:doc()</qry:info-trace>
> >  <qry:info-trace>Step 1 is searchable: fn:doc()</qry:info-trace>
> >  <qry:info-trace>Path is fully searchable.</qry:info-trace>
> >  <qry:info-trace>Gathering constraints.</qry:info-trace>
> >  <qry:word-trace text="dog">
> >    <qry:key>5166487143365525844</qry:key>
> >  </qry:word-trace>
> >  <qry:word-trace text="cat">
> >    <qry:key>12545744176132597186</qry:key>
> >  </qry:word-trace>
> >  <qry:word-trace text="rat">
> >    <qry:key>12285550591485045727</qry:key>
> >  </qry:word-trace>
> >  <qry:info-trace>Search query contributed 1 constraint: cts:word-query("dog cat rat", ("lang=en"), 1)</qry:info-trace>
> >  <qry:partial-plan>
> >    <qry:word-query weight="1" min-occurs="1" max-occurs="4294967295">
> >      <qry:KP pos="0">
> >        <qry:key>5166487143365525844</qry:key>
> >        <qry:annotation>word("dog")</qry:annotation>
> >      </qry:KP>
> >      <qry:KP pos="1">
> >        <qry:key>12545744176132597186</qry:key>
> >        <qry:annotation>word("cat")</qry:annotation>
> >      </qry:KP>
> >      <qry:KP pos="2">
> >        <qry:key>12285550591485045727</qry:key>
> >        <qry:annotation>word("rat")</qry:annotation>
> >      </qry:KP>
> >    </qry:word-query>
> >  </qry:partial-plan>
> >  <qry:info-trace>Executing search.</qry:info-trace>
> >  <qry:final-plan>
> >    <qry:and-query>
> >      <qry:word-query weight="1" min-occurs="1" max-occurs="4294967295">
> >        <qry:KP pos="0">
> >          <qry:key>5166487143365525844</qry:key>
> >          <qry:annotation>word("dog")</qry:annotation>
> >        </qry:KP>
> >        <qry:KP pos="1">
> >          <qry:key>12545744176132597186</qry:key>
> >          <qry:annotation>word("cat")</qry:annotation>
> >        </qry:KP>
> >        <qry:KP pos="2">
> >          <qry:key>12285550591485045727</qry:key>
> >          <qry:annotation>word("rat")</qry:annotation>
> >        </qry:KP>
> >      </qry:word-query>
> >    </qry:and-query>
> >  </qry:final-plan>
> >  <qry:info-trace>Selected 0 fragments to filter</qry:info-trace>
> >  <qry:result estimate="0"/>
> > </qry:query-plan>
> >
> > On 19 Nov 2013, at 15:05, Will Thompson <[email protected]> wrote:
> >
> >> I narrowed down the problem to 3+ word phrases. With that hunch, I
> >> enabled word positions, and after reindexing the estimates are now
> >> correct.
> >>
> >> I was thinking, incorrectly, that estimates would still be accurate
> >> with only fast phrase searches (and not word positions) enabled. But
> >> now that I look back at how that works, it's clear that would only be
> >> true of 2-word phrases.
> >>
> >> -Will
> >>
> >>
> >> On Nov 19, 2013, at 3:23 PM, Michael Blakeley <[email protected]> wrote:
> >>
> >>> Which release is this? Is the problem limited to a particular word?
> >>> If so, what words?
> >>>
> >>> Have you tried a query trace or xdmp:plan yet? If you can run that
> >>> with ML7 that is even more useful.
> >>>
> >>> -- Mike
> >>>
> >>> On 19 Nov 2013, at 12:43, Will Thompson <[email protected]> wrote:
> >>>
> >>>> I'm trying to determine why some search result estimates are
> >>>> overcounted. Documents generally look like:
> >>>>
> >>>> <chapter>
> >>>> <subchapter>
> >>>>     <doc>
> >>>>         <section>
> >>>>
> >>>> Fragment root is set on <doc> (and no ancestors or descendants of
> >>>> <doc>). count(//doc) = xdmp:estimate(//doc) => true. The searchable
> >>>> expression is xdmp:directory(('dir1', 'dir2', .), 'infinity')//doc.
> >>>> The word query specification explicitly includes <doc> and excludes
> >>>> the document root.
> >>>>
> >>>> The documentation suggests that, to prevent overcounting, we just
> >>>> need to ensure that 1) searchable expressions always select a
> >>>> fragment, and 2) there are no predicates applied to the searchable
> >>>> expression. Are there any other conditions that may cause
> >>>> overcounting of a simple word query?
> >>>>
> >>>> -Will
> >>>> _______________________________________________
> >>>> General mailing list
> >>>> [email protected]
> >>>> http://developer.marklogic.com/mailman/listinfo/general