Ideally you'd pass the same searchable expression to the lexicon function and
it would figure out how to resolve it. And that might be the key to a
workaround.
As I understand it, the unfiltered part of cts:search combines terms from the
searchable expression with terms from the supplied query. So you could try to
do that yourself: for example, //doc is roughly equivalent to
cts:element-query(xs:QName('doc'), cts:and-query(())). Then call
cts:element-values, passing a cts:and-query of that new query and your user
query as the query argument.
I'm not sure if that will be 100% effective in every situation, but it's worth
a try.
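
Here is a minimal sketch of that idea, untested against your data. The element
name 'category', the directory URIs, and the word query are placeholders I made
up for illustration, and it assumes an element range index already exists on
the faceted element:

```xquery
xquery version "1.0-ml";

(: Placeholder user query; substitute whatever the Search API produces. :)
let $user-query := cts:word-query('dog cat rat')

(: Rough stand-in for the //doc step of the searchable expression:
   match only fragments that contain a <doc> element. :)
let $doc-scope := cts:element-query(xs:QName('doc'), cts:and-query(()))

(: Rough stand-in for the xdmp:directory(...) step.
   These directory URIs are hypothetical. :)
let $dir-scope := cts:directory-query(('/dir1/', '/dir2/'), 'infinity')

let $query := cts:and-query(($dir-scope, $doc-scope, $user-query))

(: 'category' is a hypothetical range-indexed element used as the facet. :)
for $v in cts:element-values(
  xs:QName('category'), (),
  ('frequency-order', 'fragment-frequency'),
  $query)
return <facet value="{$v}" count="{cts:frequency($v)}"/>
```

The 'fragment-frequency' option is the default; it is spelled out here only to
make it obvious that the counts are per fragment. Comparing these counts with
xdmp:estimate(cts:search(//doc, $query)) for a few values should show whether
the ancestor fragments are still being counted.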
-- Mike
On 20 Nov 2013, at 13:22 , Will Thompson <[email protected]> wrote:
> Thanks for this example, Mike. xdmp:plan is much easier to understand in ML7.
>
> Now that result counts are correct, it’s more obvious that the Search API
> facet counts are often off by a few, always overcounting compared to the
> total returned after the search is executed with the related constraint.
>
> The problem seems to be that while cts:search is able to estimate result
> counts within only the fragments defined in the searchable expression,
> cts:element-values()/cts:frequency() does not. Therefore any ancestor
> document <chapter> of our fragment root <doc> will be added in with the facet
> estimate, while they are excluded from the search estimate.
>
> Is there a workaround, or is this just a pathological condition of using
> fragment roots?
>
>
> -Will
>
>
>
> On Nov 19, 2013, at 5:15 PM, Michael Blakeley <[email protected]> wrote:
>
>> That makes sense. For SEO purposes here's an example of how xdmp:plan might
>> help debug that sort of thing. The extra output in ML7 makes it clear that
>> with fast-phrase and without word-positions, only two-word terms are checked.
>>
>> It is also possible to figure this out from the ML6 plans, but I think the
>> new annotations make it easier to understand.
>>
>> -- Mike
>>
>> xdmp:plan(
>> cts:search(doc(), cts:word-query('dog cat rat')))
>>
>> (: fast-phrase, no word-positions :)
>> <qry:query-plan xmlns:qry="http://marklogic.com/cts/query">
>> <qry:info-trace>xdmp:eval("xdmp:plan(&#13;&#10; cts:search(doc(),
>> cts:word-query('dog cat ...", (), <options
>> xmlns="xdmp:eval"><database>14758162542116138691</database><modules>17366211626271...</options>)</qry:info-trace>
>> <qry:info-trace>Analyzing path for search: fn:doc()</qry:info-trace>
>> <qry:info-trace>Step 1 is searchable: fn:doc()</qry:info-trace>
>> <qry:info-trace>Path is fully searchable.</qry:info-trace>
>> <qry:info-trace>Gathering constraints.</qry:info-trace>
>> <qry:word-trace text="dog cat">
>> <qry:key>2096356216808567173</qry:key>
>> </qry:word-trace>
>> <qry:word-trace text="cat rat">
>> <qry:key>12758927055138826609</qry:key>
>> </qry:word-trace>
>> <qry:info-trace>Search query contributed 2 constraints: cts:word-query("dog
>> cat rat", ("lang=en"), 1)</qry:info-trace>
>> <qry:partial-plan>
>> <qry:term-query weight="1">
>> <qry:key>2096356216808567173</qry:key>
>> <qry:annotation>pair(word("dog"),word("cat"))</qry:annotation>
>> </qry:term-query>
>> </qry:partial-plan>
>> <qry:partial-plan>
>> <qry:term-query weight="1">
>> <qry:key>12758927055138826609</qry:key>
>> <qry:annotation>pair(word("cat"),word("rat"))</qry:annotation>
>> </qry:term-query>
>> </qry:partial-plan>
>> <qry:info-trace>Executing search.</qry:info-trace>
>> <qry:final-plan>
>> <qry:and-query>
>> <qry:term-query weight="1">
>> <qry:key>2096356216808567173</qry:key>
>> <qry:annotation>pair(word("dog"),word("cat"))</qry:annotation>
>> </qry:term-query>
>> <qry:term-query weight="1">
>> <qry:key>12758927055138826609</qry:key>
>> <qry:annotation>pair(word("cat"),word("rat"))</qry:annotation>
>> </qry:term-query>
>> </qry:and-query>
>> </qry:final-plan>
>> <qry:info-trace>Selected 0 fragments to filter</qry:info-trace>
>> <qry:result estimate="0"/>
>> </qry:query-plan>
>>
>> (: word-positions :)
>> <qry:query-plan xmlns:qry="http://marklogic.com/cts/query">
>> <qry:info-trace>xdmp:eval("xdmp:plan(&#13;&#10; cts:search(doc(),
>> cts:word-query('dog cat ...", (), <options
>> xmlns="xdmp:eval"><database>18400529833056734238</database><root>/Users/mblakele/S...</options>)</qry:info-trace>
>> <qry:info-trace>Analyzing path for search: fn:doc()</qry:info-trace>
>> <qry:info-trace>Step 1 is searchable: fn:doc()</qry:info-trace>
>> <qry:info-trace>Path is fully searchable.</qry:info-trace>
>> <qry:info-trace>Gathering constraints.</qry:info-trace>
>> <qry:word-trace text="dog">
>> <qry:key>5166487143365525844</qry:key>
>> </qry:word-trace>
>> <qry:word-trace text="cat">
>> <qry:key>12545744176132597186</qry:key>
>> </qry:word-trace>
>> <qry:word-trace text="rat">
>> <qry:key>12285550591485045727</qry:key>
>> </qry:word-trace>
>> <qry:info-trace>Search query contributed 1 constraint: cts:word-query("dog
>> cat rat", ("lang=en"), 1)</qry:info-trace>
>> <qry:partial-plan>
>> <qry:word-query weight="1" min-occurs="1" max-occurs="4294967295">
>> <qry:KP pos="0">
>> <qry:key>5166487143365525844</qry:key>
>> <qry:annotation>word("dog")</qry:annotation>
>> </qry:KP>
>> <qry:KP pos="1">
>> <qry:key>12545744176132597186</qry:key>
>> <qry:annotation>word("cat")</qry:annotation>
>> </qry:KP>
>> <qry:KP pos="2">
>> <qry:key>12285550591485045727</qry:key>
>> <qry:annotation>word("rat")</qry:annotation>
>> </qry:KP>
>> </qry:word-query>
>> </qry:partial-plan>
>> <qry:info-trace>Executing search.</qry:info-trace>
>> <qry:final-plan>
>> <qry:and-query>
>> <qry:word-query weight="1" min-occurs="1" max-occurs="4294967295">
>> <qry:KP pos="0">
>> <qry:key>5166487143365525844</qry:key>
>> <qry:annotation>word("dog")</qry:annotation>
>> </qry:KP>
>> <qry:KP pos="1">
>> <qry:key>12545744176132597186</qry:key>
>> <qry:annotation>word("cat")</qry:annotation>
>> </qry:KP>
>> <qry:KP pos="2">
>> <qry:key>12285550591485045727</qry:key>
>> <qry:annotation>word("rat")</qry:annotation>
>> </qry:KP>
>> </qry:word-query>
>> </qry:and-query>
>> </qry:final-plan>
>> <qry:info-trace>Selected 0 fragments to filter</qry:info-trace>
>> <qry:result estimate="0"/>
>> </qry:query-plan>
>>
>> On 19 Nov 2013, at 15:05 , Will Thompson <[email protected]> wrote:
>>
>>> I narrowed down the problem to 3+ word phrases. With that hunch, I enabled
>>> word positions, and after reindexing the estimates are now correct.
>>>
>>> I was thinking, incorrectly, that estimates would still be accurate with
>>> only fast phrase searches (and not word positions) enabled. But now that I
>>> look back at how that works, it’s clear that would only be true of 2-word
>>> phrases.
>>>
>>> -Will
>>>
>>>
>>> On Nov 19, 2013, at 3:23 PM, Michael Blakeley <[email protected]> wrote:
>>>
>>>> Which release is this? Is the problem limited to a particular word? If so,
>>>> what words?
>>>>
>>>> Have you tried a query trace or xdmp:plan yet? If you can run that with
>>>> ML7 that is even more useful.
>>>>
>>>> -- Mike
>>>>
>>>> On 19 Nov 2013, at 12:43 , Will Thompson <[email protected]>
>>>> wrote:
>>>>
>>>>> I’m trying to determine why some search result estimates are overcounted.
>>>>> Documents generally look like:
>>>>>
>>>>> <chapter>
>>>>> <subchapter>
>>>>> <doc>
>>>>> <section>
>>>>>
>>>>> Fragment root is set on <doc> (and no ancestors or descendants of <doc>).
>>>>> count(//doc) = xdmp:estimate(//doc) => true. The searchable expression is
>>>>> xdmp:directory(('dir1', 'dir2', …), 'infinity')//doc. The word query
>>>>> specification explicitly includes <doc> and excludes document root.
>>>>>
>>>>> The documentation suggests to prevent overcounting we just ensure that 1)
>>>>> searchable expressions always select a fragment, and 2) there are no
>>>>> predicates applied to the searchable expression. Are there any other
>>>>> conditions that may cause overcounting of a simple word query?
>>>>>
>>>>> -Will
>>>>> _______________________________________________
>>>>> General mailing list
>>>>> [email protected]
>>>>> http://developer.marklogic.com/mailman/listinfo/general