Accurate estimates mean accurate search result counts, which make pagination
easier.
Are you looking for an example of how element position indexes could help with
queries? Consider this one:

  cts:element-query(
    xs:QName('foo'),
    cts:element-value-query(xs:QName('bar'), 'baz'))

Without element positions, the indexes record which documents have element foo,
which have bar=baz, and which have foo//bar. But there might be matches for
bar=baz that aren't inside foo. If so, filtering would have to throw out some
potential matches, and xdmp:estimate or cts:remainder would be higher than the
actual count. With positions, the indexes have enough information to show
whether the bar=baz match sits inside a foo ancestor, so the estimate should
match the count exactly.
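
As a rough sketch of how you might check this on your own data, you could run
the unfiltered estimate and the filtered count side by side. The names foo, bar
and 'baz' are just the placeholders from the query above, and I'm assuming the
search runs over all documents in the database:

```xquery
xquery version "1.0-ml";

(: Placeholder names: foo, bar and 'baz' come from the example query above. :)
let $query :=
  cts:element-query(
    xs:QName('foo'),
    cts:element-value-query(xs:QName('bar'), 'baz'))
return (
  (: Index-only estimate: no filtering, so it can overcount. :)
  xdmp:estimate(cts:search(fn:doc(), $query)),
  (: Filtered count: every candidate is checked, so it is exact but slower. :)
  fn:count(cts:search(fn:doc(), $query))
)
```

If the two numbers differ without positions and agree once positions are
enabled and the content is reindexed, the positions are doing their job.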
Of course everything is a trade-off. Recording that position information takes
extra CPU and disk space, and resolving position information at query time also
uses resources. So I'd still rather use expressive QNames whenever I can.
-- Mike
On 18 Oct 2011, at 23:32 , Geert Josten wrote:
> Sorry, how do element word positions help with pagination?
>
> Different QNames for different meaning is definitely a GOOD THING, no doubt.
> :)
>
> Kind regards,
> Geert
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Wednesday, October 19, 2011 8:13
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] How to get different facet counts for
> different searchable-expression in Search API
>
> There too you could use different QNames: if some QNames only occur as
> descendants of cite, then there is no ambiguity. From a storage point of
> view, adding QNames is almost free.
>
> Element word positions could also help. If you wrap the user query in an
> element-query on cite, element word positions can be used to figure out which
> documents actually match. Position indexes are somewhat expensive, but in
> most cases I think that would be cheaper than having a dozen small fragments
> per document.
>
> -- Mike
>
> On Oct 18, 2011, at 23:04, Geert Josten <[email protected]> wrote:
>
>> Right, of course. I was not paying attention, not thinking of facet counts,
>> but search result counts (for pagination and such). I reckon that if you
>> want the latter to match returned results as well, you would need
>> fragmentation on 'cite', provided you would really be showing cites
>> individually, and not just documents that happen to contain a matching
>> cite.. ;-)
>>
>> Thnx,
>> Geert
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Michael Blakeley
>> Sent: Wednesday, October 19, 2011 7:58
>> To: General MarkLogic Developer Discussion
>> CC: General MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] How to get different facet counts for
>> different searchable-expression in Search API
>>
>> Well, I haven't looked at the search API code lately, but I presume it is
>> using cts:frequency, since that is the most efficient way to get counts from
>> a range index. The reason to use different QNames is because they can be
>> tied to different range indexes. For example, if I had a /doc/country
>> element and a /doc//cite/country element, with a range index on country, I
>> would always get facets based on the entire document. If I have a document
>> that was published in UK but cited an article from FR, both would show up in
>> the facets. The range index for a QName contains every value for that QName.
>>
>> But if I have a /doc/country element and a /doc//cite/cite-country element,
>> I can build a range index on each and query them separately. So I can see
>> "published in XX" separate from "cites articles published in XX". I can also
>> see both together if I wish, because cts:element-values allows a sequence of
>> QNames. Essentially I am choosing QNames to tell the database what to index.
>>
>> Naturally there would be even more flexibility if we could create range
>> indexes based on simple XPath expressions as well as QNames. But the
>> existing functionality is quite powerful, and enriching existing XML with
>> expressive QNames works well for most applications.
>>
>> -- Mike
>>
>> On Oct 18, 2011, at 22:30, Geert Josten <[email protected]> wrote:
>>
>>> Hi Mike,
>>>
>>> In what way does selecting a different range index influence the counts in
>>> this case? I'd say you are still selecting the same doc fragments, so I'd
>>> expect the counts to not change at all. Am I overlooking something? Or is
>>> the search:search library really using count, and not the fragment-based
>>> xdmp:estimate?
>>>
>>> Kind regards,
>>> Geert
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Michael Blakeley
>>> Sent: Wednesday, October 19, 2011 2:31
>>> To: General MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] How to get different facet counts
>>> for different searchable-expression in Search API
>>>
>>> Will, if I can jump in.... I think your idea of using different QNames is
>>> the right way to look at it.
>>>
>>> Facets are built from range indexes, and range indexes contain lists of
>>> values and fragment ids for a given QName. So if the query matches the
>>> fragment, the facet will show all the values in that fragment. In your case
>>> the fragment is the entire document, so you will see all the values in the
>>> matching documents, whether they occur under /doc or under /doc//cite. Now,
>>> you *could* create a fragment root on 'cite', but I think that would be
>>> counter-productive. It's better to use different QNames and have different
>>> range indexes.
>>>
>>> So I think what you'd want to do is simply arrange for a different set of
>>> search options for doc vs cite, including both searchable expression and
>>> constraints. Testing for that could be as simple as a call to
>>> cts:contains($user-search, 'select:cite') before you call search:search().
>>> Or if that might generate false positives, you could search:parse the user
>>> query and then look at the cts:query XML to see whether or not the parser
>>> found a select:cite term. If it did, then you can switch to the correct
>>> options before calling search:resolve.
>>>
>>> -- Mike
>>>
>>> On 18 Oct 2011, at 17:14 , Will Thompson wrote:
>>>
>>>> Micah,
>>>>
>>>> I think I may have explained poorly. This is essentially what I'm doing --
>>>> Docs are, generally, like this:
>>>>
>>>> <doc>
>>>> <search-meta/>
>>>> <p>...<cite><search-meta/></cite>...</p>
>>>> <section>
>>>> <p>...<cite><search-meta/></cite>...</p>
>>>> ...
>>>> </section>
>>>> </doc>
>>>>
>>>> Searches operate over //doc by default, but if you add the operator/state
>>>> "select:cite" it changes the searchable expression to //cite. The results
>>>> are correct, but the problem is that the facet counts appear to be for
>>>> *both* doc and cite metadata, and thus do not change when toggling
>>>> searchable-expressions via operator/state.
>>>>
>>>> This won't make any sense to our users, who will expect the facet counts
>>>> to match what they think they're searching for.
>>>>
>>>> -W
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Micah Dubinko
>>>> Sent: Tuesday, October 18, 2011 6:56 PM
>>>> To: General MarkLogic Developer Discussion
>>>> Subject: Re: [MarkLogic Dev General] How to get different facet counts for
>>>> different searchable-expression in Search API
>>>>
>>>> Hi Will,
>>>>
>>>> Everything you want to search exists in document fragments (not
>>>> properties) right?
>>>>
>>>> What would happen if you switched in a different searchable-expression via
>>>> operator and state? The combined query is taken into account by faceting,
>>>> but the searchable-expression is not.
>>>>
>>>> -m
>>>>
>>>>
>>>> On Oct 18, 2011, at 4:42 PM, Will Thompson wrote:
>>>>
>>>>> Our app has typically searched only document-type elements, but I
>>>>> recently added metadata to citation elements (contained within and
>>>>> scattered about document elements) so that they can be optionally
>>>>> searched using an operator. i.e.: "term1 term2 select:citations" The
>>>>> operator changes the searchable-expression and transform-results to
>>>>> search only citation elements and return citation-specific snippets.
>>>>>
>>>>> However, I need the facet counts to reflect the search being performed -
>>>>> i.e.: only show estimates for document element direct-child metadata
>>>>> during normal search, and only for citations when that is toggled using
>>>>> the operator.
>>>>>
>>>>> My first thought was to use different names or namespace for the citation
>>>>> metadata and have the operator toggle a separate set of constraints
>>>>> associated with those names. But constraints are not supported children
>>>>> of search:state under search:operator.
>>>>>
>>>>>
>>>>> Any ideas on how to accomplish this with Search API?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> -Will
>>>>>
>>>>> _______________________________________________
>>>>> General mailing list
>>>>> [email protected]
>>>>> http://developer.marklogic.com/mailman/listinfo/general