Re: [MarkLogic Dev General] How to get different facet counts for different searchable-expression in Search API

Michael Blakeley Wed, 19 Oct 2011 09:06:22 -0700

Accurate estimates mean accurate search result counts, which make pagination 
easier.


Are you looking for an example of how element position indexes could help with 
queries?

  cts:element-query(
    xs:QName('foo'),
    cts:element-value-query(xs:QName('bar'), 'baz'))

Without element positions, the indexes record which documents have element foo, 
which have bar=baz, and which have foo//bar. But there might be matches for 
bar=baz that aren't inside foo. If so, filtering would have to throw out some 
potential matches, and xdmp:estimate or cts:remainder would be higher than the 
actual count. With positions, the indexes have enough information to show 
whether or not the bar=baz is inside an ancestor foo, so the estimate should 
match the count exactly.
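
One rough way to see the difference is to compare the unfiltered estimate with the filtered count for the query above. This is only a sketch: foo, bar and baz are the placeholders from the example, and using fn:doc() as the search scope is an assumption.

```xquery
xquery version "1.0-ml";

(: Placeholder QNames from the example above; fn:doc() is an assumed scope. :)
let $query :=
  cts:element-query(
    xs:QName('foo'),
    cts:element-value-query(xs:QName('bar'), 'baz'))
return (
  (: Unfiltered: resolved from the indexes only. :)
  xdmp:estimate(cts:search(fn:doc(), $query)),
  (: Filtered: each candidate fragment is checked against its content. :)
  fn:count(cts:search(fn:doc(), $query))
)
```

If the first number is larger than the second, filtering is discarding false positives. With the relevant position indexes enabled, the two should agree for this query.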

Of course everything is a trade-off. Recording that position information takes 
extra CPU and disk space, and resolving position information at query time also 
uses resources. So I'd still rather use expressive QNames whenever I can.

-- Mike 

On 18 Oct 2011, at 23:32 , Geert Josten wrote:

> Sorry, how do element word positions help with pagination?
> 
> Different QNames for different meaning is definitely a GOOD THING, no doubt. 
> :)
> 
> Kind regards,
> Geert
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Wednesday, 19 October 2011 8:13
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] How to get different facet counts for 
> different searchable-expression in Search API
> 
> There too you could use different QNames: if some QNames only occur as 
> descendants of cite, then there is no ambiguity. From a storage point of 
> view, adding QNames is almost free.
> 
> Element word positions could also help. If you wrap the user query in an 
> element-query on cite, element word positions can be used to figure out which 
> documents actually match. Position indexes are somewhat expensive, but in 
> most cases I think that would be cheaper than having a dozen small fragments 
> per document.
> 
> -- Mike
> 
> On Oct 18, 2011, at 23:04, Geert Josten <[email protected]> wrote:
> 
>> Right, of course. I was not paying attention, not thinking of facet counts, 
>> but search result counts (for pagination and such). I reckon that if you 
>> want the latter to match returned results as well, you would need 
>> fragmentation on 'cite', provided you would really be showing cites 
>> individually, and not just documents that happen to contain a matching 
>> cite.. ;-)
>> 
>> Thanks,
>> Geert
>> 
>> -----Original Message-----
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf Of Michael Blakeley
>> Sent: Wednesday, 19 October 2011 7:58
>> To: General MarkLogic Developer Discussion
>> CC: General MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] How to get different facet counts for 
>> different searchable-expression in Search API
>> 
>> Well, I haven't looked at the search API code lately, but I presume it is 
>> using cts:frequency, since that is the most efficient way to get counts from 
>> a range index. The reason to use different QNames is because they can be 
>> tied to different range indexes. For example, if I had a /doc/country 
>> element and a /doc//cite/country element, with a range index on country, I 
>> would always get facets based on the entire document. If I have a document 
>> that was published in UK but cited an article from FR, both would show up in 
>> the facets. The range index for a QName contains every value for that QName.
>> 
>> But if I have a /doc/country element and a /doc//cite/cite-country element, 
>> I can build a range index on each and query them separately. So I can see 
>> "published in XX" separate from "cites articles published in XX". I can also 
>> see both together if I wish, because cts:element-values allows a sequence of 
>>  QNames. Essentially I am choosing QNames to tell the database what to index.
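>> 
>> A minimal sketch of querying the two indexes separately, assuming string 
>> element range indexes exist on country and cite-country in the default 
>> collation. The word query is only a stand-in for whatever cts:query the 
>> search produced.
>> 
>> ```xquery
>> xquery version "1.0-ml";
>> 
>> (: Stand-in for the user's query; an assumption for illustration. :)
>> let $query := cts:word-query('term1')
>> (: One pass per range index, so the two facets stay separate. :)
>> for $name in (xs:QName('country'), xs:QName('cite-country'))
>> return
>>   for $value in cts:element-values($name, (), 'frequency-order', $query)
>>   return fn:concat(fn:string($name), '=', $value, ': ',
>>     cts:frequency($value))
>> ```
>> 
>> Passing both QNames in a single cts:element-values call should give the 
>> combined view instead.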
>> 
>> Naturally there would be even more flexibility if we could create range 
>> indexes based on simple XPath expressions as well as QNames. But the 
>> existing functionality is quite powerful, and enriching existing XML with 
>> expressive QNames works well for most applications.
>> 
>> -- Mike
>> 
>> On Oct 18, 2011, at 22:30, Geert Josten <[email protected]> wrote:
>> 
>>> Hi Mike,
>>> 
>>> In what way does selecting a different range index influence the counts in 
>>> this case? I'd say you are still selecting the same doc fragments, so I'd 
>>> expect the counts to not change at all. Am I overlooking something? Or is 
>>> the search:search library really using count, and not the fragment-based 
>>> xdmp:estimate?
>>> 
>>> Kind regards,
>>> Geert
>>> 
>>> -----Original Message-----
>>> From: [email protected] 
>>> [mailto:[email protected]] On Behalf Of Michael Blakeley
>>> Sent: Wednesday, 19 October 2011 2:31
>>> To: General MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] How to get different facet counts 
>>> for different searchable-expression in Search API
>>> 
>>> Will, if I can jump in.... I think your idea of using different QNames is 
>>> the right way to look at it.
>>> 
>>> Facets are built from range indexes, and range indexes contain lists of 
>>> values and fragment ids for a given QName. So if the query matches the 
>>> fragment, the facet will show all the values in that fragment. In your case 
>>> the fragment is the entire document, so you will see all the values in the 
>>> matching documents, whether they occur under /doc or under /doc//cite. Now, 
>>> you *could* create a fragment root on 'cite', but I think that would be 
>>> counter-productive. It's better to use different QNames and have different 
>>> range indexes.
>>> 
>>> So I think what you'd want to do is simply arrange for a different set of 
>>> search options for doc vs cite, including both searchable expression and 
>>> constraints. Testing for that could be as simple as a call to 
>>> cts:contains($user-search, 'select:cite') before you call search:search(). 
>>> Or if that might generate false positives, you could search:parse the user 
>>> query and then look at the cts:query XML to see whether or not the parser 
>>> found a select:cite term. If it did, then you can switch to the correct 
>>> options before calling search:resolve.
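>>> 
>>> A rough sketch of the simple version, under some assumptions: the two 
>>> option sets are hypothetical and defined elsewhere (each with its own 
>>> searchable expression and constraints), local:choose-options is an 
>>> invented helper, and a plain fn:contains string test stands in for the 
>>> check. It can produce the false positives mentioned above.
>>> 
>>> ```xquery
>>> xquery version "1.0-ml";
>>> import module namespace search =
>>>   "http://marklogic.com/appservices/search"
>>>   at "/MarkLogic/appservices/search/search.xqy";
>>> 
>>> (: All three are supplied by the caller; the names are hypothetical. :)
>>> declare variable $user-search as xs:string external;
>>> declare variable $doc-options as element(search:options) external;
>>> declare variable $cite-options as element(search:options) external;
>>> 
>>> (: Pick the option set by a plain string test on the query text. :)
>>> declare function local:choose-options($qtext as xs:string)
>>> as element(search:options)
>>> {
>>>   if (fn:contains($qtext, 'select:cite'))
>>>   then $cite-options
>>>   else $doc-options
>>> };
>>> 
>>> search:search($user-search, local:choose-options($user-search))
>>> ```
>>> 
>>> The search:parse route would replace the string test with a check on the 
>>> parsed query, followed by search:resolve with the chosen options.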
>>> 
>>> -- Mike
>>> 
>>> On 18 Oct 2011, at 17:14 , Will Thompson wrote:
>>> 
>>>> Micah,
>>>> 
>>>> I think I may have explained poorly. This is essentially what I'm doing -- 
>>>> Docs are, generally, like this:
>>>> 
>>>> <doc>
>>>> <search-meta/>
>>>> <p>...<cite><search-meta/></cite>...</p>
>>>> <section>
>>>> <p>...<cite><search-meta/></cite>...</p>
>>>> ...
>>>> </section>
>>>> </doc>
>>>> 
>>>> Searches operate over //doc by default, but if you add the operator/state 
>>>> "select:cite" it changes the searchable expression to //cite. The results 
>>>> are correct, but the problem is that the facet counts appear to be for 
>>>> *both* doc and cite metadata, and thus do not change when toggling 
>>>> searchable-expressions via operator/state.
>>>> 
>>>> This won't make any sense to our users, who will expect the facet counts 
>>>> to match what they think they're searching for.
>>>> 
>>>> -W
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: [email protected] 
>>>> [mailto:[email protected]] On Behalf Of Micah Dubinko
>>>> Sent: Tuesday, October 18, 2011 6:56 PM
>>>> To: General MarkLogic Developer Discussion
>>>> Subject: Re: [MarkLogic Dev General] How to get different facet counts for 
>>>> different searchable-expression in Search API
>>>> 
>>>> Hi Will,
>>>> 
>>>> Everything you want to search exists in document fragments (not 
>>>> properties) right?
>>>> 
>>>> What would happen if you switched in a different searchable-expression via 
>>>> operator and state? The combined query is taken into account by faceting, 
>>>> but the searchable-expression is not.
>>>> 
>>>> -m
>>>> 
>>>> 
>>>> On Oct 18, 2011, at 4:42 PM, Will Thompson wrote:
>>>> 
>>>>> Our app has typically searched only document-type elements, but I 
>>>>> recently added metadata to citation elements (contained within and 
>>>>> scattered about document elements) so that they can be optionally 
>>>>> searched using an operator. i.e.: "term1 term2 select:citations" The 
>>>>> operator changes the searchable-expression and transform-results to 
>>>>> search only citation elements and return citation-specific snippets.
>>>>> 
>>>>> However, I need the facet counts to reflect the search being performed - 
>>>>> i.e.: only show estimates for document element direct-child metadata 
>>>>> during normal search, and only for citations when that is toggled using 
>>>>> the operator. 
>>>>> 
>>>>> My first thought was to use different names or namespace for the citation 
>>>>> metadata and have the operator toggle a separate set of constraints 
>>>>> associated with those names. But constraints are not supported children 
>>>>> of search:state under search:operator.
>>>>> 
>>>>> Any ideas on how to accomplish this with Search API? 
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> -Will
>>>>> 
>>>>> _______________________________________________
>>>>> General mailing list
>>>>> [email protected]
>>>>> http://developer.marklogic.com/mailman/listinfo/general
