Re: [MarkLogic Dev General] How to get different facet counts for different searchable-expression in Search API

Geert Josten Tue, 18 Oct 2011 23:04:56 -0700

Right, of course. I was not paying attention; I was thinking not of facet 
counts but of search result counts (for pagination and such). I reckon that if 
you want the latter to match the returned results as well, you would need 
fragmentation on 'cite', provided you really are showing cites individually, 
and not just documents that happen to contain a matching cite. ;-)


Thanks,
Geert

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Wednesday, October 19, 2011 7:58
To: General MarkLogic Developer Discussion
CC: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to get different facet counts for 
different searchable-expression in Search API

Well, I haven't looked at the search API code lately, but I presume it is using 
cts:frequency, since that is the most efficient way to get counts from a range 
index. The reason to use different QNames is that they can be tied to 
different range indexes. For example, if I had a /doc/country element and a 
/doc//cite/country element, with a range index on country, I would always get 
facets based on the entire document. If I have a document that was published in 
UK but cited an article from FR, both would show up in the facets. The range 
index for a QName contains every value for that QName.
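
To make the shared-QName case concrete, here is a toy model in Python (not MarkLogic code). It assumes, as a simplification, that a range index is just a map from each value to the set of fragment ids containing it, and that each document is one fragment. The names build_range_index and facet_counts, and the sample documents, are made up for this illustration.

```python
from collections import defaultdict

# Hypothetical documents: each fragment is a whole document, given as a
# list of (element QName, value) pairs found anywhere inside it.
docs = {
    "doc1": [("country", "UK"), ("country", "FR")],  # published in UK, cites FR
    "doc2": [("country", "UK")],                     # published in UK, no cites
}

def build_range_index(docs, qname):
    """Map each value of `qname` to the set of fragment ids containing it."""
    index = defaultdict(set)
    for frag_id, elements in docs.items():
        for name, value in elements:
            if name == qname:
                index[value].add(frag_id)
    return index

def facet_counts(index, matching_fragments):
    """Count, per value, the matching fragments that contain that value."""
    counts = {}
    for value, frags in index.items():
        n = len(frags & matching_fragments)
        if n:
            counts[value] = n
    return counts

country_index = build_range_index(docs, "country")
# Suppose the query matches both documents.
print(facet_counts(country_index, {"doc1", "doc2"}))  # {'UK': 2, 'FR': 1}
```

In this model FR appears as a facet value even though no document was published in FR, because the index cannot tell where in the fragment the value occurred.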

But if I have a /doc/country element and a /doc//cite/cite-country element, I 
can build a range index on each and query them separately. So I can see 
"published in XX" separate from "cites articles published in XX". I can also 
see both together if I wish, because cts:element-values allows a sequence of  
QNames. Essentially I am choosing QNames to tell the database what to index.
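
The separate-QName case can be sketched with the same kind of toy model in Python (again not MarkLogic code). It assumes a range index is a map from value to fragment ids, and it models "a sequence of QNames" by merging the values of all listed names per fragment; how the real cts:element-values combines frequencies across QNames may differ. All names and sample data here are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents with distinct QNames for the two kinds of metadata.
docs = {
    "doc1": [("country", "UK"), ("cite-country", "FR")],
    "doc2": [("country", "UK"), ("cite-country", "UK")],
}

def build_range_index(docs, qnames):
    """Map each value of any QName in `qnames` to the fragment ids holding it."""
    index = defaultdict(set)
    for frag_id, elements in docs.items():
        for name, value in elements:
            if name in qnames:
                index[value].add(frag_id)
    return index

def facet_counts(index, matching_fragments):
    """Count, per value, the matching fragments that contain that value."""
    return {
        value: len(frags & matching_fragments)
        for value, frags in index.items()
        if frags & matching_fragments
    }

matches = {"doc1", "doc2"}  # suppose the query matches both documents

published = facet_counts(build_range_index(docs, ["country"]), matches)
cited = facet_counts(build_range_index(docs, ["cite-country"]), matches)
both = facet_counts(build_range_index(docs, ["country", "cite-country"]), matches)

print(published)  # {'UK': 2}
print(cited)      # {'FR': 1, 'UK': 1}
print(both)       # {'UK': 2, 'FR': 1}
```

Choosing which QNames feed the index is what separates "published in XX" from "cites articles published in XX" here; the matching fragments are identical in all three calls.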

Naturally there would be even more flexibility if we could create range indexes 
based on simple XPath expressions as well as QNames. But the existing 
functionality is quite powerful, and enriching existing XML with expressive 
QNames works well for most applications.

-- Mike

On Oct 18, 2011, at 22:30, Geert Josten <[email protected]> wrote:

> Hi Mike,
> 
> In what way does selecting a different range index influence the counts in 
> this case? I'd say you are still selecting the same doc fragments, so I'd 
> expect the counts to not change at all. Am I overlooking something? Or is the 
> search:search library really using count, and not the fragment-based 
> xdmp:estimate?
> 
> Kind regards,
> Geert
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Wednesday, October 19, 2011 2:31
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] How to get different facet counts for 
> different searchable-expression in Search API
> 
> Will, if I can jump in.... I think your idea of using different QNames is the 
> right way to look at it.
> 
> Facets are built from range indexes, and range indexes contain lists of 
> values and fragment ids for a given QName. So if the query matches the 
> fragment, the facet will show all the values in that fragment. In your case 
> the fragment is the entire document, so you will see all the values in the 
> matching documents, whether they occur under /doc or under /doc//cite. Now, 
> you *could* create a fragment root on 'cite', but I think that would be 
> counter-productive. It's better to use different QNames and have different 
> range indexes.
> 
> So I think what you'd want to do is simply arrange for a different set of 
> search options for doc vs cite, including both searchable expression and 
> constraints. Testing for that could be as simple as a call to 
> cts:contains($user-search, 'select:cite') before you call search:search(). Or 
> if that might generate false positives, you could search:parse the user query 
> and then look at the cts:query XML to see whether or not the parser found a 
> select:cite term. If it did, then you can switch to the correct options 
> before calling search:resolve.
> 
> -- Mike
> 
> On 18 Oct 2011, at 17:14 , Will Thompson wrote:
> 
>> Micah,
>> 
>> I think I may have explained poorly. This is essentially what I'm doing -- 
>> Docs are, generally, like this:
>> 
>> <doc>
>> <search-meta/>
>> <p>...<cite><search-meta/></cite>...</p>
>> <section>
>>  <p>...<cite><search-meta/></cite>...</p>
>>  ...
>> </section>
>> </doc>
>> 
>> Searches operate over //doc by default, but if you add the operator/state 
>> "select:cite" it changes the searchable expression to //cite. The results 
>> are correct, but the problem is that the facet counts appear to be for 
>> *both* doc and cite metadata, and thus do not change when toggling 
>> searchable-expressions via operator/state.
>> 
>> This won't make any sense to our users, who will expect the facet counts to 
>> match what they think they're searching for.
>> 
>> -W
>> 
>> 
>> -----Original Message-----
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf Of Micah Dubinko
>> Sent: Tuesday, October 18, 2011 6:56 PM
>> To: General MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] How to get different facet counts for 
>> different searchable-expression in Search API
>> 
>> Hi Will,
>> 
>> Everything you want to search exists in document fragments (not properties) 
>> right?
>> 
>> What would happen if you switched in a different searchable-expression via 
>> operator and state? The combined query is taken into account by faceting, 
>> but the searchable-expression is not.
>> 
>> -m
>> 
>> 
>> On Oct 18, 2011, at 4:42 PM, Will Thompson wrote:
>> 
>>> Our app has typically searched only document-type elements, but I recently 
>>> added metadata to citation elements (contained within and scattered about 
>>> document elements) so that they can be optionally searched using an 
>>> operator. i.e.: "term1 term2 select:citations" The operator changes the 
>>> searchable-expression and transform-results to search only citation 
>>> elements and return citation-specific snippets.
>>> 
>>> However, I need the facet counts to reflect the search being performed - 
>>> i.e.: only show estimates for document element direct-child metadata during 
>>> normal search, and only for citations when that is toggled using the 
>>> operator. 
>>> 
>>> My first thought was to use different names or namespace for the citation 
>>> metadata and have the operator toggle a separate set of constraints 
>>> associated with those names. But constraints are not supported children of 
>>> search:state under search:operator.
>>> 
>>> Any ideas on how to accomplish this with Search API? 
>>> 
>>> Thanks!
>>> 
>>> -Will
>>> 
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> http://developer.marklogic.com/mailman/listinfo/general
>> 
