Sorry, how do element word positions help with pagination?

Different QNames for different meaning is definitely a GOOD THING, no doubt. :)

Kind regards,
Geert

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Wednesday, October 19, 2011 8:13
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to get different facet counts for 
different searchable-expression in Search API

There too you could use different QNames: if some QNames only occur as 
descendants of cite, then there is no ambiguity. From a storage point of view, 
adding QNames is almost free.

Element word positions could also help. If you wrap the user query in an 
element-query on cite, element word positions can be used to figure out which 
documents actually match. Position indexes are somewhat expensive, but in most 
cases I think that would be cheaper than having a dozen small fragments per 
document.
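A toy Python model may make the element-word-positions point concrete. It is an illustration only, not MarkLogic's index structures or API; `Doc`, `cite_spans` and the sample documents are hypothetical. Without positions, the indexes can only tell that a document contains the term and contains a cite somewhere, so both sample documents are candidates for an element-query on cite. With element word positions, an occurrence of the term must fall inside a cite's span, which rules out the false positive. That is presumably the link to pagination: an unfiltered count such as xdmp:estimate is answered from the indexes alone, so fewer false-positive candidates bring it closer to the number of documents that really match.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    uri: str
    words: list[str]                   # the document's text as a word sequence
    cite_spans: list[tuple[int, int]]  # (start, end) word positions inside <cite>, end exclusive


def candidates_without_positions(docs, term):
    """No positions: the term occurs somewhere AND a cite exists somewhere."""
    return [d.uri for d in docs if term in d.words and d.cite_spans]


def matches_with_positions(docs, term):
    """With element word positions: some occurrence of the term lies inside a cite."""
    return [
        d.uri
        for d in docs
        if any(
            start <= i < end
            for i, w in enumerate(d.words)
            if w == term
            for start, end in d.cite_spans
        )
    ]


# Hypothetical sample data.
docs = [
    Doc("/a.xml", ["contract", "law", "see", "smith", "jones"], [(2, 5)]),  # term outside the cite
    Doc("/b.xml", ["see", "contract", "cases"], [(0, 3)]),                  # term inside the cite
]
print(candidates_without_positions(docs, "contract"))  # ['/a.xml', '/b.xml']
print(matches_with_positions(docs, "contract"))        # ['/b.xml']
```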

-- Mike

On Oct 18, 2011, at 23:04, Geert Josten <[email protected]> wrote:

> Right, of course. I was not paying attention: I was thinking not of facet 
> counts but of search result counts (for pagination and such). I reckon that if 
> you want the latter to match the returned results as well, you would need 
> fragmentation on 'cite', provided you would really be showing cites 
> individually, and not just documents that happen to contain a matching cite. ;-)
> 
> Thnx,
> Geert
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Wednesday, October 19, 2011 7:58
> To: General MarkLogic Developer Discussion
> CC: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] How to get different facet counts for 
> different searchable-expression in Search API
> 
> Well, I haven't looked at the search API code lately, but I presume it is 
> using cts:frequency, since that is the most efficient way to get counts from 
> a range index. The reason to use different QNames is that they can be tied 
> to different range indexes. For example, if I had a /doc/country element and 
> a /doc//cite/country element, with a range index on country, I would always 
> get facets based on the entire document. If I have a document that was 
> published in the UK but cited an article from FR, both would show up in the 
> facets. The range index for a QName contains every value for that QName.
> 
> But if I have a /doc/country element and a /doc//cite/cite-country element, I 
> can build a range index on each and query them separately. So I can see 
> "published in XX" separate from "cites articles published in XX". I can also 
> see both together if I wish, because cts:element-values accepts a sequence of 
> QNames. Essentially I am choosing QNames to tell the database what to index.
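A toy Python sketch of that idea, with heavy simplifications: each QName gets its own range index, modeled as a map from value to the set of fragment ids containing it. `DOCS`, `range_index`, `facet_counts` and `values` are hypothetical names, not Search API or cts functions. Because cite-country only occurs under cite, its index answers "cites articles published in XX" without any path logic, and reading several QNames' indexes together (loosely like passing a sequence of QNames to cts:element-values) gives the combined view.

```python
# Hypothetical sample data: <country> sits directly under <doc>, while
# <cite-country> only ever occurs under <cite>.
DOCS = {
    "/doc1.xml": {"country": ["UK"], "cite-country": ["FR"]},
    "/doc2.xml": {"country": ["UK"], "cite-country": ["DE", "FR"]},
}


def range_index(qname):
    """Toy range index for one QName: value -> set of fragment ids.
    Every occurrence of that QName contributes, wherever it appears."""
    index = {}
    for uri, elements in DOCS.items():
        for value in elements.get(qname, []):
            index.setdefault(value, set()).add(uri)
    return index


def facet_counts(qname, matching):
    """Per-value count of matching fragments, read from one QName's index."""
    return {
        value: len(uris & matching)
        for value, uris in range_index(qname).items()
        if uris & matching
    }


def values(qnames):
    """Distinct values across the indexes of several QNames."""
    return sorted({v for q in qnames for v in range_index(q)})


matching = {"/doc1.xml", "/doc2.xml"}          # pretend the query matched both
print(facet_counts("country", matching))       # {'UK': 2}  "published in XX"
print(facet_counts("cite-country", matching))  # {'FR': 2, 'DE': 1}  "cites articles published in XX"
print(values(["country", "cite-country"]))     # ['DE', 'FR', 'UK']
```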
> 
> Naturally there would be even more flexibility if we could create range 
> indexes based on simple XPath expressions as well as QNames. But the existing 
> functionality is quite powerful, and enriching existing XML with expressive 
> QNames works well for most applications.
> 
> -- Mike
> 
> On Oct 18, 2011, at 22:30, Geert Josten <[email protected]> wrote:
> 
>> Hi Mike,
>> 
>> In what way does selecting a different range index influence the counts in 
>> this case? I'd say you are still selecting the same doc fragments, so I'd 
>> expect the counts to not change at all. Am I overlooking something? Or is 
>> the search:search library really using count, and not the fragment-based 
>> xdmp:estimate?
>> 
>> Kind regards,
>> Geert
>> 
>> -----Original Message-----
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf Of Michael Blakeley
>> Sent: Wednesday, October 19, 2011 2:31
>> To: General MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] How to get different facet counts for 
>> different searchable-expression in Search API
>> 
>> Will, if I can jump in... I think your idea of using different QNames is 
>> the right way to look at it.
>> 
>> Facets are built from range indexes, and range indexes contain lists of 
>> values and fragment ids for a given QName. So if the query matches the 
>> fragment, the facet will show all the values in that fragment. In your case 
>> the fragment is the entire document, so you will see all the values in the 
>> matching documents, whether they occur under /doc or under /doc//cite. Now, 
>> you *could* create a fragment root on 'cite', but I think that would be 
>> counter-productive. It's better to use different QNames and have different 
>> range indexes.
>> 
>> So I think what you'd want to do is simply arrange for a different set of 
>> search options for doc vs cite, including both searchable expression and 
>> constraints. Testing for that could be as simple as a call to 
>> cts:contains($user-search, 'select:cite') before you call search:search(). 
>> Or if that might generate false positives, you could search:parse the user 
>> query and then look at the cts:query XML to see whether or not the parser 
>> found a select:cite term. If it did, then you can switch to the correct 
>> options before calling search:resolve.
>> 
>> -- Mike
>> 
>> On 18 Oct 2011, at 17:14 , Will Thompson wrote:
>> 
>>> Micah,
>>> 
>>> I think I may have explained poorly. This is essentially what I'm doing -- 
>>> Docs are, generally, like this:
>>> 
>>> <doc>
>>> <search-meta/>
>>> <p>...<cite><search-meta/></cite>...</p>
>>> <section>
>>> <p>...<cite><search-meta/></cite>...</p>
>>> ...
>>> </section>
>>> </doc>
>>> 
>>> Searches operate over //doc by default, but if you add the operator/state 
>>> "select:cite" it changes the searchable expression to //cite. The results 
>>> are correct, but the problem is that the facet counts appear to be for 
>>> *both* doc and cite metadata, and thus do not change when toggling 
>>> searchable-expressions via operator/state.
>>> 
>>> This won't make any sense to our users, who will expect the facet counts to 
>>> match what they think they're searching for.
>>> 
>>> -W
>>> 
>>> 
>>> -----Original Message-----
>>> From: [email protected] 
>>> [mailto:[email protected]] On Behalf Of Micah Dubinko
>>> Sent: Tuesday, October 18, 2011 6:56 PM
>>> To: General MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] How to get different facet counts for 
>>> different searchable-expression in Search API
>>> 
>>> Hi Will,
>>> 
>>> Everything you want to search exists in document fragments (not properties) 
>>> right?
>>> 
>>> What would happen if you switched in a different searchable-expression via 
>>> operator and state? The combined query is taken into account by faceting, 
>>> but the searchable-expression is not.
>>> 
>>> -m
>>> 
>>> 
>>> On Oct 18, 2011, at 4:42 PM, Will Thompson wrote:
>>> 
>>>> Our app has typically searched only document-type elements, but I recently 
>>>> added metadata to citation elements (contained within and scattered about 
>>>> document elements) so that they can be optionally searched using an 
>>>> operator, e.g. "term1 term2 select:citations". The operator changes the 
>>>> searchable-expression and transform-results to search only citation 
>>>> elements and return citation-specific snippets.
>>>> 
>>>> However, I need the facet counts to reflect the search being performed - 
>>>> i.e.: only show estimates for document element direct-child metadata 
>>>> during normal search, and only for citations when that is toggled using 
>>>> the operator. 
>>>> 
>>>> My first thought was to use different names or namespaces for the citation 
>>>> metadata and have the operator toggle a separate set of constraints 
>>>> associated with those names. But constraints are not supported children of 
>>>> search:state under search:operator.
>>>> 
>>>> 
>>>> Any ideas on how to accomplish this with Search API? 
>>>> 
>>>> Thanks!
>>>> 
>>>> -Will
>>>> 
>>>> _______________________________________________
>>>> General mailing list
>>>> [email protected]
>>>> http://developer.marklogic.com/mailman/listinfo/general