Re: [MarkLogic Dev General] How to get different facet counts for different searchable-expression in Search API

Geert Josten Thu, 10 Nov 2011 09:54:48 -0800

Hi Greg,

I think Mike explained in detail why you should not use fragments.
The sample XML you sent shows that you are searching on the METS:dmdSec
element, which indeed resembles a record-like piece of XML. Your facets
are focused on sub-elements of that element. That is a good argument to
proceed the way Mike suggests: put those elements in the database as
separate documents.


Before you do so, you could run a quick test. Add METS:dmdSec as a fragment
root to the docs database of the app server you are using. To do so, open
the admin interface (http://localhost:8001/), go to the databases, find
the docs database of your app server, open the Fragment Roots section, and
add your element by supplying its namespace-uri and local-name. This will
trigger a reindex. Follow the reindex in the
database status tab (accessible from the database general properties
page). Once the reindex is done, check your facets and search results. If
this gives the results you are looking for, then really consider doing as
Mike suggests: reload your content, and store METS:dmdSec as separate
documents.

Reloading and splitting your content at that element does mean that
including search facets on the header element located above these
elements becomes difficult, but that is also the case when using
fragmentation, as Mike points out below. If you only need to be able to
access the header for presentation or other purposes, you're fine, I'd say.

Kind regards,
Geert

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Thursday, 10 November 2011 18:27
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to get different facet counts
for different searchable-expression in Search API

I wouldn't jump into setting fragment roots. Fragment rules are a mistake
in at least 80% of applications.

If you fragment your documents, you won't be able to search them as
documents very easily. Any query that crosses fragment boundaries has to
be implemented as some sort of join, and the server doesn't do much of
anything for you in those cases. So if you have 'head' and 'body', like
xhtml, and you fragment on 'head'... now the search API can't help you
with search constraints that check both head and body.
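
To make the fragment-boundary point concrete, here is a deliberately simplified Python model. It only assumes that a query is resolved against one fragment at a time; the term strings, fragment ids, and the and_query helper are invented for this illustration and are not MarkLogic APIs or its real index layout.

```python
# Toy model: the index resolves a query against one fragment at a time.
# Keys are fragment ids, values are the terms indexed for that fragment.
UNFRAGMENTED = {"doc1": {"head:facets", "body:marklogic"}}

# The same document with a fragment root on 'head': the head terms now
# live in their own fragment, apart from the rest of the document.
FRAGMENTED = {
    "doc1/head": {"head:facets"},
    "doc1": {"body:marklogic"},
}


def and_query(index, terms):
    """Return the ids of fragments that contain every requested term."""
    return sorted(frag for frag, indexed in index.items() if terms <= indexed)


wanted = {"head:facets", "body:marklogic"}
print(and_query(UNFRAGMENTED, wanted))  # ['doc1']
print(and_query(FRAGMENTED, wanted))    # []
```

In this model the second query finds nothing because no single fragment holds both terms; getting an answer would take a join across fragments, which is the work the server does not do for you.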

Aside from that, adding fragments increases both memory utilization and
disk utilization. But usually the effect on queries and indexing is
paramount.

When should you use fragments? Mostly for large documents that cannot be
broken apart for intrinsic reasons. Large means that the typical tree size
is larger than the system's on-die cache size. "Cannot be broken apart"
means "cannot", not "that would resemble work".

For example, an RDBMS will often export a table as a giant document with
row-oriented child elements. Don't fragment that. Break it up. This is a
document-oriented environment, so map each row to a document.
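
A minimal sketch of that row-to-document mapping, done in Python before loading. The element names (table, row, id), the URI pattern, and the split_rows helper are assumptions made up for this example; adapt them to the actual export.

```python
import xml.etree.ElementTree as ET

# Hypothetical RDBMS export: one giant document with row-oriented children.
EXPORT = """<table name="orders">
  <row><id>1</id><customer>Ann</customer></row>
  <row><id>2</id><customer>Bob</customer></row>
</table>"""


def split_rows(xml_text, row_tag="row", uri_pattern="/orders/{}.xml"):
    """Return {uri: serialized row}, one entry per row element.

    Using the child <id> as the document key is an assumption of this sketch.
    """
    root = ET.fromstring(xml_text)
    docs = {}
    for row in root.iter(row_tag):
        row.tail = None  # drop the whitespace that followed the row in the export
        key = row.findtext("id")
        docs[uri_pattern.format(key)] = ET.tostring(row, encoding="unicode")
    return docs


docs = split_rows(EXPORT)
for uri, body in sorted(docs.items()):
    print(uri, body)
```

Each entry can then be loaded as its own document, so every row is its own fragment without any fragment rules.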

Books might seem like a good candidate for fragmentation, but not always.
Often it's better to represent a book as a directory, with metadata in a
manifest and each chapter in a document. Most users will want to search at
the chapter level or lower anyway.

Getting back to facets and searchable expressions, my answer is the same
as below. In most cases you'll have a finite set of searchable expressions
that interest you. So use QNames that express that. For example, you might
have 'tag' in head and 'tag' in body. Change that by using different local
names ('head-tag' vs 'body-tag') or namespaces ('h:tag' vs 'b:tag').
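
The effect of distinct QNames on facet counts can be shown with a toy Python model of a range index as (QName, value, fragment id) entries. This is an illustration only: the sample values and the facet helper are invented, and the real index structures are more involved.

```python
from collections import Counter

# Toy range index entries: (QName, value, fragment id).
# Fragments 1 and 2 are whole documents, each with a 'tag' in head and body.
SHARED_QNAME = [
    ("tag", "history", 1), ("tag", "greek", 1),  # doc 1: head value, body value
    ("tag", "history", 2), ("tag", "latin", 2),  # doc 2: head value, body value
]

# Same content, but head and body values are stored under different QNames.
DISTINCT_QNAMES = [
    ("head-tag", "history", 1), ("body-tag", "greek", 1),
    ("head-tag", "history", 2), ("body-tag", "latin", 2),
]


def facet(index, qname, matching_fragments):
    """Count the values of one QName across the fragments a query matched."""
    return Counter(
        value
        for name, value, frag in index
        if name == qname and frag in matching_fragments
    )


matched = {1, 2}  # the query matched both documents
print(facet(SHARED_QNAME, "tag", matched))
print(facet(DISTINCT_QNAMES, "head-tag", matched))
print(facet(DISTINCT_QNAMES, "body-tag", matched))
```

With a shared QName the facet mixes head and body values for every matching fragment; with separate QNames each facet only sees its own values, which is why a separate range index per QName gives the counts you want.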

-- Mike

On 10 Nov 2011, at 08:25 , Murray, Gregory wrote:

> Geert,
>
> I don't know how to set an element as a fragment root, which I assume
> means that the element/fragment level becomes the basis for indexing,
> rather than the document level. That sounds like exactly what I need.
> Which part of the documentation discusses that? I'm not finding it.
>
> When you say "big impact" do you mean a drag on performance?
>
> Thanks,
> Greg
>
>
> On Nov 10, 2011, at 9:11 AM, Geert Josten wrote:
>
>> Hi Greg,
>>
>> To my knowledge it is like you say: facet counts are based on fragments,
>> not on search results. But the lengthy explanation by Mike (over several
>> mails) confused me a bit. I still need to reread it thoroughly.
>>
>> One sure solution is to eliminate the difference between what is matched
>> using the searchable-expression and what is stored as a separate fragment.
>> You can do that by declaring the element that you search for as a fragment
>> root. Depending on how often that element occurs within each document,
>> this could have a big impact, so it might not be the wisest decision.
>> Just mentioning it as a possible option.
>>
>> Kind regards,
>> Geert
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Murray, Gregory
>> Sent: Thursday, 10 November 2011 14:45
>> To: General MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] How to get different facet counts
>> for different searchable-expression in Search API
>>
>> I should have mentioned that I'm using 4.2-1
>>
>> Any suggestions greatly appreciated.
>>
>> Thanks,
>> Greg
>>
>> On Nov 9, 2011, at 5:21 PM, Murray, Gregory wrote:
>>
>>> I'm having a similar problem with facet counts when using
>>> <searchable-expression>. After reading this thread, I'm afraid I still
>>> don't understand how to circumvent the problem. When using
>>> <searchable-expression>, it appears that the search results are
>>> constrained to that expression whereas the facet counts are not. Is there
>>> a facet-related option to similarly constrain a facet to an XPath
>>> expression? I've seen references to the "fragment-frequency" option, but
>>> it appears to have no effect in this context.
>>>
>>> Many thanks,
>>> Greg
>>>
>>> Gregory Murray
>>> Digital Library Application Developer
>>> Princeton Theological Seminary
>>>
>>>
>>> On Oct 18, 2011, at 8:30 PM, Michael Blakeley wrote:
>>>
>>>> Will, if I can jump in.... I think your idea of using different QNames
>>>> is the right way to look at it.
>>>>
>>>> Facets are built from range indexes, and range indexes contain lists of
>>>> values and fragment ids for a given QName. So if the query matches the
>>>> fragment, the facet will show all the values in that fragment. In your
>>>> case the fragment is the entire document, so you will see all the values
>>>> in the matching documents, whether they occur under /doc or under
>>>> /doc//cite. Now, you *could* create a fragment root on 'cite', but I think
>>>> that would be counter-productive. It's better to use different QNames and
>>>> have different range indexes.
>>>>
>>>> So I think what you'd want to do is simply arrange for a different set
>>>> of search options for doc vs cite, including both searchable expression
>>>> and constraints. Testing for that could be as simple as a call to
>>>> cts:contains($user-search, 'select:cite') before you call search:search().
>>>> Or if that might generate false positives, you could search:parse the user
>>>> query and then look at the cts:query XML to see whether or not the parser
>>>> found a select:cite term. If it did, then you can switch to the correct
>>>> options before calling search:resolve.
>>>>
>>>> -- Mike
>>>>
>>>> On 18 Oct 2011, at 17:14 , Will Thompson wrote:
>>>>
>>>>> Micah,
>>>>>
>>>>> I think I may have explained poorly. This is essentially what I'm
>>>>> doing -- Docs are, generally, like this:
>>>>>
>>>>> <doc>
>>>>>   <search-meta/>
>>>>>   <p>...<cite><search-meta/></cite>...</p>
>>>>>   <section>
>>>>>     <p>...<cite><search-meta/></cite>...</p>
>>>>>     ...
>>>>>   </section>
>>>>> </doc>
>>>>>
>>>>> Searches operate over //doc by default, but if you add the
>>>>> operator/state "select:cite" it changes the searchable expression to
>>>>> //cite. The results are correct, but the problem is that the facet counts
>>>>> appear to be for *both* doc and cite metadata, and thus do not change when
>>>>> toggling searchable-expressions via operator/state.
>>>>>
>>>>> This won't make any sense to our users, who will expect the facet
>>>>> counts to match what they think they're searching for.
>>>>>
>>>>> -W
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Micah Dubinko
>>>>> Sent: Tuesday, October 18, 2011 6:56 PM
>>>>> To: General MarkLogic Developer Discussion
>>>>> Subject: Re: [MarkLogic Dev General] How to get different facet counts
>>>>> for different searchable-expression in Search API
>>>>>
>>>>> Hi Will,
>>>>>
>>>>> Everything you want to search exists in document fragments (not
>>>>> properties) right?
>>>>>
>>>>> What would happen if you switched in a different searchable-expression
>>>>> via operator and state? The combined query is taken into account by
>>>>> faceting, but the searchable-expression is not.
>>>>>
>>>>> -m
>>>>>
>>>>>
>>>>> On Oct 18, 2011, at 4:42 PM, Will Thompson wrote:
>>>>>
>>>>>> Our app has typically searched only document-type elements, but I
>>>>>> recently added metadata to citation elements (contained within and
>>>>>> scattered about document elements) so that they can be optionally searched
>>>>>> using an operator. i.e.: "term1 term2 select:citations" The operator
>>>>>> changes the searchable-expression and transform-results to search only
>>>>>> citation elements and return citation-specific snippets.
>>>>>>
>>>>>> However, I need the facet counts to reflect the search being
>>>>>> performed - i.e.: only show estimates for document element direct-child
>>>>>> metadata during normal search, and only for citations when that is toggled
>>>>>> using the operator.
>>>>>>
>>>>>> My first thought was to use different names or namespace for the
>>>>>> citation metadata and have the operator toggle a separate set of
>>>>>> constraints associated with those names. But constraints are not supported
>>>>>> children of search:state under search:operator.
>>>>>>
>>>>>> Any ideas on how to accomplish this with Search API?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> -Will
>>>>>>
>>>>>> _______________________________________________
>>>>>> General mailing list
>>>>>> [email protected]
>>>>>> http://developer.marklogic.com/mailman/listinfo/general

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general