Hi Greg,

I think Mike explained at length why not to use fragments. The sample XML you sent shows that you are searching on the METS:dmdSec element, which indeed resembles a record-like piece of XML. Your facets are focused on sub-elements of that. That is a good argument for proceeding the way Mike suggests: put those elements in the database as separate documents.

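As a rough illustration of the "separate documents" idea, here is a minimal sketch of splitting a METS file at METS:dmdSec before loading. It is only a sketch under assumptions: the sample XML, the IDs, and the function name split_dmd_sections are hypothetical (Greg's actual XML is not in this thread), and the loading step into the database is left out entirely.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"  # the METS namespace URI

def split_dmd_sections(mets_xml):
    """Return each METS:dmdSec element serialized as its own XML string.

    Hypothetical helper for illustration; it does not load anything
    into a database.
    """
    # Keep the familiar METS prefix in the serialized output.
    ET.register_namespace("METS", METS_NS)
    root = ET.fromstring(mets_xml)
    return [
        # tostring() also emits trailing whitespace (the element's tail),
        # so strip it to get a clean standalone document.
        ET.tostring(sec, encoding="unicode").strip()
        for sec in root.iter("{%s}dmdSec" % METS_NS)
    ]

# Hypothetical sample input, not Greg's real data.
sample = """<METS:mets xmlns:METS="http://www.loc.gov/METS/">
  <METS:metsHdr ID="hdr1"/>
  <METS:dmdSec ID="dmd1"><title>First</title></METS:dmdSec>
  <METS:dmdSec ID="dmd2"><title>Second</title></METS:dmdSec>
</METS:mets>"""

docs = split_dmd_sections(sample)
for doc in docs:
    print(doc)
```

Each string in docs could then be stored under its own URI. Note that the METS:metsHdr element is not carried along into the split documents.
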
Before you do so, you could run a quick test. Add METS:dmdSec as a fragment root to the docs database of the app server you are using. To do so, go to the admin interface (http://localhost:8001/), go to the databases, find the docs database of your app server, open the Fragment Roots section, and add your element by supplying its namespace-uri and local-name. This will trigger a reindex. Follow the reindex in the database status tab (accessible from the database general properties page). Once the reindex is done, check your facets and search results. If this gives the results you are looking for, then seriously consider doing as Mike suggests: reload your content, and store each METS:dmdSec as a separate document.

Reloading and splitting your content at that element does mean that including search facets on the header element located above these elements becomes difficult, but that is also the case when using fragmentation, as Mike points out below. If you only need to access the header for presentation or other purposes, you're fine, I'd say.

Kind regards,
Geert

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Thursday, 10 November 2011 18:27
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to get different facet counts for different searchable-expression in Search API

I wouldn't jump into setting fragment roots. Fragment rules are a mistake in at least 80% of applications.

If you fragment your documents, you won't be able to search them as documents very easily. Any query that crosses fragment boundaries has to be implemented as some sort of join, and the server doesn't do much of anything for you in those cases. So if you have 'head' and 'body', like xhtml, and you fragment on 'head'... now the search API can't help you with search constraints that check both head and body.

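The head/body point can be pictured with a toy model. This is not how the server is implemented; it only assumes that a query is matched against one fragment at a time, and all names and terms here (matching_fragments, doc1, the sample words) are made up for illustration.

```python
# Toy model only: each fragment is indexed on its own, as a set of terms.
def matching_fragments(fragments, required_terms):
    """Return ids of fragments that contain ALL required terms."""
    return [fid for fid, terms in fragments.items()
            if required_terms <= terms]

# One xhtml-like document stored whole (a single fragment).
whole = {"doc1": {"marklogic", "facets", "search"}}

# The same document fragmented on 'head': the head and the rest of the
# document are now matched separately.
split = {
    "doc1#head": {"marklogic"},
    "doc1#rest": {"facets", "search"},
}

# A query with one term that occurs in head and one that occurs in body.
query = {"marklogic", "facets"}

print(matching_fragments(whole, query))  # ['doc1']
print(matching_fragments(split, query))  # []
```

In the fragmented case no single fragment satisfies both terms, so the head and body matches would have to be combined afterwards, which is the "some sort of join" mentioned above.
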
Aside from that, adding fragments increases both memory utilization and disk utilization. But usually the effect on queries and indexing is paramount.

When should you use fragments? Mostly for large documents that cannot be broken apart for intrinsic reasons. Large means that the typical tree size is larger than the system's on-die cache size. "Cannot be broken apart" means "cannot", not "that would resemble work". For example, an RDBMS will often export a table as a giant document with row-oriented child elements. Don't fragment that. Break it up. This is a document-oriented environment, so map each row to a document.

Books might seem like a good candidate for fragmentation, but not always. Often it's better to represent a book as a directory, with metadata in a manifest and each chapter in a document. Most users will want to search at the chapter level or lower anyway.

Getting back to facets and searchable expressions, my answer is the same as below. In most cases you'll have a finite set of searchable expressions that interest you. So use QNames that express that. For example, you might have 'tag' in head and 'tag' in body. Change that by using different local names ('head-tag' vs 'body-tag') or namespaces ('h:tag' vs 'b:tag').

-- Mike

On 10 Nov 2011, at 08:25 , Murray, Gregory wrote:

> Geert,
>
> I don't know how to set an element as a fragment root, which I assume means that the element/fragment level becomes the basis for indexing, rather than the document level. That sounds like exactly what I need. Which part of the documentation discusses that? I'm not finding it.
>
> When you say "big impact" do you mean a drag on performance?
>
> Thanks,
> Greg
>
>
> On Nov 10, 2011, at 9:11 AM, Geert Josten wrote:
>
>> Hi Greg,
>>
>> To my knowledge it is like you say: facet counts are based on fragments, not on search results. But the lengthy explanation by Mike (over several mails) confused me a bit. I still need to reread it thoroughly.
>>
>> One solution for sure is to cancel the difference between what is matched using the searchable-expression and what is stored as a separate fragment. You can do that by declaring the element that you search for as a fragment root. Depending on how often that element occurs within each document, this could have a big impact, so this might not be the wisest decision. Just mentioning it as a possible option.
>>
>> Kind regards,
>> Geert
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Murray, Gregory
>> Sent: Thursday, 10 November 2011 14:45
>> To: General MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] How to get different facet counts for different searchable-expression in Search API
>>
>> I should have mentioned that I'm using 4.2-1
>>
>> Any suggestions greatly appreciated.
>>
>> Thanks,
>> Greg
>>
>> On Nov 9, 2011, at 5:21 PM, Murray, Gregory wrote:
>>
>>> I'm having a similar problem with facet counts when using <searchable-expression>. After reading this thread, I'm afraid I still don't understand how to circumvent the problem. When using <searchable-expression>, it appears that the search results are constrained to that expression whereas the facet counts are not. Is there a facet-related option to similarly constrain a facet to an XPath expression? I've seen references to the "fragment-frequency" option, but it appears to have no effect in this context.
>>>
>>> Many thanks,
>>> Greg
>>>
>>> Gregory Murray
>>> Digital Library Application Developer
>>> Princeton Theological Seminary
>>>
>>>
>>> On Oct 18, 2011, at 8:30 PM, Michael Blakeley wrote:
>>>
>>>> Will, if I can jump in.... I think your idea of using different QNames is the right way to look at it.
>>>>
>>>> Facets are built from range indexes, and range indexes contain lists of values and fragment ids for a given QName.
>>>> So if the query matches the fragment, the facet will show all the values in that fragment. In your case the fragment is the entire document, so you will see all the values in the matching documents, whether they occur under /doc or under /doc//cite. Now, you *could* create a fragment root on 'cite', but I think that would be counter-productive. It's better to use different QNames and have different range indexes.
>>>>
>>>> So I think what you'd want to do is simply arrange for a different set of search options for doc vs cite, including both searchable expression and constraints. Testing for that could be as simple as a call to cts:contains($user-search, 'select:cite') before you call search:search(). Or if that might generate false positives, you could search:parse the user query and then look at the cts:query XML to see whether or not the parser found a select:cite term. If it did, then you can switch to the correct options before calling search:resolve.
>>>>
>>>> -- Mike
>>>>
>>>> On 18 Oct 2011, at 17:14 , Will Thompson wrote:
>>>>
>>>>> Micah,
>>>>>
>>>>> I think I may have explained poorly. This is essentially what I'm doing -- Docs are, generally, like this:
>>>>>
>>>>> <doc>
>>>>>   <search-meta/>
>>>>>   <p>...<cite><search-meta/></cite>...</p>
>>>>>   <section>
>>>>>     <p>...<cite><search-meta/></cite>...</p>
>>>>>     ...
>>>>>   </section>
>>>>> </doc>
>>>>>
>>>>> Searches operate over //doc by default, but if you add the operator/state "select:cite" it changes the searchable expression to //cite. The results are correct, but the problem is that the facet counts appear to be for *both* doc and cite metadata, and thus do not change when toggling searchable-expressions via operator/state.
>>>>>
>>>>> This won't make any sense to our users, who will expect the facet counts to match what they think they're searching for.
>>>>>
>>>>> -W
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected] [mailto:[email protected]] On Behalf Of Micah Dubinko
>>>>> Sent: Tuesday, October 18, 2011 6:56 PM
>>>>> To: General MarkLogic Developer Discussion
>>>>> Subject: Re: [MarkLogic Dev General] How to get different facet counts for different searchable-expression in Search API
>>>>>
>>>>> Hi Will,
>>>>>
>>>>> Everything you want to search exists in document fragments (not properties) right?
>>>>>
>>>>> What would happen if you switched in a different searchable-expression via operator and state? The combined query is taken into account by faceting, but the searchable-expression is not.
>>>>>
>>>>> -m
>>>>>
>>>>>
>>>>> On Oct 18, 2011, at 4:42 PM, Will Thompson wrote:
>>>>>
>>>>>> Our app has typically searched only document-type elements, but I recently added metadata to citation elements (contained within and scattered about document elements) so that they can be optionally searched using an operator. i.e.: "term1 term2 select:citations" The operator changes the searchable-expression and transform-results to search only citation elements and return citation-specific snippets.
>>>>>>
>>>>>> However, I need the facet counts to reflect the search being performed - i.e.: only show estimates for document element direct-child metadata during normal search, and only for citations when that is toggled using the operator.
>>>>>>
>>>>>> My first thought was to use different names or namespace for the citation metadata and have the operator toggle a separate set of constraints associated with those names. But constraints are not supported children of search:state under search:operator.
>>>>>>
>>>>>> Any ideas on how to accomplish this with Search API?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> -Will
>>>>>>
>>>>>> _______________________________________________
>>>>>> General mailing list
>>>>>> [email protected]
>>>>>> http://developer.marklogic.com/mailman/listinfo/general
