You have a few ways to do this, depending on your requirements.

Option 1: Mark up the documents with elements holding the leaf node values and other elements holding the parent values. This bakes your taxonomy into the documents. Danny explained this one nicely.

Option 2: Mark up the documents with just the (assumed unique) leaf node values, and maintain a separate declarative document with the hierarchy showing how the leaf node values fit together. Perhaps that's more useful. You'll do your query and quickly fetch all the leaf node values, and when you want to show facets above the leaf nodes, just do some coalescing math. The performance should be good.

As an example, if you're modeling a biological taxonomy, you can quickly find the distinct animals matching any query along with a count for each, and then if you want to show mammals vs reptiles you walk the list of distinct animal matches and use your declarative document to figure out how many you have of each. Use the MarkLogic "map" API and I expect this will be very fast even for thousands of distinct animals, which is probably more than you have in your case.

If you want to limit a query to a certain parent node (e.g. reptiles), you'd use an or-query over the leaf nodes beneath it. That's how the thesaurus works in essence. You don't want many thousands of expanded values in that or-query though. So...

Option 3: Put the taxonomy hierarchy into a single string. Perhaps you'd have "reptile/snake/cobra" or something. This is similar to the option above but bakes the hierarchy into the documents again, which is perhaps mentally simpler and has some query performance perks. For any given query you can get the distinct list of matching strings, and you can easily do the math (again probably using map) for how many results have values starting with reptile vs starting with mammal. You can also then really easily limit your query to "reptile" by using a word-query or range-query against this field. If terms repeat in different places in the hierarchy, you can use an initial anchor word and a phrase search to make sure you're left-anchored.

If these approaches don't sound suitable, maybe you can give more details about your use case, the taxonomy, the performance needs, and the size of your corpus.

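To make the "coalescing math" in Options 2 and 3 a bit more concrete, here is a rough sketch in Python rather than XQuery. Plain dicts stand in for the MarkLogic map API, and the hierarchy, the function names, and the sample counts are all made up for illustration. It assumes you have already pulled the distinct values and their counts for the current query.

```python
from collections import Counter

# Hypothetical declarative hierarchy for Option 2: child value -> parent value.
PARENT_OF = {
    "cobra": "snake",
    "python": "snake",
    "snake": "reptile",
    "gecko": "lizard",
    "lizard": "reptile",
    "bat": "mammal",
    "whale": "mammal",
}


def ancestors(value):
    """Yield the ancestors of a value, nearest first."""
    node = value
    while node in PARENT_OF:
        node = PARENT_OF[node]
        yield node


def rollup_leaf_counts(leaf_counts):
    """Option 2: add each leaf's count to every ancestor above it."""
    totals = Counter()
    for leaf, count in leaf_counts.items():
        for parent in ancestors(leaf):
            totals[parent] += count
    return totals


def leaves_under(parent):
    """Option 2: the leaf values to expand into an or-query for `parent`."""
    non_leaves = set(PARENT_OF.values())
    return sorted(
        value
        for value in PARENT_OF
        if value not in non_leaves and parent in ancestors(value)
    )


def rollup_path_counts(path_counts, depth=1):
    """Option 3: group 'a/b/c' strings by their first `depth` segments."""
    totals = Counter()
    for path, count in path_counts.items():
        prefix = "/".join(path.split("/")[:depth])
        totals[prefix] += count
    return totals


if __name__ == "__main__":
    # Made-up counts, as if returned for some query.
    print(rollup_leaf_counts({"cobra": 3, "gecko": 2, "bat": 5}))
    print(leaves_under("reptile"))
    print(rollup_path_counts({"reptile/snake/cobra": 3, "mammal/bat": 5}))
```

One caveat with this kind of summing: if a single document can carry two leaves under the same parent, the rolled-up figure counts it twice, so it is a count of tagged values rather than of distinct documents.
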
If one sounds suitable and you get stuck making it happen, let me know. And maybe someone else has a good Option 4. :)

-jh-

On May 13, 2010, at 5:14 PM, Ramon Felciano wrote:

> Hi –
>
> I just attended my first MarkLogic user conference and liked the demos I saw,
> especially those that demonstrated the ability to build a faceted search
> application fully within MarkLogic. I’m looking to build a similar
> search-and-browse application for a document collection that is organized
> using tags from a very large hierarchical controlled vocabulary, and would
> like to use these tags as the basis for the faceted navigation. I was
> planning to use Lucene/Solr, but now am wondering whether I could do this
> largely within MarkLogic, but am getting stuck on how to auto-generate the
> facets within the UI.
>
> Are there any examples showing how to dynamically construct *hierarchical*
> faceted UIs all within ML (e.g. using XQuery)?
>
> Thanks,
>
> Ramon
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
