Interesting approach. I have a few questions. First of all, do you even need either (range indexes or collections)? The query example you gave should be resolvable from the Universal Index alone. I tried this in CQ (after creating 300 sample <logfile> documents):
    xdmp:query-trace(true()),
    //logfile[@host eq 'host1']

And then I looked in the log file:

    Analyzing path: fn:collection()/descendant::logfile[@host eq "host1"]
    Step 1 is searchable: fn:collection()
    Step 2 is searchable: descendant::logfile[@host eq "host1"]
    Path is fully searchable.
    Gathering constraints.
    Comparison contributed hash value constraint: logfile/@host = "host1"
    Step 2 predicate 1 contributed 1 constraint: @host eq "host1"
    Comparison contributed hash value constraint: logfile/@host = "host1"
    Step 2 predicate 1 contributed 1 constraint: @host eq "host1"
    Step 2 contributed 2 constraints: descendant::logfile[@host eq "host1"]
    Executing search.
    Selected 100 fragments to filter

The trace tells me that the result was resolved entirely from the Universal Index, since I haven't enabled any range indexes (and I know that exactly 100 of my sample docs have host="host1").

My other two questions:

* What is your main motivation for using collections rather than attribute range indexes?
* How do you plan to associate the documents with the collection URIs?

Thanks,

Evan Lenz
Software Developer, Community
http://developer.marklogic.com

From: "Lee, David" <[email protected]>
Reply-To: General MarkLogic Developer Discussion <[email protected]>
Date: Tue, 19 Jul 2011 14:34:39 -0700
To: "General Mark Logic Developer Discussion ([email protected])" <[email protected]>
Subject: [MarkLogic Dev General] Lots of collections ...

Thanks to some tips from this group (and especially Kelly!) I've started leveraging collections instead of directories. So far really fantastic results! Thank you all!

Of course one success opens the doors to a million questions ...

Question: Is there a significant cost to having a 'large' number of overlapping documents in collections?
In my use case I may have millions of very similar small documents, all with some basic set of attributes which have a small set of possible values. I've implemented attribute value range indexes, but was wondering if collections might work better?

A typical use case would be to filter a result set down to only those documents with a particular attribute set to one value. If I had collections for each attribute/value combination (maybe 100 collections max), a collection query could do the equivalent of a range index.

Example:

    <logfile host="host1" system="tomcat" ...> ...

Instead of making a range index on logfile/@host and logfile/@system, make collections called

    host-host1
    host-host2
    host-host3
    ...

and

    system-tomcat
    system-mysql
    ...

Then this XPath

    //logfile[@host eq 'host1']

would be equivalent to a collection search on 'host-host1'.

Is this brilliant or stupid? Obviously there will be a tradeoff ... but I'm thinking that in this case, since the number of possible values is very small, collections might actually be a good thing.

-David

----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224
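
For illustration, the collection scheme described above might be sketched in XQuery roughly as follows. This is only a sketch under stated assumptions: the document URI "/logs/example-1.xml" is made up, the collection names simply follow the host-/system- naming from the message, and the positional four-argument form of xdmp:document-insert is assumed to be available in the server version in use. The second statement shows the collection search next to the attribute-based queries it is meant to replace.

```xquery
xquery version "1.0-ml";

(: Insert one sample document and tag it with one collection per
   attribute/value pair. The URI and collection names are hypothetical. :)
xdmp:document-insert(
  "/logs/example-1.xml",
  <logfile host="host1" system="tomcat"/>,
  xdmp:default-permissions(),
  ("host-host1", "system-tomcat"))
;

xquery version "1.0-ml";

(: Three ways to ask for the same documents, listed together for comparison.
   They should agree only if every document was tagged consistently. :)

(: 1. Collection search on 'host-host1' :)
cts:search(//logfile, cts:collection-query("host-host1")),

(: 2. Attribute value query, which needs no range index :)
cts:search(//logfile,
  cts:element-attribute-value-query(
    xs:QName("logfile"), xs:QName("host"), "host1")),

(: 3. The plain XPath from the message :)
//logfile[@host eq 'host1']
```

Documents that are already loaded could be tagged afterwards with xdmp:document-add-collections($uri, $collections). Either way, the collections have to be kept in step with the attribute values by whatever code loads or updates the documents, which is part of the tradeoff being asked about.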
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
