David,

You might be thinking of a collection as an object that holds a list of its
members, so that a small number of collections would be faster than storing the
data on each document. That's not how they work. Instead, they are more like
extra metadata on each document. I like Blakeley's analogy of post-it notes.
Collections are defined directly by their members; there is no object that
lists all the members of a collection.
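To make the post-it-note picture concrete, here is a toy in-memory model in
Python. It is not MarkLogic code: the `Document` and `Database` classes and
their methods are hypothetical. Collection names are stored on each document,
and asking for a collection's members means asking the documents. A real
database would answer this from an index rather than a scan, so the sketch
models the data layout only, not performance.

```python
from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class Document:
    uri: str
    content: str
    # Collection names travel with the document itself, like post-it notes.
    collections: set[str] = field(default_factory=set)


class Database:
    """Hypothetical toy store: note there is no per-collection member list."""

    def __init__(self) -> None:
        self.docs: dict[str, Document] = {}

    def insert(self, uri: str, content: str, collections: Iterable[str] = ()) -> None:
        self.docs[uri] = Document(uri, content, set(collections))

    def add_collections(self, uri: str, collections: Iterable[str]) -> None:
        # Tagging touches only this one document; no collection object is updated.
        self.docs[uri].collections.update(collections)

    def collection(self, name: str) -> list[str]:
        # Membership is derived by asking each document, not by reading a list.
        return sorted(d.uri for d in self.docs.values() if name in d.collections)
```

For example, after inserting `/a.xml` with `["reports"]` and later calling
`add_collections("/b.xml", ["reports"])`, `collection("reports")` returns both
URIs, even though nothing named "reports" was ever created on its own.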

If you already have the information marked up in the document, I don't think 
there's much benefit to using collections. 

Kelly

Message: 4
Date: Tue, 19 Jul 2011 21:34:39 +0000
From: "Lee, David" <[email protected]>
Subject: [MarkLogic Dev General]        
To: "General Mark Logic Developer Discussion
        ([email protected])" <[email protected]>
Message-ID:
        <31395bf86e0a454f832b8f8824ed6bda03a...@exmb-pp03.corp.epocrates.com>
Content-Type: text/plain; charset="us-ascii"

Thanks to some tips from this group (and especially Kelly!), I've started
leveraging collections instead of directories. So far, really fantastic results!!!
Thank you all!!

Of course one success opens the doors to a million questions ...

Question: is there a significant cost to having a 'large' number of
overlapping documents in collections?
In my use case I may have millions of very similar small documents, all with
some basic set of attributes that have a small set of possible values. I've
implemented attribute value range indexes, but I was wondering if collections
might work better.
A typical use case would be to filter a result set to only those documents with
a particular attribute set to one value.
If I had a collection for each attribute/value combination (maybe 100
collections max), a collection query could do the equivalent of a range index.
Example:

<logfile host="host1" system="tomcat" ...>
   ...

Instead of making range indexes on logfile/@host and logfile/@system,
make collections called host-host1, host-host2, host-host3, ... and
system-tomcat, system-mysql, ...
Then this XPath:
//logfile[@host eq 'host1']

would be equivalent to a collection search on 'host-host1'
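The proposed equivalence can be checked with a small plain-Python sketch using
`xml.etree.ElementTree` (again not MarkLogic code). It assumes the naming scheme
from the question, "<attribute>-<value>", and the names `TAGGED_ATTRIBUTES`,
`collections_for`, `by_attribute` and `by_collection` are hypothetical. In
MarkLogic itself the two sides would be an attribute range query versus
`cts:collection-query`; this only models that both filters select the same
documents, and it only checks attributes on the root `logfile` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical: which attributes get mirrored into "<attribute>-<value>" collections.
TAGGED_ATTRIBUTES = ("host", "system")


def collections_for(xml_text: str) -> list[str]:
    """Derive collection names such as 'host-host1' from a logfile's attributes."""
    root = ET.fromstring(xml_text)
    return [
        f"{name}-{root.get(name)}"
        for name in TAGGED_ATTRIBUTES
        if root.get(name) is not None
    ]


def by_attribute(docs: dict[str, str], name: str, value: str) -> list[str]:
    # Models //logfile[@host eq 'host1'] over a dict of uri -> XML text.
    return sorted(uri for uri, text in docs.items() if ET.fromstring(text).get(name) == value)


def by_collection(tags: dict[str, list[str]], collection: str) -> list[str]:
    # Models a collection query over a dict of uri -> collection names.
    return sorted(uri for uri, names in tags.items() if collection in names)
```

The trade-off shows up in `collections_for`: the tags are a second copy of the
attribute values, so they must be recomputed whenever a document's attributes
change, whereas a range index is maintained from the content itself.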

Is this brilliant or stupid? Obviously there will be a trade-off, but I'm
thinking that in this case, since the number of possible values is very small,
collections might actually be a good thing.

-David





----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
