Hi Mike,

Thanks for the information and excellent suggestions. I think the sampling approach will work in my case.
Gary

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Thursday, March 28, 2013 12:38 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] data size of collection

The information you're looking for isn't part of the on-disk structures, so decoding the fragment is the only built-in way to measure it.

Also watch out for anomalies between on-disk and string-length. On-disk size will include index size and deleted fragments, plus the compressed tree storage will be smaller than the string-length.

But if string-length is accurate enough, then probably sampling is also accurate enough. So I would just sample N documents in the collection, average, then multiply by xdmp:estimate for the collection. That ought to give you a pretty good idea with reasonable speed. If using the default document order leads to problems, this might be a good use for the 'random' option for cts:search.

If sampling doesn't work out, you could add an element or a property that tracks string-length. That could happen as part of existing ingestion processing, or using CPF. However you generate the data, create an appropriate range-index and then use it with cts:sum or cts:sum-aggregate.

-- Mike

On 28 Mar 2013, at 08:50 , "Gary Larsen" <[email protected]> wrote:

> Hi,
>
> Is there a quick method to estimate the size in bytes of all documents in a collection? Would like to determine where the size of the database is increasing. I realize that document size may not mirror the database size but good enough for what I need.
>
> Reading each document's string-length is painfully slow on large collections.
> Thanks
>
> Gary
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
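
For anyone reading this thread later, the arithmetic behind the sampling suggestion (sample N documents, average their sizes, multiply by the collection count) can be sketched as below. This is only an illustration in Python, not MarkLogic code: the names estimate_total_size, uris, and measure are made up for the example. In MarkLogic the count would come from xdmp:estimate and the per-document size from string-length, as described in the thread.

```python
import random
from typing import Callable, Optional, Sequence


def estimate_total_size(
    uris: Sequence[str],
    measure: Callable[[str], int],
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> float:
    """Estimate the summed size of all documents from a random sample.

    uris        -- identifiers of every document in the collection (hypothetical)
    measure     -- the expensive per-document size function (hypothetical)
    sample_size -- how many documents to actually measure
    """
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    if not uris:
        return 0.0
    rng = rng or random.Random()
    # Never ask for more documents than the collection holds.
    n = min(sample_size, len(uris))
    # Random sample without replacement, to avoid bias from document order.
    sample = rng.sample(uris, n)
    mean_size = sum(measure(uri) for uri in sample) / n
    # Scale the sample mean up to the whole collection.
    return mean_size * len(uris)


if __name__ == "__main__":
    # Toy in-memory "collection": URI -> document text.
    docs = {f"/doc/{i}.xml": "x" * (100 + i % 50) for i in range(10_000)}
    estimate = estimate_total_size(list(docs), lambda u: len(docs[u]), 200)
    exact = sum(len(text) for text in docs.values())
    print(f"estimated: {estimate:.0f}  exact: {exact}")
```

Only sample_size documents are measured, so the cost no longer grows with the collection. The estimate is only as good as the sample is representative, which is why the thread suggests random ordering when the default document order causes problems.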
