Hi Mike,

Thanks for the information and excellent suggestions.  I think the sampling
approach will work in my case.

Gary 


-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of Michael Blakeley
Sent: Thursday, March 28, 2013 12:38 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] data size of collection

The information you're looking for isn't part of the on-disk structures, so
decoding the fragment is the only built-in way to measure it. Also watch out
for discrepancies between on-disk size and string-length: on-disk size
includes index data and deleted fragments, while the compressed tree storage
is smaller than the string-length.

But if string-length is accurate enough, then sampling is probably also
accurate enough. So I would just sample N documents in the collection,
average their string-lengths, then multiply by xdmp:estimate for the
collection. That ought to give you a pretty good idea with reasonable speed.
If using the default document order skews the sample, this might be a good
use for the 'score-random' option for cts:search.
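
A minimal sketch of that sampling approach, under some assumptions: the
collection URI "my-collection" and the sample size of 1000 are placeholders,
and the serialized length from xdmp:quote stands in for "string-length". Note
that fn:string-length counts characters, not bytes, so treat the result as a
rough estimate only.

```xquery
xquery version "1.0-ml";

(: Hypothetical values: substitute your own collection URI and sample size. :)
declare variable $COLLECTION as xs:string := "my-collection";
declare variable $SAMPLE-SIZE as xs:integer := 1000;

let $query := cts:collection-query($COLLECTION)

(: "score-random" assigns random scores, so the first N results
   approximate a random sample instead of following document order. :)
let $sample :=
  cts:search(fn:doc(), $query, "score-random")[1 to $SAMPLE-SIZE]

(: Assumption: serialized length in characters is the size measure. :)
let $lengths :=
  for $doc in $sample
  return fn:string-length(xdmp:quote($doc))

(: xdmp:estimate is resolved from the indexes without loading documents. :)
let $count := xdmp:estimate(cts:search(fn:doc(), $query))

return
  if (fn:empty($lengths)) then 0
  else xs:unsignedLong(fn:round(fn:avg($lengths) * $count))
```

A larger sample size should tighten the estimate at the cost of decoding more
fragments.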

If sampling doesn't work out, you could add an element or a property that
tracks string-length. That could happen as part of existing ingestion
processing, or using CPF. However you generate the data, create an
appropriate range-index and then use it with cts:sum or cts:sum-aggregate.
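
As a rough sketch of the range-index approach, assume each document carries
an element named doc-size (a hypothetical name) holding its length, and that
an element range index of a numeric type such as unsignedLong has been
configured on it. The collection URI is again a placeholder.

```xquery
xquery version "1.0-ml";

(: Assumes a numeric element range index exists on the hypothetical
   element "doc-size"; without that index this call raises an error. :)
cts:sum-aggregate(
  cts:element-reference(xs:QName("doc-size")),
  (),
  cts:collection-query("my-collection"))
```

Because the sum is computed from the range index, it should not need to load
any documents.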

-- Mike

On 28 Mar 2013, at 08:50 , "Gary Larsen" <[email protected]> wrote:

> Hi,
>  
> Is there a quick method to estimate the size in bytes of all documents in
> a collection?  I would like to determine where the size of the database is
> increasing.  I realize that document size may not mirror the database size,
> but it is good enough for what I need.
>  
> Reading each document's string-length is painfully slow on large
> collections.  Thanks
>  
> Gary
>  
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
