For what purpose do you intend to use this size?
The size of a document (presuming you mean an XML document) in the database
has multiple meanings, and none of them is the size of a particular
text-serialized representation of the document. There is no direct way to get
the 'byte size' of a document in the database because it doesn't have a single
precise size. Documents are stored highly 'compressed' (not the gzip kind of
compression, but an internal binary representation of the node tree), and in
addition a document participates in indexing, collections, permissions, and
URIs, may have properties, etc.

Typically people asking this question want to know how big the document either
*was* before it was inserted, or *will be* if extracted.
The first value is not stored with the document, and the second depends on the
serialization options used when extracting and serializing the document, that
is, if the document *ever* was in a text-serialized form. Many documents
neither originate in nor are serialized into text form.
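To make the dependence on serialization options concrete, here is a small
illustration in Python (3.9+, standard-library xml.etree.ElementTree, not
MarkLogic code). The same in-memory node tree is serialized four ways, and each
text form has a different byte length. The tree content and the helper names
are made up for the example.

```python
import copy
import xml.etree.ElementTree as ET


def build_tree() -> ET.Element:
    """Build a tiny node tree in memory; it has no text form until serialized."""
    root = ET.Element("doc")
    title = ET.SubElement(root, "title")
    title.text = "Caf\u00e9 menu"  # one non-ASCII character
    return root


def size_in_bytes(root: ET.Element, encoding: str,
                  declaration: bool = False, indent: bool = False) -> int:
    """Byte length of ONE particular text serialization of the tree."""
    if indent:
        root = copy.deepcopy(root)   # don't mutate the caller's tree
        ET.indent(root, space="  ")  # Python 3.9+: adds whitespace text nodes
    data = ET.tostring(root, encoding=encoding, xml_declaration=declaration)
    return len(data)


root = build_tree()
variants = {
    "utf-8": size_in_bytes(root, "utf-8"),
    "us-ascii": size_in_bytes(root, "us-ascii"),  # non-ASCII becomes &#233;
    "utf-8 + declaration": size_in_bytes(root, "utf-8", declaration=True),
    "utf-8 + indentation": size_in_bytes(root, "utf-8", indent=True),
}
for name, n in variants.items():
    print(f"{name}: {n} bytes")
```

None of these numbers is "the" size of the tree; each is the size of one
chosen text form.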

Depending on the purpose you need this information for, and what degree of
tolerance is required, there are methods to track the text-serialized size
along with the document. But unless you are using a binary document, that
information is metadata: it would be data about the *text* format(s), not the
internal document size, and it may not (very likely will not) match the size in
bytes when you extract and reconstitute a text form.

It is similar in many ways to asking "how many bytes are in an 8x11 page of
paper?" or "how many bytes is an MP3 file that stores 10 minutes of audio?"

What is meaningful is to extract the document, convert it to text using a
specific encoding and serialization options, and then count the bytes,
like:
   fn:string-length(xdmp:quote(fn:doc("file.xml")))

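This gives you the length, in characters, of the text produced with the
default serialization. Note that fn:string-length counts characters, not
bytes: the two agree only when every character encodes to a single byte (for
example pure ASCII in UTF-8), and any other character takes more than one byte
in UTF-8. The result may also differ from what you get if you extract the
document using another tool. It's also an expensive operation to do in bulk or
on large documents because it forces them to be loaded into memory and
converted to text.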
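The gap between a character count and a byte count is easy to see outside
MarkLogic too. This Python sketch (standard library only, not MarkLogic code;
the helper name counts is made up) serializes a tiny tree once to a string and
then measures that same text as characters, as UTF-8 bytes, and as UTF-16
bytes.

```python
import xml.etree.ElementTree as ET


def counts(root: ET.Element) -> dict:
    """Compare a character count with byte counts for one serialization."""
    text = ET.tostring(root, encoding="unicode")  # a str: characters, no bytes yet
    return {
        "characters": len(text),  # what a character-length function reports
        "utf-8 bytes": len(text.encode("utf-8")),
        "utf-16 bytes": len(text.encode("utf-16-le")),  # -le: no byte-order mark
    }


ascii_doc = ET.fromstring("<doc><title>Cafe menu</title></doc>")
accented_doc = ET.fromstring("<doc><title>Caf\u00e9 menu</title></doc>")

print(counts(ascii_doc))     # characters == utf-8 bytes here
print(counts(accented_doc))  # the accented e takes two bytes in UTF-8
```

Both documents have the same character count, but their byte counts diverge
as soon as a non-ASCII character or a different encoding is involved.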

Similarly, you can store the original size of the document as a property and
query that. Sometimes that is useful as an indication of the original size,
but it is not to be confused with the size(s) the document will have if you
turn it back into text.
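As a language-neutral sketch of the "store the original size" idea, the Python
below (standard library only, not MarkLogic APIs) records the byte length of
the incoming bytes at ingest time under a hypothetical property name,
original-size, and then compares it with the length after a parse-and-serialize
round trip. ElementTree's default parser does not keep the XML declaration or
the comment, so the two numbers differ: the stored value describes the
original, not any future text form.

```python
import xml.etree.ElementTree as ET

# Original bytes as they might arrive at load time (illustrative sample).
RAW = (b"<?xml version='1.0' encoding='UTF-8'?>\n"
       b"<doc>\n  <!-- draft -->\n  <title lang='fr'>Caf\xc3\xa9</title>\n</doc>\n")


def ingest(raw: bytes):
    """Parse the bytes and record their size as metadata at ingest time.

    'original-size' is a hypothetical property name, not a MarkLogic built-in.
    """
    root = ET.fromstring(raw)  # the text form is gone; only the tree remains
    properties = {"original-size": len(raw)}
    return root, properties


root, props = ingest(RAW)
# Turning the tree back into text does not reproduce the original bytes:
# the declaration and the comment are dropped and attribute quoting changes.
round_trip = ET.tostring(root, encoding="utf-8")
print("recorded original size:", props["original-size"])
print("size after round trip: ", len(round_trip))
```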




-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
[email protected]
Phone: +1 812-482-5224
Cell:  +1 812-630-7622
www.marklogic.com

From: [email protected] 
[mailto:[email protected]] On Behalf Of Debanka
Sent: Sunday, October 26, 2014 9:48 PM
To: [email protected]
Subject: [MarkLogic Dev General] Retrieving size of a marklogic document

Hi Team,
  How can I find out the size of a document stored in MarkLogic? By size I
mean in bytes. Any help on this is appreciated.
Thanks ,
Debanka
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general