Pinku,

There are really two aspects to your question. The first, as Geert mentioned, is that a JSON (or XML) file is not stored as-is on the filesystem by MarkLogic. JSON and XML are *external representations* of a data model. The text file is basically a description of how to re-create that data model when parsed. When you send a JSON or XML file into MarkLogic to be stored, it parses the text and builds an internal data structure that it then persists and works with; the text representation is not kept. That internal model replaces the curlies and square brackets (or angle brackets) with more space- and time-efficient representations, and any text content is compressed. This means there is no one-to-one correspondence between the size of the JSON/XML that went in and the space used inside MarkLogic. It also explains why the JSON or XML that comes out often doesn’t match character-for-character what went in.
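To make the "external representation" point concrete, here is a minimal sketch in Python. It uses only the standard `json` module as a stand-in for any parser; it says nothing about MarkLogic's actual internal format, and the sample document is made up for illustration. It shows only that parsing discards the exact text, so what is serialized later need not match what went in.

```python
import json

# Text as it might arrive from a client: irregular spacing, a trailing zero.
original_text = '{ "title" : "Inside MarkLogic",\n   "tags":[ "db","xml" ],  "price": 1.50 }'

# Parsing builds an in-memory data model. Whitespace and the exact spelling
# of literals are not part of that model.
model = json.loads(original_text)

# Serializing re-creates *a* text representation of the same model.
round_tripped = json.dumps(model)

same_text = round_tripped == original_text        # the characters differ
same_model = json.loads(round_tripped) == model   # the data model is identical

print(round_tripped)
print("same text: ", same_text)
print("same model:", same_model)
```

The text differs but the parsed model is equal, which is the same distinction drawn above between the file that went in and the structure that is actually stored.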
The second aspect is what happens when you update a document. MarkLogic is an MVCC database, which basically means it’s append-only. Documents, once committed, are never changed. An update to a document means that a new copy of the document is created with the modification(s); the old one is marked as expired and the new copy is marked as live. Usually, when a merge of the database happens sometime later, expired documents are expunged. But this can be prevented if it’s desirable to keep the earlier states of documents. It’s possible to time-travel to look at previous states of the database using explicit timestamps.

Document updates cause re-fragmentation of a document. This basically means that the new version of the document is inserted again, but this happens at the fragment level. Most of the time one document equals one fragment, and that is usually the right choice. But if, for example, your document is a book and it is fragmented so that each chapter is a fragment, then making a change to Chapter 3 will induce re-fragmentation, but only the Chapter 3 fragment will be re-inserted; the others will be left unchanged.

For everything you’d ever want to know about how this stuff works, download and read Jason Hunter’s excellent whitepaper, *Inside MarkLogic Server*, available here: http://developer.marklogic.com/inside-marklogic

Cheers.

----
Ron Hitchens r...@overstory.co.uk, +44 7879 358212

On June 20, 2017 at 4:00:14 PM, Geert Josten (geert.jos...@marklogic.com) wrote:

Hi,

MarkLogic will save complete copies of documents, but whether a JSON file of 500KB on disk will really take a footprint of 500KB of forest data is rather hard to predict. Values and property names are mapped to a string data table that is stored separately from the structure. If there is a lot of repetition in the data, it could be much less than 500KB per copy. It is best to just try.

By the way, a 500KB JSON document sounds large. It might be worth looking into splitting it into pieces.
MarkLogic works best with record-like documents. E.g. instead of saving an entire bookstore in one document, save books separately.

Kind regards,
Geert

From: <general-boun...@developer.marklogic.com> on behalf of Pinku Surana <pinku.sur...@symbiont.io>
Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Tuesday, June 20, 2017 at 4:53 PM
To: "general@developer.marklogic.com" <general@developer.marklogic.com>
Subject: [MarkLogic Dev General] Question about bitemporal DB features

I'm considering MarkLogic and have a question about the implementation of the bitemporal DB feature. Say I have a 500KB JSON document stored in the DB. I want to update a single field in the document 2000 times. Will MarkLogic store a duplicate of the entire object (resulting in 1GB of total storage for that object)? Or will it only store the differences between versions of the object, hopefully resulting in significantly less space consumption?

I want to use this feature to look at the object in the past. I'm hoping MarkLogic can store changes efficiently while also reconstructing old versions of the object quickly.

Thanks.

_______________________________________________
General mailing list
General@developer.marklogic.com
Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
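The append-only update behaviour described in this thread (a new copy per update, the old copy marked expired, expired copies dropped at merge time unless retained) can be illustrated with a toy model. This is a sketch under stated assumptions, not MarkLogic's API or storage format: the class `ToyMvccStore`, its methods, and the integer timestamps are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Version:
    content: dict
    created: int                    # timestamp at which this copy became live
    expired: Optional[int] = None   # timestamp at which it was superseded


class ToyMvccStore:
    """Hypothetical append-only store, for illustration only."""

    def __init__(self) -> None:
        self.clock = 0
        self.versions: Dict[str, List[Version]] = {}

    def write(self, uri: str, content: dict) -> int:
        """Append a full new copy; the previous copy is only marked expired."""
        self.clock += 1
        history = self.versions.setdefault(uri, [])
        if history and history[-1].expired is None:
            history[-1].expired = self.clock
        history.append(Version(dict(content), created=self.clock))
        return self.clock

    def read(self, uri: str, at: Optional[int] = None) -> Optional[dict]:
        """Return the copy that was live at timestamp `at` (default: now)."""
        ts = self.clock if at is None else at
        for v in self.versions.get(uri, []):
            if v.created <= ts and (v.expired is None or ts < v.expired):
                return v.content
        return None

    def merge(self) -> None:
        """Expunge expired copies, as a merge would when history is not kept."""
        for uri in list(self.versions):
            self.versions[uri] = [v for v in self.versions[uri] if v.expired is None]


store = ToyMvccStore()
t1 = store.write("/books/1.json", {"title": "Draft", "price": 10})
t2 = store.write("/books/1.json", {"title": "Draft", "price": 12})

latest = store.read("/books/1.json")          # the live copy
past = store.read("/books/1.json", at=t1)     # "time travel" to t1
copies_before_merge = len(store.versions["/books/1.json"])

store.merge()
after_merge = store.read("/books/1.json", at=t1)  # history is gone

print(latest, past, copies_before_merge, after_merge)
```

In this toy model every update stores a whole new copy, so 2000 updates to a 500KB document would hold roughly 2000 × 500KB ≈ 1GB of raw text until a merge. As noted in the replies above, the real footprint depends on the compressed internal representation and on fragmentation, so this figure is only a naive upper bound and the advice to measure still applies.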