Pinku,

   There are really two aspects to your question.  The first, as Geert
mentioned, is that a JSON (or XML) file is not stored as-is on the
filesystem by MarkLogic.  JSON and XML are *external representations* of a
data model.  The text file is basically a description of how to re-create
that data model when parsed.  When you send a JSON or XML file into
MarkLogic to be stored, it parses the text and builds an internal data
structure that it then persists and works with; the text representation is
not kept.  That internal model replaces the curlies and square brackets (or
angle brackets) with more space/time efficient representations, and any text
data content is compressed.  This means there is not a 1-1 correlation
between the JSON/XML that went in and the space used inside MarkLogic.  It
also explains why the JSON or XML that comes out often doesn’t match
character-for-character what went in.
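
   As a rough illustration of that last point, here is a small sketch in
plain Python (not MarkLogic's internal format, and the sample text is made
up): parsing JSON text into a data model and serializing it again keeps
the data but not the original characters.

```python
import json

# Original text with arbitrary whitespace and a \u escape sequence.
original_text = '{ "title" :  "Caf\\u00e9",\n   "tags":["a" ,"b"] }'

# Parsing builds an in-memory data model; the text itself is not retained.
model = json.loads(original_text)

# Serializing emits *a* valid text form of the model, not the original one.
round_tripped = json.dumps(model, ensure_ascii=False)

print(round_tripped == original_text)      # False: the characters differ
print(json.loads(round_tripped) == model)  # True: the data model is the same
```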

   The second aspect is what happens when you update a document.  MarkLogic
is an MVCC database, which basically means it’s append-only.  Documents
once committed are never changed.  An update made to a document means that
a new copy of the document is created with the modification(s); the old one
is marked as expired and the new copy is marked as live.  Usually, when a
merge of the database happens sometime later, expired documents are
expunged.  But this can be prevented if it’s desirable to keep the earlier
states of documents.  It’s possible to time-travel to look at previous
states of the database using explicit timestamps.
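
   To make the append-only idea concrete, here is a toy model in Python.
It is only a sketch of the general MVCC pattern described above, not
MarkLogic's implementation or API; the class, the method names and the
`keep_since` parameter are all invented for the example.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Version:
    uri: str
    content: Any
    created: int                   # timestamp at which this copy became live
    expired: Optional[int] = None  # timestamp at which it was superseded


class ToyMvccStore:
    """Append-only store: an update adds a new copy and expires the old one."""

    def __init__(self) -> None:
        self.versions: List[Version] = []
        self.clock = 0

    def write(self, uri: str, content: Any) -> int:
        self.clock += 1
        for v in self.versions:
            if v.uri == uri and v.expired is None:
                v.expired = self.clock  # old copy is marked; its content is untouched
        self.versions.append(Version(uri, content, self.clock))
        return self.clock

    def read(self, uri: str, at: Optional[int] = None) -> Any:
        """Return the copy that was live at timestamp `at` (default: now)."""
        ts = self.clock if at is None else at
        for v in self.versions:
            if v.uri == uri and v.created <= ts and (v.expired is None or ts < v.expired):
                return v.content
        return None

    def merge(self, keep_since: Optional[int] = None) -> None:
        """Expunge expired copies, except those still visible at `keep_since`."""
        self.versions = [
            v for v in self.versions
            if v.expired is None or (keep_since is not None and v.expired > keep_since)
        ]


store = ToyMvccStore()
t1 = store.write("/doc.json", {"status": "draft"})
t2 = store.write("/doc.json", {"status": "final"})
print(store.read("/doc.json"))         # current copy
print(store.read("/doc.json", at=t1))  # time-travel to the earlier state
store.merge()                          # expired copies are expunged
print(store.read("/doc.json", at=t1))  # None: the earlier state is gone
```

Calling `merge(keep_since=t1)` instead would keep the earlier copy around,
which is the toy equivalent of preventing the expunge.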

   Document updates cause re-fragmentation of a document.  This basically
means that the new version of the document is inserted again.  But this
happens on a fragment level.  Most of the time one document equals one
fragment, and that is usually the right choice.  But, for example, if your
document is a book and it is fragmented so that each chapter is a fragment,
then making a change to Chapter 3 will induce re-fragmentation but only the
Chapter 3 fragment will be re-inserted; the others will be left unchanged.
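
   The book example can be sketched the same way.  Again this is a toy
model with invented names, only meant to show that an update touches the
one fragment that changed:

```python
from typing import Dict, List, Tuple

# Each stored fragment copy is keyed by (fragment id, version number).
FragmentKey = Tuple[str, int]


class ToyFragmentedBook:
    def __init__(self, chapters: Dict[str, str]) -> None:
        self.stored: Dict[FragmentKey, str] = {}
        self.live: Dict[str, int] = {}         # fragment id -> live version
        self.inserts: List[FragmentKey] = []   # log of fragment insertions
        for chapter_id, text in chapters.items():
            self._insert(chapter_id, text)

    def _insert(self, chapter_id: str, text: str) -> None:
        version = self.live.get(chapter_id, 0) + 1
        self.stored[(chapter_id, version)] = text
        self.live[chapter_id] = version
        self.inserts.append((chapter_id, version))

    def update_chapter(self, chapter_id: str, text: str) -> None:
        # Only the touched fragment gets a new copy; the rest stay as they are.
        self._insert(chapter_id, text)


book = ToyFragmentedBook({"ch1": "Intro", "ch2": "Setup", "ch3": "Usage"})
book.update_chapter("ch3", "Usage, revised")
print(book.inserts[-1])  # the only insertion caused by the update
print(book.live)         # ch1 and ch2 are still at version 1
```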

   For everything you’d ever want to know about how this stuff works,
download and read Jason Hunter’s excellent whitepaper, *Inside MarkLogic
Server*, available here: http://developer.marklogic.com/inside-marklogic

  Cheers.

----
Ron Hitchens r...@overstory.co.uk, +44 7879 358212


On June 20, 2017 at 4:00:14 PM, Geert Josten (geert.jos...@marklogic.com)
wrote:

Hi,

MarkLogic will save complete copies of documents, but whether a JSON file
of 500KB on disk will really take a footprint of 500KB of forest data is
rather hard to predict. Values and property names are mapped to a string
data table that is stored separately from the structure. If there is a lot
of repetition in the data, it could be much less than 500KB per copy. It is
best to just try.
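
To show why repetition helps, here is a toy Python sketch of the general
idea of a string table kept apart from the structure. It is not the actual
forest format; the function name and the ("ref", index) encoding are made
up for the illustration.

```python
from typing import Any, Dict


def intern_strings(node: Any, table: Dict[str, int]) -> Any:
    """Return a copy of `node` with each string replaced by ("ref", index)."""
    if isinstance(node, str):
        # A string seen before reuses its existing index in the table.
        return ("ref", table.setdefault(node, len(table)))
    if isinstance(node, dict):
        return {intern_strings(k, table): intern_strings(v, table)
                for k, v in node.items()}
    if isinstance(node, list):
        return [intern_strings(item, table) for item in node]
    return node  # numbers, booleans and None are left as they are


books = [{"title": f"Book {i}", "format": "paperback", "lang": "en"}
         for i in range(100)]
table: Dict[str, int] = {}
structure = intern_strings(books, table)

total_strings = sum(2 * len(book) for book in books)  # keys plus values
print(total_strings, len(table))  # string occurrences vs. distinct strings stored
```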

By the way, a 500KB JSON sounds large. It might be worth looking into
splitting it into pieces. MarkLogic works best with record-like documents.
E.g. instead of saving an entire bookstore in one document, save books
separately.

Kind regards,
Geert

From: <general-boun...@developer.marklogic.com> on behalf of Pinku Surana <
pinku.sur...@symbiont.io>
Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Tuesday, June 20, 2017 at 4:53 PM
To: "general@developer.marklogic.com" <general@developer.marklogic.com>
Subject: [MarkLogic Dev General] Question about bitemporal DB features


I'm considering MarkLogic and have a question about the implementation of
the bitemporal DB feature.

Say I have a 500KB JSON document stored in the DB. I want to update a
single field in the document 2000 times. Will MarkLogic store a duplicate
of the entire object (resulting in 1GB of total storage for that object)?
Or will it only store the difference between the object, hopefully
resulting in significantly less space consumption?

I want to use this feature to look at the object in the past. I'm hoping
MarkLogic can store changes efficiently while also reconstructing old
versions of the object quickly.

Thanks.

_______________________________________________
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general