[arangodb-google] Re: Maximum document size?

Jan Thu, 16 Feb 2017 00:34:46 -0800

Hi,

the JSON data sent to ArangoDB is internally stored in the VelocyPack 
format. So yes, VelocyPack does affect the size of the documents. 
VelocyPack objects are normally at least as compact as the equivalent JSON, 
and often more compact, as shown in the table 
[here](https://github.com/arangodb/velocypack/blob/master/Performance.md).
The theoretical maximum size of a document in VelocyPack is a few exabytes.
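To make the compactness claim concrete for the simplest case, here is a small sketch that compares the encoded size of a single string value in JSON and in VelocyPack. It models only VelocyPack's string encodings (one type byte for strings up to 126 bytes, type byte 0xbf plus an 8-byte length for longer ones, per the VelocyPack specification). Object and array headers and index tables are deliberately ignored, and the function names are made up for this example.

```javascript
// Sketch: encoded size of ONE string value in JSON vs. VelocyPack.
// Only VelocyPack's string encodings are modelled; objects and arrays
// (headers, offset tables) are deliberately left out.
const utf8 = new TextEncoder();

// JSON: surrounding quotes plus escaped content, measured in UTF-8 bytes.
function jsonStringSize(s) {
  return utf8.encode(JSON.stringify(s)).length;
}

// VelocyPack: short strings (0..126 bytes) use one type byte (0x40 + length);
// longer strings use type byte 0xbf plus an 8-byte little-endian length.
function vpackStringSize(s) {
  const n = utf8.encode(s).length;
  return n <= 126 ? 1 + n : 1 + 8 + n;
}

for (const s of ["testdata4999999", 'say "hi"', "x".repeat(200)]) {
  console.log(`${s.length} chars: JSON ${jsonStringSize(s)} B, VelocyPack ${vpackStringSize(s)} B`);
}
```

For these three inputs the sketch reports 17 vs. 16 bytes, 12 vs. 9 bytes and 202 vs. 209 bytes: short strings save the quotes and escape characters, while very long strings pay a slightly bigger header than JSON.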


In ArangoDB, there are additional practical limits for a document's size:
- every document is written to the write-ahead log (WAL) first, so it must 
fit into a WAL journal file. The default journal size is 32 MB (adjustable 
via --wal.logfile-size). If a document arrives that does not fit into a 
journal of that size, a new journal big enough to hold it is created; that 
is, journals can grow beyond the --wal.logfile-size threshold as long as 
--wal.allow-oversize-entries is set to true (the default).
- if documents are processed with JavaScript, there are additional limits. 
The maximum string length there is 256 MB, if I remember correctly. The 
ArangoShell (arangosh) and some other ArangoDB functionality use 
JavaScript, and these parts may need to JSON-stringify documents. This 
effectively caps the maximum document size at 256 MB wherever these parts 
are used.
- the ArangoDB client tools (e.g. ArangoShell) all use a configurable 
maximum network packet size, which is 128 MB by default 
(--server.max-pack-size option). This may need to be increased to send or 
receive bigger documents.
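The three limits above can be checked on the client before a document is sent. The following is a minimal sketch, not an ArangoDB API: `checkDocumentSize` and its option names are hypothetical, the thresholds are the defaults quoted in this post (with MB assumed to mean 2^20 bytes), and the JSON size only approximates the VelocyPack size actually stored.

```javascript
// Hypothetical client-side pre-flight check (not an ArangoDB API).
// Thresholds are the defaults quoted in this post; MB is assumed to be 2^20
// bytes, and the JSON size only approximates the stored VelocyPack size.
const MB = 1024 * 1024;

function checkDocumentSize(doc, limits = {}) {
  const {
    walLogfileSize = 32 * MB,     // --wal.logfile-size default
    allowOversizeEntries = true,  // --wal.allow-oversize-entries default
    jsMaxStringLength = 256 * MB, // approximate JavaScript string limit (characters)
    maxPackSize = 128 * MB,       // client tools' --server.max-pack-size default
  } = limits;

  let json;
  try {
    json = JSON.stringify(doc);
  } catch (e) {
    // JavaScript engines throw (e.g. a RangeError) if the string gets too long
    return { chars: NaN, bytes: NaN, problems: ["cannot be JSON-stringified: " + e.message] };
  }
  const chars = json.length;                           // UTF-16 code units
  const bytes = new TextEncoder().encode(json).length; // UTF-8 bytes

  const problems = [];
  // Oversize journals are only a problem when they are not allowed
  if (!allowOversizeEntries && bytes > walLogfileSize) {
    problems.push("does not fit into a single WAL journal");
  }
  if (chars > jsMaxStringLength) {
    problems.push("too long to be JSON-stringified in JavaScript");
  }
  if (bytes > maxPackSize) {
    problems.push("exceeds the client tools' maximum packet size");
  }
  return { chars, bytes, problems };
}

console.log(checkDocumentSize({ name: "small" }));
```

With the default thresholds, a document like the approx. 177 MB one in the example in this post would trip only the packet-size check, which matches the remark that the max packet size had to be raised.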
 
So yes, it will work with documents bigger than 32 MB. Here's an example 
with a document that is approx. 177 MB in size (note that I had already 
raised the max packet size for this):

> var doc = {}; for (i = 0; i < 5000000; ++i) doc["testdata" + i] = "testdata" + i;
testdata4999999

> JSON.stringify(doc).length
177777781

> db._create("biggie");
[ArangoCollection 27417865702610887, "biggie" (type document, status loaded)]

> db.biggie.insert(doc);
{ 
  "_id" : "biggie/27417865702610899", 
  "_key" : "27417865702610899", 
  "_rev" : "_UiV8Jzq---" 
}
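The 177777781 figure can be double-checked by hand: every entry `"testdataI":"testdataI"` takes 2d + 21 characters, where d is the number of decimal digits of I, plus one comma between entries and two braces. A small sketch of that arithmetic (the helper names are made up for illustration), cross-checked against JSON.stringify for a smaller n:

```javascript
// Build the same shape of document as in the arangosh session above.
function buildDoc(n) {
  const doc = {};
  for (let i = 0; i < n; ++i) doc["testdata" + i] = "testdata" + i;
  return doc;
}

// Closed-form length of JSON.stringify(buildDoc(n)).
// Each entry "testdataI":"testdataI" is 2 * (8 + d + 2) + 1 = 2d + 21
// characters, where d is the number of decimal digits of I.
function predictedJsonLength(n) {
  let total = 2;                 // the enclosing braces
  if (n > 0) total += n - 1;     // commas between entries
  let digitSum = 0;
  // indices in [start, end) all have d digits: [0,10), [10,100), ...
  for (let d = 1, start = 0, end = 10; start < n; d++, start = end, end *= 10) {
    digitSum += d * (Math.min(n, end) - start);
  }
  return total + 21 * n + 2 * digitSum;
}

console.log(predictedJsonLength(5000000)); // 177777781
console.log(predictedJsonLength(1234) === JSON.stringify(buildDoc(1234)).length); // true
```

Note that this counts characters (UTF-16 code units); since the document is pure ASCII, it is also its UTF-8 size in bytes.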

Processing time for a document normally grows with its size, so bigger 
documents will in most cases take longer to process than smaller ones.
So in practice, big documents should be used only in exceptional cases, 
when they are unavoidable.
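A quick way to see the effect of size on processing time is to time a serialize/parse round trip for documents of growing size. The sketch below does this in plain JavaScript (assuming Node.js 16 or later for the global `performance` object). It only measures client-side JSON handling, not ArangoDB's own processing, and the helper names are hypothetical.

```javascript
// Hypothetical micro-benchmark: time a JSON round trip (stringify + parse)
// for documents of increasing size. Absolute numbers depend on the machine.
function makeDoc(n) {
  const doc = {};
  for (let i = 0; i < n; ++i) doc["testdata" + i] = "testdata" + i;
  return doc;
}

function timeRoundTrip(n) {
  const doc = makeDoc(n);
  const t0 = performance.now();
  const json = JSON.stringify(doc);
  const copy = JSON.parse(json);
  const ms = performance.now() - t0;
  return { keys: Object.keys(copy).length, chars: json.length, ms };
}

for (const n of [1000, 10000, 100000, 500000]) {
  const r = timeRoundTrip(n);
  console.log(`${r.keys} keys, ${r.chars} chars: ${r.ms.toFixed(1)} ms`);
}
```

The timings are noisy for small documents, but the round-trip time should grow roughly with the number of characters.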

Best regards
J


Am Donnerstag, 16. Februar 2017 05:17:47 UTC+1 schrieb William Hayes:
>
> Hopefully a quick question.  It looks like the max document size (e.g. 
> document value) is determined by the data journal size which I think is 
> 32MB but which is configurable.  Do I understand this correctly?  I will 
> have a lot of small files with occasional files that are very large 
> potentially up to a few hundred MB.  Does the Velocy lib affect the size of 
> the JSON document sizes in the collections?  
>
> Thanks!
>

-- 
You received this message because you are subscribed to the Google Groups 
"ArangoDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
