The map won't need to be updated frequently, so the idea is to serialize it to the database and filesystem for portability. Then on first use, it gets loaded into a server field. My tests show you're pretty spot on about the deserialization time. But after that it's loaded in the field and always available. My worry is about that initial doc() call on boxes that may have a smaller expanded-tree cache. In this case, is my only option to ensure each box's expanded-tree cache is large enough to hold the 400-MB deserialized map, or face XDMP-EXPNTREECACHEFULL? I could try/catch and throw a friendlier error on the small systems.
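That try/catch might look roughly like this sketch, which wraps the load and translates a cache-exhaustion error into an application-level one. The document URI, field name, and APP-CACHETOOSMALL error code are all placeholders, not anything we've settled on:

```xquery
xquery version "1.0-ml";

(: Hypothetical sketch: load the serialized map into a server field,
 : turning XDMP-EXPNTREECACHEFULL into a friendlier error.
 : '/path/to/map.xml' and 'my-map' are placeholder names. :)
try {
  let $m := map:map(doc('/path/to/map.xml')/map:map)
  return xdmp:set-server-field('my-map', $m)
} catch ($e) {
  if ($e/error:code eq 'XDMP-EXPNTREECACHEFULL')
  then fn:error(xs:QName('APP-CACHETOOSMALL'),
    'The expanded tree cache on this host is too small to load the lookup map.')
  else xdmp:rethrow()
}
```

Anything that doesn't match the cache error is rethrown unchanged, so real failures still surface.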
I chose a map for speed, but if that's too much trouble then I suppose the key/value pairs could also be stored in a non-map document with a range index on the keys and the fragment root set to its children. Then there would be no need for doc(), although I'm not sure how much speed that would give up.

-Will

On 5/6/13 4:48 PM, "Michael Blakeley" <[email protected]> wrote:

>Yes, any doc() call will use space in the expanded-tree cache. So you
>might end up with X in the cache, plus Y for the deserialized map.
>
>I would also worry about how long it might take to deserialize a 400-MB
>map, even if the XML is already in cache. My guess is around 30 sec to
>construct the map. If the cache is cold that might double, because the
>fragment has to be read from disk and decoded. But those are just guesses.
>
>There are a couple of approaches that might avoid that cost. One is to
>break up the map into multiple small documents. You could query a special
>directory or collection for documents that have the key(s) you need, and
>let the expanded-tree cache handle the memory management. Each map would
>be relatively small, so deserialization wouldn't be as expensive.
>
>Another approach is to keep the map in a server field. That would be both
>powerful and dangerous, because the memory for a server field is
>persistent. We are used to working with query allocations, which
>disappear when the query ends. So a single query is limited in its scope
>for damage. But a 400-MB server field allocates 400 MB per eval host, for
>the lifetime of the host process.
>
>So you'd want to be very careful to ensure that each host has exactly one
>of these huge server fields. You'd also have to be very careful about
>updating the map, partly because of the size and also because server
>fields do not offer much in the way of memory protection. Depending on
>your needs you might be able to do some sort of A-B switching when you
>need to update the map, or develop a locking strategy, or both.
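The A-B switching Mike describes could be sketched as two field slots plus a pointer field naming the live one, so readers never see a half-built map. This is only an illustration under assumed names; 'map-a', 'map-b', and 'map-live' are made up, and real code would still need the per-host and locking care he warns about:

```xquery
xquery version "1.0-ml";

(: Hypothetical A-B switching for a huge server-field map.
 : 'map-a' and 'map-b' are the two slots; 'map-live' holds the
 : name of whichever slot readers should use. :)
declare function local:live-map() as map:map?
{
  (: default to slot A if the pointer was never set :)
  xdmp:get-server-field(xdmp:get-server-field('map-live', 'map-a'))
};

declare function local:swap-in($new as map:map)
{
  let $live := xdmp:get-server-field('map-live', 'map-a')
  let $spare := if ($live eq 'map-a') then 'map-b' else 'map-a'
  return (
    xdmp:set-server-field($spare, $new),        (: fill the idle slot :)
    xdmp:set-server-field('map-live', $spare),  (: flip the pointer :)
    xdmp:set-server-field($live, ())            (: release the old copy :)
  )
};
```

Note the window where both copies exist, so peak memory per eval host is roughly double the map size during a swap.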
>
>-- Mike
>
>On 6 May 2013, at 16:29, Will Thompson <[email protected]> wrote:
>
>> Mike - I should have been a little more specific about the use case.
>> What if that map is serialized to the db; would calling doc() on that
>> potentially overload the expanded tree cache?
>>
>> let $m := map:map(doc('/path/to/map.xml')/map:map)
>> return xdmp:set-server-field('my-map', $m)
>>
>> Best guess on the QA server is that ML was installed when its VM was
>> allocated fewer resources. But that's a good point about catching bad
>> queries.
>>
>> -Will
>>
>>
>> On 5/6/13 4:05 PM, "Michael Blakeley" <[email protected]> wrote:
>>
>>> No, maps don't use expanded tree cache space. A really large map might
>>> hit some per-eval limits, but I didn't find them when I created a map
>>> of around 800 MiB on my laptop, with 6.0-3. I used xdmp:quote to try
>>> to make sure the map would really allocate more space for each entry.
>>> This was fine at 80 MiB and took about 5 sec. For 800 MiB it took a
>>> little longer, and the OS swapped some pages out. So I conclude that
>>> it was working hard to allocate all the memory.
>>>
>>> let $m := map:map()
>>> let $n := doc()[1]
>>> let $_ := (1 to 1000000) ! (
>>>   map:put($m, xdmp:integer-to-hex(xdmp:random()), xdmp:quote($n)))
>>> return map:count($m) * string-length(xdmp:quote($n)) div (1024 * 1024)
>>> , xdmp:elapsed-time()
>>> =>
>>> 802.04010009765625
>>> PT1M6.429219S
>>>
>>> On that QA system, you might have set the expanded tree cache size to
>>> a smaller value on purpose. That can be a good way to catch
>>> poorly-optimized queries.
>>>
>>> -- Mike
>>>
>>> On 6 May 2013, at 14:44, Will Thompson <[email protected]>
>>> wrote:
>>>
>>>> Here's another one related to the Expanded Tree Cache: Say I want to
>>>> load a giant map: 400 MB or more. Will this always be dependent on
>>>> the size of the Expanded Tree Cache?
>>>> Most of our dev machines have an Expanded Tree Cache big enough to
>>>> handle a map like this, but some don't, and for some reason our QA
>>>> server is set to an inexplicably small value. Is it advisable to just
>>>> manually increase that value so everything fits? Are there any other
>>>> general rules when adjusting server spec values? I have mostly heard
>>>> "look don't touch" with regard to these settings.
>>>>
>>>> -Will

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
