Re: [MarkLogic Dev General] Maximum size for a map?

Damon Feldman Mon, 06 May 2013 19:01:55 -0700

Will,

You may be able to use range indexes, either by using cts:element-values with an 
element-value-query to "key" the lookup and have the value in the index, or by 
range-indexing a value that has the key and value separated by a token. This 
may not be quite as fast as a map lookup but can simplify your code.
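
A minimal sketch of the first option, under assumptions that are not from
this thread: each pair is stored as its own fragment shaped like
<entry><key>k</key><value>v</value></entry>, and a string element range index
exists on <value> using the default collation. The element names and the
local:lookup helper are hypothetical.

```xquery
xquery version "1.0-ml";

(: Hypothetical layout: one <entry><key>..</key><value>..</value></entry>
   per fragment, with a string element range index on <value>. :)
declare function local:lookup($key as xs:string) as xs:anyAtomicType?
{
  cts:element-values(
    xs:QName("value"),   (: read values straight from the range index :)
    (),                  (: no starting value :)
    ("limit=1"),         (: assumes at most one value per key :)
    (: restrict to fragments whose <key> matches exactly :)
    cts:element-value-query(xs:QName("key"), $key, "exact"))
};

local:lookup("some-key")
```

Because the value comes from the index, nothing is pulled through the
expanded-tree cache for the lookup itself.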


If you describe the nature of the lookup we can brainstorm other ideas.

Yours,
Damon

--
Damon Feldman
Sr. Principal Consultant, MarkLogic


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Will Thompson
Sent: Monday, May 06, 2013 9:01 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Maximum size for a map?

The map won't need to be updated frequently, so the idea is to serialize it to 
the database and filesystem for portability. Then on first use, it gets loaded 
into a server field. My tests are showing you're pretty spot on for the 
deserializing time. But after that it's loaded in the field and always 
available. My worry is about that initial doc() call on boxes that may have a 
smaller expanded-tree cache. In this case, is my only option to ensure each box 
has sufficiently large cache settings to hold the 400MB deserialized map, or 
face XDMP-EXPNTREECACHEFULL? I could try/catch, and throw a friendlier error 
for the small systems.
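
A rough sketch of that try/catch idea, reusing the document path and field
name from the snippet quoted further down. It assumes XDMP-EXPNTREECACHEFULL
can actually be caught at this point, which I have not confirmed; the
MAP-LOAD-FAILED error name is made up for illustration.

```xquery
xquery version "1.0-ml";
declare namespace error = "http://marklogic.com/xdmp/error";

let $cached := xdmp:get-server-field("my-map")
return
  if (fn:exists($cached)) then $cached   (: already loaded on this host :)
  else
    try {
      (: xdmp:set-server-field returns the value it stored :)
      xdmp:set-server-field(
        "my-map", map:map(fn:doc("/path/to/map.xml")/map:map))
    } catch ($e) {
      if ($e/error:code eq "XDMP-EXPNTREECACHEFULL")
      then fn:error(xs:QName("MAP-LOAD-FAILED"),   (: hypothetical name :)
        "Expanded tree cache is too small to load /path/to/map.xml")
      else xdmp:rethrow()   (: leave unrelated errors untouched :)
    }
```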

I chose map for speed, but if that's too much trouble then I suppose the 
key/value pairs could also be stored in a non-map document with a range index 
on the keys and fragment root set to its children. Then there would be no need 
for doc(), although I'm not sure how much speed that would give up.

-Will
 

On 5/6/13 4:48 PM, "Michael Blakeley" <[email protected]> wrote:

>Yes, any doc() call will use space in the expanded-tree cache. So you 
>might end up with X in the cache, plus Y for the deserialized map.
>
>I would also worry about how long it might take to deserialize a 400-MB 
>map, even if the XML is already in cache. My guess is around 30-sec to 
>construct the map. If the cache is cold that might double because the 
>fragment has to be read from disk and decoded. But those are just guesses.
>
>There are a couple of approaches that might avoid that cost. One is to 
>break up the map into multiple small documents. You could query a 
>special directory or collection for documents that have the key(s) you 
>need, and let the expanded-tree cache handle the memory management. 
>Each map would be relatively small, so deserialization wouldn't be as 
>expensive.
>
>Another approach is to keep the map in a server field. That would be 
>both powerful and dangerous, because the memory for a server field is 
>persistent. We are used to working with query allocations, which 
>disappear when the query ends. So a single query is limited in its 
>scope for damage. But a 400-MB server field allocates 400-MB per eval 
>host, for the lifetime of the host process.
>
>So you'd want to be very careful to ensure that each host has exactly 
>one of these huge server fields. You'd also have to be very careful 
>about updating the map, partly because of the size and also because 
>server fields do not offer much in the way of memory protection. 
>Depending on your needs you might be able to do some sort of A-B 
>switching when you need to update the map, or develop a locking strategy, or 
>both.
>
>-- Mike
>
>On 6 May 2013, at 16:29 , Will Thompson <[email protected]>
>wrote:
>
>> Mike - I should have been a little more specific about the use case. What
>> if that map is serialized to the db; would calling doc() on that
>> potentially overload the expanded tree cache?
>> 
>> let $m := map:map(doc('/path/to/map.xml')/map:map)
>> return xdmp:set-server-field('my-map', $m)
>> 
>> Best guess on the QA server is that ML was installed when its VM was 
>> allocated fewer resources. But that's a good point about catching bad 
>> queries.
>> 
>> -Will
>> 
>> 
>> On 5/6/13 4:05 PM, "Michael Blakeley" <[email protected]> wrote:
>> 
>>> No, maps don't use expanded tree cache space. A really large map might
>>> hit some per-eval limits, but I didn't find them when I created a map
>>> around 800-MiB on my laptop, with 6.0-3. I used an xdmp:quote to try to
>>> make sure the map would really allocate more space for each entry. This
>>> was fine at 80-MiB and took about 5-sec. For 800-MiB it took a little
>>> longer, and the OS swapped some pages out. So I conclude that it was
>>> working hard to allocate all the memory.
>>> 
>>> let $m := map:map()
>>> let $n := doc()[1]
>>> let $_ := (1 to 1000000) ! (
>>>   map:put($m, xdmp:integer-to-hex(xdmp:random()), xdmp:quote($n)))
>>> return
>>>   map:count($m) * string-length(xdmp:quote($n)) div (1024 * 1024),
>>> xdmp:elapsed-time()
>>> =>
>>> 802.04010009765625
>>> PT1M6.429219S
>>> 
>>> On that QA system, you might have set the expanded tree cache size 
>>> to a smaller value on purpose. That can be a good way to catch 
>>> poorly-optimized queries.
>>> 
>>> -- Mike
>>> 
>>> On 6 May 2013, at 14:44 , Will Thompson <[email protected]>
>>> wrote:
>>> 
>>>> Here's another one related to the Expanded Tree Cache: Say I want to
>>>> load a giant map: 400MB or more. Will this always be dependent on the
>>>> size of the Expanded Tree Cache? Most of our dev machines have an
>>>> Expanded Tree Cache big enough to handle a map like this, but some
>>>> don't, and for some reason our QA server is set to an inexplicably
>>>> small value. Is it advisable to just manually increase that value so
>>>> everything fits? Are there any other general rules when adjusting
>>>> server spec values? I have mostly heard "look don't touch" with regard
>>>> to these settings.
>>>> 
>>>> -Will
>>>> 
>>>> _______________________________________________
>>>> General mailing list
>>>> [email protected]
>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>> 
