Re: [MarkLogic Dev General] mlcp export problem/question

Damon Feldman Fri, 25 Jul 2014 03:01:19 -0700

I'd be interested to know what MarkLogic is doing around the time of the lost 
connection. The app server status page will show active queries and how long 
they have been running (click "show more" and mouse-hover over the query 
description).


I believe mlcp requests will show the request size (meaning a query for items 
20,000 to 30,000 is a 10k request size) as part of the query description. Large 
queries could work fine for small documents (XML) but then tend to all back up 
on a set of large binaries at some point, causing a somewhat predictable slow 
period that causes the connection to be starved and dropped.
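
A rough sketch of why a fixed request size can behave so differently across a database: the same number of documents per request can mean a very different number of bytes once the export reaches a cluster of large binaries. The document sizes and the `bytes_per_request` helper below are hypothetical, chosen only for illustration; they are not taken from mlcp.

```python
def bytes_per_request(doc_sizes, request_size):
    """Sum the document sizes in each consecutive window of `request_size` documents."""
    return [
        sum(doc_sizes[start:start + request_size])
        for start in range(0, len(doc_sizes), request_size)
    ]


# Hypothetical database: 20 small XML documents (2 KB each)
# followed by 10 large binaries (5 MB each).
doc_sizes = [2_000] * 20 + [5_000_000] * 10

# Same request size (10 documents) for every request.
payloads = bytes_per_request(doc_sizes, request_size=10)
print(payloads)
```

With these made-up numbers the first two requests carry 20 KB each, while the third carries 50 MB, so a request size that was comfortable for XML can stall on binaries.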

Damon

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of David Lee
Sent: Thursday, July 24, 2014 1:48 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] mlcp export problem/question

My .02 euro for what it costs.
1) Out of memory can occur in Java due to other processes pushing out all 
available VM, but it's rarely due to that, because you set (or default) a 
maximum memory size (-Xmx#####m), which (OS and JVM dependent) tends to make 
sure that much is available to the JVM.  But it's still possible (unlikely).

2) Thread starvation.  The threads in mlsync should be IO bound, so thread 
starvation isn't likely an issue, as every IO does an implicit yield.  In 
streaming mode that is unlikely.

3) The analysis of MarkLogic taking too long to respond is most likely.  What 
OS is this?
OSes have peculiarities of ignoring TCP idle times, and so do various routers 
and gateways ... so if there is an idle period (typically 3 minutes is what 
I've seen), the connection will drop somewhere.  On Windows I have validated 
there's nothing that can be done about it (you can research the past 20 years 
of Windows TCP/IP implementations) ... but if there are any other routers 
between the endpoints they too will do this.  Why would idle happen?  It can 
happen if you overload the server such that it takes too long to respond to a 
request, particularly if:

A) You send a very large document (or batch) that takes a long time to insert 
(perhaps concurrent load or lock) such that the network backs up all the way to 
the client (through all the buffers on both ends and in the middle) and the 
client is hence blocked on write before the server sends the first result.  
From the TCP point of view this is "idle" ... and after a while the connection 
will drop.

B) Simply do a long-running request (sleep of 10 minutes).

I would guess A first ... (without having debugged your situation).
To validate this, contrary to other suggestions, I would cut *down* on the 
number of concurrent threads and batch size, as well as monitor MarkLogic to 
see how its load is doing.  If it's overloaded (you can tell by various means, 
but the best kind of metric is the app request average time or queue length) 
... then you're sending things either too fast and/or too much at once ...

You can go from there: start with a small thread count and small batch size and 
validate it works, then work your way up until it fails.  If you monitor the ML 
server and see a correlated non-linear load (i.e. the request time or request 
queue length starts to go high very quickly) ... you've reached a critical point.
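
As a minimal sketch of spotting that critical point, assume you record the average request time at each load level as you ramp up. The `find_knee` helper, the `growth_limit` threshold, and the measurements below are all hypothetical, not part of mlcp or MarkLogic; the idea is only to flag the first step where latency grows much faster than the load did.

```python
def find_knee(samples, growth_limit=2.0):
    """Return the first load level at which latency grows disproportionately.

    `samples` is a list of (load, avg_request_seconds) pairs ordered by
    increasing load. A step counts as non-linear when latency grew more than
    `growth_limit` times faster than load did over the same step.
    Returns None if no such step is found.
    """
    for (prev_load, prev_latency), (load, latency) in zip(samples, samples[1:]):
        load_ratio = load / prev_load
        latency_ratio = latency / prev_latency
        if latency_ratio > growth_limit * load_ratio:
            return load
    return None


# Hypothetical measurements:
# (thread count * batch size, average request time in seconds)
measurements = [(100, 0.20), (200, 0.42), (400, 0.90), (800, 6.50)]

knee = find_knee(measurements)
print(knee)
```

With these made-up numbers latency roughly doubles each time the load doubles until the last step, where it jumps about sevenfold, so the last level (800) is reported. Backing off to the previous level would be the natural next step.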

This is not common though, as *usually* the system is self-regulating ... send 
stuff too fast and it simply slows you down, but there are situations where 
combinations of network "issues", large buffering on either side, and/or 
anything keeping the server from responding quickly (lock contention, more 
parallel requests than the app server thread count, etc.) can lead 
to errors of this sort.


-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
[email protected]
Phone: +1 812-482-5224
Cell:  +1 812-630-7622
www.marklogic.com

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Thursday, July 24, 2014 8:35 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] mlcp export problem/question

In my experience XQSync and RecordLoader often have problems due to GC pauses. 
Starting the JVM with `-Xincgc` helps. I don't think mlcp does that 
automatically, but from looking at mlcp.sh I think it will pick up the JVM_OPTS 
environment variable. So `export JVM_OPTS="-Xincgc"` should work.

Another potential problem with the JVM in general is thread starvation: 
according to what I've read the JVM thread scheduling isn't very fair. So the 
RecordLoader, CoRB, and XQSync codebases all have Thread.yield() calls 
sprinkled around in the hope of giving other threads a chance. Often the code 
will yield just before starting a server interaction, hoping to run 
uninterrupted while it talks to the server. Right now mlcp doesn't seem to have 
any yield calls, but I don't have any strong evidence that they would make a 
difference.

-- Mike

On 24 Jul 2014, at 04:08 , Ron Hitchens <[email protected]> wrote:

> 
> Geert.
> 
>    Java would not throw an out-of-memory exception if it was being crowded out 
> by MarkLogic (or vice versa).  It would start swapping if the allocated Java 
> heap could not fit into physical memory.  With many java threads running, 
> lots of allocations and releases would be happening with lots of random heap 
> accesses.  Even with an SSD, the overhead of the OS managing memory pages 
> becomes very high and can ultimately result in thrashing.  It's also possible 
> that with so much object churn in the heap, there may be some occasional 
> stop-the-world full-sweep garbage collections happening (which would just 
> make the thrashing worse).
> 
>    I noticed in the exception message that it happened on the 274,350th item 
> in the result sequence (one of ten concurrent result sequences).  So I 
> imagine there was quite a bit of Java memory heap activity going on.
> 
>    MLCP is using XCC in streaming mode.  That exception 
> (com.marklogic.xcc.exceptions.StreamingResultException) is only possible when 
> in streaming mode.  And it wouldn't be possible to do an export of large 
> amounts of data otherwise.  Avoid streaming mode if you don't need it 
> because, as I explained below, you need to do some extra work to manage it 
> correctly.  You also need to release the connection back to XCC when you're 
> done for it to be re-used effectively.
> 
>    Cheers.
> 
> ---
> Ron Hitchens {[email protected]}  +44 7879 358212
> 
> On Jul 24, 2014, at 10:21 AM, Geert Josten <[email protected]> wrote:
> 
>> Jakob,
>>  
>> Aries suggested to use thread count 1. The default is 10. You might be able 
>> to use something higher than 1.
>>  
>> I don't think that mlcp and ML are competing much against each other. Java 
>> would throw an out-of-memory error if ML had taken it all. It could be 
>> though that the parallel threads cause competition within ML itself for 
>> resources. Things could also get pushed into swap memory, which is usually 
>> slower, but since you are on a MacBook Pro, I am guessing you are running 
>> with an SSD, so that is less likely too. There is plenty of IO going on, but the 
>> export is just writing a few (big) files, so that is not likely a big 
>> competition either.
>>  
>> I'm intrigued by the comments below from Ron though. I recall there is a 
>> streaming option in MLCP, and reading about the streaming mode in XCC, this 
>> might be useful to try.
>>  
>> Tip: run mlcp.sh with just 'export' as argument to get a list of all export 
>> options. And DMC provides docs on MLCP as well.
>>  
>> Cheers,
>> Geert
>>  
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf Of Ron Hitchens
>> Sent: Thursday, July 24, 2014 00:14
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] mlcp export problem/question
>>  
>>  
>> Jakob,
>>  
>>    I can't give you a definitive answer because mlcp was created after my 
>> time at MarkLogic, but I did create XCC and I think I recognize what's going 
>> on here.
>>  
>>    The exception it's throwing 
>> (com.marklogic.xcc.exceptions.StreamingResultException: IOException 
>> instantiating ResultItem 274350: Premature End-Of-Stream on read.  Server 
>> connection lost?) occurs when the MarkLogic server unexpectedly closes the 
>> network connection that XCC is reading from.  There are basically two 
>> reasons this can happen: 1) MarkLogic crashed (not very common anymore) or 
>> 2) the XCC client (mlcp in this case) took too long to read the data that ML 
>> was trying to send.
>>  
>>    XCC operates in two modes: buffered and streaming.  In buffered mode (the 
>> default), XCC reads the entire data stream from ML before returning the 
>> structured response to you.  This is fine for most uses, such as fetching a 
>> page-full of results, and it prevents your code from ever seeing this 
>> connection-dropped scenario because there is nothing left to read when your 
>> code resumes.
>>  
>>    But some results, such as a bulk export like you're doing, cannot be 
>> buffered in memory on the client.  For that, XCC runs in a streaming mode 
>> where the XCC library reads from ML on behalf of the client as it fetches 
>> each item in the result sequence.  In this mode, an active connection to ML 
>> is still open and must be serviced.
>>  
>>    Generally this works fine, but if the XCC client gets delayed doing 
>> something time consuming between documents, leaving the connection idle for 
>> too long, then MarkLogic will unceremoniously hang up the connection.  
>> Because there is a block-structured format to the streaming data and the 
>> position where it stopped could be anywhere, it is not practical for the ML 
>> server to signal to the XCC client that it has run out of patience and given 
>> up.  That is why the exception is telling you that the stream stopped 
>> unexpectedly and is hinting that the server connection was lost.
>>  
>>    Now, I'm not sure why there might be a long delay on the client side, but 
>> since you're basically exporting everything and have specified that the 
>> output format is an archive, it is conceivable that the client side is using 
>> a lot of memory and/or I/O to format the data into that archive.  If this is 
>> eventually causing contention for resources leading to significant swapping, 
>> it's possible that XCC didn't get back to service the connection in a timely 
>> fashion.
>>  
>>    I should probably also note that XCC pre-buffers from the stream, so 
>> there could be dozens of calls into XCC for the next item before its 
>> internal buffer underflows and it fetches more from the connection.  If each 
>> item pulled from the result set takes significant time to 
>> process/archive/transform/whatever, then that gap between buffer refreshes 
>> can widen significantly.
>>  
>>    As for configuration, changing the "keep-alive" time on the XDBC 
>> appserver will not have any effect.  That is the time *while the connection 
>> is idle* to wait for another new request before closing the idle connection. 
>>  This timeout is normal and is handled automatically by XCC in its 
>> connection pool.
>>  
>>    Nor will changing "max time limit".  This is the amount of time that a 
>> request is allowed to run before it is killed.  If this time had been 
>> exceeded, XCC would have received a structured EX-TIME exception and the 
>> stack trace would have indicated that.
>>  
>>    When the response data is streaming across the wire, the server-side 
>> request has finished and you're receiving the serialized result.  The config 
>> parameter that matters for this is "request timeout", which is usually 30 
>> seconds.  What this means is that if the connection on which MarkLogic is 
>> writing a result does not move for this many seconds, then the connection is 
>> dropped and the pending data is thrown away.
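
A back-of-the-envelope sketch of how the pre-buffering and the request timeout described above interact: if XCC only goes back to the connection once its internal buffer is drained, the gap between reads is roughly the number of buffered items times the client's per-item processing time. The buffer size of 48 items and the per-item times are assumptions made up for illustration (the message above only says "dozens"); the 30-second default comes from the message above.

```python
def seconds_between_reads(items_per_buffer, seconds_per_item):
    """Approximate time the connection sits untouched while buffered items are processed."""
    return items_per_buffer * seconds_per_item


def connection_at_risk(items_per_buffer, seconds_per_item, request_timeout=30):
    """True if the gap between reads reaches the server's request timeout (seconds)."""
    return seconds_between_reads(items_per_buffer, seconds_per_item) >= request_timeout


# Hypothetical: 48 items per internal buffer refill.
fast_client = connection_at_risk(48, seconds_per_item=0.1)  # gap of about 4.8 s
slow_client = connection_at_risk(48, seconds_per_item=1.0)  # gap of about 48 s
slow_client_longer_timeout = connection_at_risk(48, 1.0, request_timeout=120)

print(fast_client, slow_client, slow_client_longer_timeout)
```

Under these assumptions a client that slows to one second per item would exceed a 30-second request timeout but stay within a 120-second one, which is consistent with raising the timeout being a workaround rather than a fix for the slowdown itself.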
>>  
>>    So, you might try setting this value higher, say 120 or so, to see if it 
>> gets any farther.  If it is resource contention/swapping, you probably need 
>> to address that as the root problem.  As Aries suggested, if MarkLogic and 
>> the mlcp client are on the same machine they may be fighting with each other 
>> for resources.  MarkLogic generally assumes that it has the machine to 
>> itself, so doing a full database export can cause it to allocate lots of 
>> memory space, possibly starving out the mlcp process.
>> 
>>    I hope that's helpful.
>> 
>> ---
>> Ron Hitchens {[email protected]}  +44 7879 358212
>>  
>> On Jul 23, 2014, at 9:56 PM, Jakob Fix <[email protected]> wrote:
>> 
>> 
>> Hi Aries,
>> 
>> as suggested I increased the timeout to 30.  the process still doesn't 
>> finish, I get to 19% 
>> (see https://gist.github.com/jfix/6bb7fe80cab7e3ef78e6), but before and after 
>> there are similar and different errors.  in addition to the "server 
>> connection lost?" suspicion, MarkLogic now offers a new error on line 124 
>> (and later) "java.lang.RuntimeException: Could not buffer value as string"
>> 
>> of course, I could probably further increase the keepalive timeout, but that 
>> seems like a band aid. so, unfortunately, I'm not much further ahead and 
>> it's slightly frustrating because on paper mlcp is supposed to replace 
>> xqsync, but it's hard to get it to do basic stuff (because I suppose 
>> exporting a database is quite a standard task).
>>  
>> I welcome any help to get this done.
>> 
>> cheers,
>> Jakob.
>> 
>> 
>> On Wed, Jul 23, 2014 at 1:18 AM, Aries Li <[email protected]> wrote:
>> Maybe the server is too busy to respond in time. I would try raising the 
>> keepalive timeout from 5 to 30 on the xdbc server. Please let us know if 
>> that helps.
>>  
>> Aries
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf Of Jakob 
>> Fix
>> Sent: Tuesday, July 22, 2014 4:02 PM
>> To: General Mark Logic Developer Discussion
>> Subject: [MarkLogic Dev General] mlcp export problem/question
>>  
>> Hi,
>> 
>> I'm trying to export a database using mlcp. I'm not having much success.
>> 
>> Here is a gist of the error output after 1009 seconds running (that's what 
>> mlcp says):
>> 
>> https://gist.github.com/jfix/2ef60350f8af9a4c2f33
>> 
>> mlcp wonders whether "Server connection lost?" but it's running as far as I 
>> can make out. Are there any special precautions to take when exporting (i.e. 
>> disabling potentially running tasks on the Taskserver, ...)?
>> 
>> the database is about 3 GB big (according to the Size indication in the 
>> status page: 3,045 MB).
>> 
>> I'm running this on a MacBook Pro with 8 GB of RAM, latest MarkLogic (7.0-3) 
>> and latest mlcp (Hadoop2-1.2-2).
>> 
>> cheers,
>> Jakob.
>> 
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
>> 