Jakob,

   I can't give you a definitive answer because mlcp was created after my time 
at MarkLogic, but I did create XCC and I think I recognize what's going on here.

   The exception it's throwing 
(com.marklogic.xcc.exceptions.StreamingResultException: IOException 
instantiating ResultItem 274350: Premature End-Of-Stream on read.  Server 
connection lost?) occurs when the MarkLogic server unexpectedly closes the 
network connection that XCC is reading from.  There are basically two reasons 
this can happen: 1) MarkLogic crashed (not very common anymore) or 2) the XCC 
client (mlcp in this case) took too long to read the data that ML was trying to 
send.

   XCC operates in two modes: buffered and streaming.  In buffered mode (the 
default), XCC reads the entire data stream from ML before returning the 
structured response to you.  This is fine for most uses, such as fetching a 
page-full of results, and it prevents your code from ever seeing this 
connection-dropped scenario because there is nothing left to read when your 
code resumes.

   But some results, such as a bulk export like you're doing, cannot be 
buffered in memory on the client.  For that, XCC runs in a streaming mode where 
the XCC library reads from ML on behalf of the client as it fetches each item 
in the result sequence.  In this mode, an active connection to ML is still open 
and must be serviced.
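
   As a rough illustration of the difference, here is a toy model in Java.  It 
is a sketch only: FakeConnection, buffered() and streaming() are made-up names, 
not XCC classes or methods, and the "connection" is just an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy model of the two modes. These are NOT XCC classes; all names are hypothetical.
class ResultModes {

    // Stands in for the network connection to the server.
    static class FakeConnection {
        private final Iterator<String> wire;
        private boolean open;

        FakeConnection(List<String> items) {
            this.wire = items.iterator();
            this.open = wire.hasNext();
        }

        boolean isOpen() { return open; }

        boolean hasMore() { return open && wire.hasNext(); }

        String read() {
            String item = wire.next();
            if (!wire.hasNext()) {
                open = false;   // last item read: nothing left on the wire
            }
            return item;
        }
    }

    // Buffered: drain the whole connection before handing anything back.
    static List<String> buffered(FakeConnection conn) {
        List<String> all = new ArrayList<>();
        while (conn.hasMore()) {
            all.add(conn.read());
        }
        return all;
    }

    // Streaming: hand back an iterator that reads from the connection on demand.
    static Iterator<String> streaming(FakeConnection conn) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return conn.hasMore(); }
            @Override public String next() { return conn.read(); }
        };
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a.xml", "b.xml", "c.xml");

        FakeConnection c1 = new FakeConnection(docs);
        List<String> all = buffered(c1);
        System.out.println("buffered: " + all.size()
                + " items, connection open? " + c1.isOpen());

        FakeConnection c2 = new FakeConnection(docs);
        Iterator<String> it = streaming(c2);
        it.next();
        System.out.println("streaming after 1 item: connection open? " + c2.isOpen());
    }
}
```

   The only point is that the buffered caller never holds an open connection 
once it gets its result, while the streaming caller does until it has consumed 
the last item.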

   Generally this works fine, but if the XCC client gets delayed doing 
something time consuming between documents, leaving the connection idle for too 
long, then MarkLogic will unceremoniously hang up the connection.  Because 
there is a block-structured format to the streaming data and the position where 
it stopped could be anywhere, it is not practical for the ML server to signal 
to the XCC client that it has run out of patience and given up.  That is why 
the exception is telling you that the stream stopped unexpectedly and is 
hinting that the server connection was lost.

   Now, I'm not sure why there might be a long delay on the client side, but 
since you're basically exporting everything and have specified that the output 
format is an archive, it is conceivable that the client side is using a lot of 
memory and/or I/O to format the data into that archive.  If this is eventually 
causing contention for resources leading to significant swapping, it's possible 
that XCC didn't get back to service the connection in a timely fashion.

   I should probably also note that XCC pre-buffers from the stream, so there 
could be dozens of calls into XCC for the next item before its internal buffer 
underflows and it fetches more from the connection.  If each item pulled from 
the result set takes significant time to process/archive/transform/whatever, 
then that gap between buffer refreshes can widen significantly.
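
   To put rough numbers on that, here is a back-of-the-envelope sketch in Java.  
Everything in it is an assumption for illustration: the class and method names 
are made up, I don't know how many items XCC actually serves from one buffer 
fill, and the 30-second server limit is just a plausible figure.  It only shows 
how a modest per-item cost multiplies into a long gap between socket reads.

```java
// Illustrative arithmetic only; not XCC code. All names and numbers are hypothetical.
class BufferGapEstimate {

    // Seconds the connection sits unread while the client works through
    // the items already held in its client-side buffer.
    static double gapSeconds(int itemsPerBufferFill, double secondsPerItem) {
        return itemsPerBufferFill * secondsPerItem;
    }

    // True when the gap between reads reaches the server's idle limit.
    static boolean wouldBeDropped(int itemsPerBufferFill, double secondsPerItem,
                                  double serverTimeoutSeconds) {
        return gapSeconds(itemsPerBufferFill, secondsPerItem) >= serverTimeoutSeconds;
    }

    public static void main(String[] args) {
        int itemsPerFill = 40;      // assumed: "dozens" of items per buffer fill
        double timeout = 30.0;      // assumed server-side limit, in seconds
        for (double perItem : new double[] {0.01, 0.5, 1.0}) {
            System.out.printf("%.2f s/item -> %.1f s gap, dropped=%b%n",
                    perItem,
                    gapSeconds(itemsPerFill, perItem),
                    wouldBeDropped(itemsPerFill, perItem, timeout));
        }
    }
}
```

   With those assumed figures, half a second per item stays under the limit 
but a full second per item does not, which is the kind of tipping point that 
swapping could push a client over.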

   As for configuration, changing the "keep-alive" time on the XDBC appserver 
will not have any effect.  That setting is how long the server waits, *while 
the connection is idle* between requests, for a new request to arrive before it 
closes the connection.  This timeout is normal and is handled automatically by 
XCC in its connection pool.

   Nor will changing "max time limit".  This is the amount of time that a 
request is allowed to run before it is killed.  If this time had been exceeded, 
XCC would have received a structured EX-TIME exception and the stack trace 
would have indicated that.

   When the response data is streaming across the wire, the server-side request 
has finished and you're receiving the serialized result.  The config parameter 
that matters for this is "request timeout", which is usually 30 seconds.  What 
this means is that if the connection on which MarkLogic is writing a result 
does not move for this many seconds, then the connection is dropped and the 
pending data is thrown away.

   So, you might try setting this value higher, say 120 or so, to see if it gets 
any farther.  If it is resource contention/swapping, you probably need to 
address that as the root problem.  As Aries suggested, if MarkLogic and the 
mlcp client are on the same machine they may be fighting with each other for 
resources.  MarkLogic generally assumes that it has the machine to itself, so 
doing a full database export can cause it to allocate lots of memory space, 
possibly starving out the mlcp process.

   I hope that's helpful.

---
Ron Hitchens {[email protected]}  +44 7879 358212

On Jul 23, 2014, at 9:56 PM, Jakob Fix <[email protected]> wrote:

> Hi Aries,
> 
> As suggested, I increased the timeout to 30.  The process still doesn't 
> finish; it gets as far as 19% (see 
> https://gist.github.com/jfix/6bb7fe80cab7e3ef78e6), but before and after 
> there are similar and different errors.  In addition to the "server 
> connection lost?" suspicion, MarkLogic now offers a new error on line 124 
> (and later): "java.lang.RuntimeException: Could not buffer value as string".
> 
> of course, I could probably further increase the keepalive timeout, but that 
> seems like a band aid. so, unfortunately, I'm not much further ahead and it's 
> slightly frustrating because on paper mlcp is supposed to replace xqsync, but 
> it's hard to get it to do basic stuff (because I suppose exporting a database 
> is quite a standard task).
> 
> I welcome any help to get this done.
> 
> cheers,
> Jakob.
> 
> 
> On Wed, Jul 23, 2014 at 1:18 AM, Aries Li <[email protected]> wrote:
> Maybe the server is too busy to respond in time. I would try raising the 
> keepalive timeout from 5 to 30 on the xdbc server. Please let us know if that 
> helps.
> 
>  
> 
> Aries
> 
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Jakob Fix
> Sent: Tuesday, July 22, 2014 4:02 PM
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] mlcp export problem/question
> 
>  
> 
> Hi,
> 
> I'm trying to export a database using mlcp. I'm not having much success.
> 
> Here is a gist of the error output after 1009 seconds running (that's what 
> mlcp says):
> 
> https://gist.github.com/jfix/2ef60350f8af9a4c2f33
> 
> mlcp wonders whether "Server connection lost?" but it's running as far as I 
> can make out. Are there any special precautions to take when exporting (i.e. 
> disabling potentially running tasks on the Taskserver, ...)?
> 
> 
> the database is about 3 GB big (according to the Size indication in the 
> status page: 3,045 MB).
> 
> I'm running this on a MacBook Pro with 8 GB of RAM, latest MarkLogic (7.0-3) 
> and latest mlcp (Hadoop2-1.2-2).
> 
> cheers,
> Jakob.
> 
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
> 

