Oh no, I'm not loading them into memory (ugh!).
I'm using File objects and creating an array of Content objects from
the Files (not InputStream or String ...).

Still amazingly slow ... but then I am running my ML server on a fairly
old box ... but it's not THAT slow. What's weird is that even with 10
threads pushing blocks of 10 files through, the ML server is running at
about 10% CPU and 5% IO wait time ... this is over a 1 Gb/sec LAN ...
something seems wrong.
I'm going to try the same code on the EC2 server to see if that makes a
difference (it's where it's going to run in production anyway, so if
it's just my dev box I'll live with that).
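
For reference, the "10 threads pushing blocks of 10 files" arrangement can be sketched roughly as below. This is only an illustration, not the actual loader: BatchSink and BatchedLoader are made-up names, and BatchSink stands in for whatever really sends a batch (for example a call to insertContent). As far as I recall an XCC Session should not be shared between threads, so a real sink would presumably open its own Session per task; treat that as an assumption to check.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the code that sends one batch to the server. */
interface BatchSink {
    void send(List<File> batch) throws Exception;
}

class BatchedLoader {
    /** Splits the files into consecutive batches of at most batchSize entries. */
    static List<List<File>> partition(List<File> files, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<File>> batches = new ArrayList<>();
        for (int i = 0; i < files.size(); i += batchSize) {
            int end = Math.min(i + batchSize, files.size());
            batches.add(new ArrayList<>(files.subList(i, end)));
        }
        return batches;
    }

    /** Sends every batch on a fixed-size pool; returns the number of batches sent. */
    static int load(List<File> files, int batchSize, int threads, BatchSink sink)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<File> batch : partition(files, batchSize)) {
                // A Callable (not a Runnable) so the sink may throw checked exceptions.
                pending.add(pool.submit(() -> {
                    sink.send(batch);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // a failed batch surfaces here as an ExecutionException
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape the pool size and batch size can be varied independently, which may help show whether the client or the server is the bottleneck.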


-----Original Message-----
From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Ron Hitchens
Sent: Tuesday, March 16, 2010 5:09 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] RE: [MarkLogic Dev General] ServerConnectionException - consistantly after about 20,000 files


   How are you creating the Content objects for the
documents you're inserting?  You're not loading each
one into memory are you?  If you instantiate the
Content object with a File or RandomAccessFile handle,
XCC will read the file from disk and pass it through
without needing to load the whole document into memory.
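
To make the distinction concrete, here is a small sketch of what "passing a file through" looks like compared with reading it all into memory first. It is not XCC's implementation, just plain java.io code with a made-up class name (StreamingCopy); the point is that only one fixed-size buffer is held at a time, whatever the document size.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamingCopy {
    /**
     * Copies the file to the output in chunks of bufferSize bytes and
     * returns the number of bytes written. Memory use is bounded by the
     * buffer, not by the size of the file.
     */
    static long stream(File source, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try (InputStream in = new FileInputStream(source)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```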

On Mar 16, 2010, at 8:54 PM, Lee, David wrote:

> FYI I added -XX:+UseConcMarkSweepGC and was able to load the 30k files
> without error ...
> still way too slow (not sure why yet, but only about 2 docs/sec) ... but
> at least it didn't fail.
> Will try again tonight to see if my success is reproducible, but I have
> good hopes this was the problem.
> 
> 
> 
> -----Original Message-----
> From: general-boun...@developer.marklogic.com
> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Lee, David
> Sent: Monday, March 15, 2010 1:31 PM
> To: Michael Blakeley; General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] RE: [MarkLogic Dev General] ServerConnectionException - consistantly after about 20,000 files
> 
> Thanks Mike, great idea about the GC.
> I have JProfiler so I can do good detailed measurements ... it just
> takes a while because the errors don't start showing up for 2+ hours ...
> but a good path to investigate!
> Thank you
> 
> -David
> 
> 
> -----Original Message-----
> From: Michael Blakeley [mailto:michael.blake...@marklogic.com]
> Sent: Monday, March 15, 2010 1:18 PM
> To: General Mark Logic Developer Discussion
> Cc: Lee, David
> Subject: Re: [MarkLogic Dev General] ServerConnectionException - consistantly after about 20,000 files
> 
> Lee, note that that's TIME_WAIT *not* TIMED_WAIT, and there's no need
> to check the server, just the client. Any TIME_WAIT sockets will
> disappear fairly quickly: the best way to check is during the test
> itself.
> 
> Have you thought about garbage collection? The fact that the error
> occurs after a given number of inserts is suggestive. If the GC thread
> locks everything else off the CPU for a long enough period of time, the
> server will time out connections. This is especially likely to happen if
> the program working set is a large percentage of the Java heap size
> (which may in turn indicate either leaks that you could fix, or a need
> for a larger heap).
> 
> You might try instrumenting your code to report insert times in Java,
> and also report the elapsed time when you see exceptions. Then monitor
> the Java process size as your program runs. You may be able to correlate
> longer elapsed times for inserts with GC events, which would tend to
> confirm this hypothesis.
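
One possible shape for that instrumentation, as a rough sketch only: wrap each insert in a helper that logs elapsed time, the GC time accumulated while it ran, and the heap in use afterwards. InsertTimer and its method names are invented for illustration; the JMX and Runtime calls are standard Java. Note that for a concurrent collector the reported collection time is not purely stop-the-world pause time, so treat the GC figure as a hint rather than a measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.Callable;

class InsertTimer {
    /** Accumulated GC time in ms across all collectors that report one. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Runs the work and logs its duration even when it throws. */
    static <T> T timed(String label, Callable<T> work) throws Exception {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        try {
            return work.call();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            long gcMs = totalGcMillis() - gcBefore;
            Runtime rt = Runtime.getRuntime();
            long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
            System.err.printf("%s: %d ms elapsed, %d ms GC, %d MB heap in use%n",
                    label, elapsedMs, gcMs, usedMb);
        }
    }
}
```

Because the logging sits in a finally block, a failed insert still produces a line, which gives the elapsed time at the point of the exception.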
> 
> When using RecordLoader, XQSync, and Corb with large content sets, I
> generally use the -XX:+UseConcMarkSweepGC VM option. Sometimes I also
> raise the max heap size, but some care is required because too large of
> a heap seems to slow things down.
> 
> If GC and memory turn out to be involved, I also recommend looking at
> the whole Java program carefully, with an eye toward minimizing memory
> utilization and especially toward removing any object leaks. If you are
> leaking objects, then memory will fill up sooner or later no matter what
> GC does. There are some good Java profiling tools available to help with
> this.
> 
> -- Mike
> 
> On 2010-03-15 07:08, Lee, David wrote:
>> Thanks Ron, I'm doing all the things you suggest already
>> 1) Reusing a Session
>> 2) bundling 20 files in 1 insertContent()
>> 3) Checked netstat and there are no TIMED_WAIT connections on either 
>> client or server
>> 
>> I'm trying something different this time, which is to use a thread pool
>> to try to increase the efficiency of sending the files.  Maybe this will
>> be worse on the system, I don't know.
>> Maybe there is some kind of maximum session open time?
>> The errors occur about 2 hours into the transfer typically.
>> I could try closing and reopening the session every hour ...
>> 
>> -David
>> Server: 4.1-4 on Fedora 11 (fc11)
>> Client: XP/Pro SP3 and Windows 7
>> XCC: Latest from download
>> 
>> 
>> -----Original Message-----
>> From: general-boun...@developer.marklogic.com
>> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Ron Hitchens
>> Sent: Monday, March 15, 2010 4:49 AM
>> To: General Mark Logic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] ServerConnectionException - consistantly after about 20,000 files
>> 
>> 
>>    You may be filling up your OS's file table.  When a socket is 
>> closed, the OS holds onto it for a while (default usually about two 
>> minutes) to reliably detect any straggler packets.
>> 
>>    When you cycle a lot of connections quickly, this can max out
>> internal data structures in the OS.  If you do a netstat and see
>> zillions of connections in TIME_WAIT state, that's probably what's
>> happening.
>>
>>    If you're connecting across a LAN, this delay is not really
>> needed, because it's hard for packets to get rerouted anywhere else.
>> You can tune the socket wait time down to 5 seconds or so and that
>> will allow file table slots to be re-used more quickly.
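
The netstat check can also be run from inside the client while the load is in progress. The following is a minimal sketch under a few assumptions: that `netstat -an` is on the PATH and prints the socket state as a separate whitespace-delimited column (true of the common Linux, Windows and macOS versions as far as I know), and with a made-up class name, TimeWaitCounter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

class TimeWaitCounter {
    /** Counts lines that have TIME_WAIT as one of their whitespace-separated fields. */
    static long countTimeWait(List<String> netstatLines) {
        long count = 0;
        for (String line : netstatLines) {
            for (String field : line.trim().split("\\s+")) {
                if (field.equals("TIME_WAIT")) {
                    count++;
                    break;
                }
            }
        }
        return count;
    }

    /** Runs "netstat -an" and returns its output lines (assumes netstat is installed). */
    static List<String> runNetstat() throws IOException {
        Process p = new ProcessBuilder("netstat", "-an").redirectErrorStream(true).start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("TIME_WAIT sockets: " + countTimeWait(runNetstat()));
    }
}
```

Logging this count every few minutes during a run would show whether TIME_WAIT sockets pile up before the failure.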
>> 
>>    You can also insert multiple documents per request, all of which 
>> will be transferred together and result in fewer low-level sockets 
>> being opened.
>> 
>> On Mar 14, 2010, at 9:28 PM, Lee, David wrote:
>> 
>>> FYI, here's a stack trace from the same program, but in this case it's
>>> the query component under load.
>>> This is very consistent as well, after about 10-20k requests
>>> 
>>> 
>>> com.marklogic.xcc.exceptions.ServerConnectionException: Error parsing HTTP headers: Premature EOF, partial header line read: ''
>>>  [Session: user=DLEE, cb={default} [ContentSource: user=DLEE, cb={none} [provider: address=home/192.168.1.10:8011, pool=0/64]]]
>>>     at com.marklogic.xcc.impl.handlers.AbstractRequestController.runRequest(AbstractRequestController.java:99)
>>>     at com.marklogic.xcc.impl.SessionImpl.submitRequest(SessionImpl.java:280)
>>>     at org.xmlsh.marklogic.put.setChecksum(put.java:341)
>>>     at org.xmlsh.marklogic.put.flushContent(put.java:315)
>>>     at org.xmlsh.marklogic.put.putContent(put.java:288)
>>>     at org.xmlsh.marklogic.put.load(put.java:272)
>>>     at org.xmlsh.marklogic.put.load(put.java:266)
>>>     at org.xmlsh.marklogic.put.run(put.java:126)
>>>     at org.xmlsh.core.XCommand.run(XCommand.java:86)
>>>     at org.xmlsh.core.XCommand.run(XCommand.java:63)
>>>     at org.xmlsh.sh.core.SimpleCommand.exec(SimpleCommand.java:121)
>>>     at org.xmlsh.sh.shell.Shell.exec(Shell.java:560)
>>>     at org.xmlsh.sh.core.Pipeline.exec(Pipeline.java:124)
>>>     at org.xmlsh.sh.shell.Shell.exec(Shell.java:560)
>>>     at org.xmlsh.sh.shell.Shell.runScript(Shell.java:362)
>>>     at org.xmlsh.core.ScriptCommand.run(ScriptCommand.java:75)
>>>     at org.xmlsh.sh.core.SimpleCommand.exec(SimpleCommand.java:121)
>>>     at org.xmlsh.sh.shell.Shell.exec(Shell.java:560)
>>>     at org.xmlsh.sh.core.Pipeline.exec(Pipeline.java:124)
>>>     at org.xmlsh.sh.shell.Shell.exec(Shell.java:560)
>>>     at org.xmlsh.sh.shell.Shell.runScript(Shell.java:362)
>>>     at org.xmlsh.core.ScriptCommand.run(ScriptCommand.java:75)
>>>     at org.xmlsh.sh.core.SimpleCommand.exec(SimpleCommand.java:121)
>>>     at org.xmlsh.sh.shell.Shell.exec(Shell.java:560)
>>>     at org.xmlsh.sh.core.Pipeline.exec(Pipeline.java:124)
>>>     at org.xmlsh.sh.shell.Shell.exec(Shell.java:560)
>>>     at org.xmlsh.sh.shell.Shell.interactive(Shell.java:461)
>>>     at org.xmlsh.commands.builtin.xmlsh.run(xmlsh.java:82)
>>>     at org.xmlsh.core.BuiltinCommand.run(BuiltinCommand.java:54)
>>>     at org.xmlsh.sh.shell.Shell.main(Shell.java:690)
>>> Caused by: java.io.IOException: Error parsing HTTP headers: Premature EOF, partial header line read: ''
>>>     at com.marklogic.http.HttpHeaders.nextHeaderLine(HttpHeaders.java:326)
>>>     at com.marklogic.http.HttpHeaders.parseResponseHeaders(HttpHeaders.java:287)
>>>     at com.marklogic.http.HttpChannel.parseHeaders(HttpChannel.java:323)
>>>     at com.marklogic.http.HttpChannel.receiveMode(HttpChannel.java:293)
>>>     at com.marklogic.http.HttpChannel.getResponseCode(HttpChannel.java:187)
>>>     at com.marklogic.xcc.impl.handlers.EvalRequestController.issueRequest(EvalRequestController.java:111)
>>>     at com.marklogic.xcc.impl.handlers.EvalRequestController.serverDialog(EvalRequestController.java:62)
>>>     at com.marklogic.xcc.impl.handlers.AbstractRequestController.runRequest(AbstractRequestController.java:72)
>>>     ... 29 more
>>> 
>>> 
>>> 
>>> From: general-boun...@developer.marklogic.com
>>> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Lee, David
>>> Sent: Saturday, March 13, 2010 7:42 PM
>>> To: General Mark Logic Developer Discussion
>>> Subject: RE: [MarkLogic Dev General] ServerConnectionException - consistantly after about 20,000 files
>>> 
>>> Here's a full stack trace, including my code in the stack.
>>> By "opening connections" I mean calling
>>>
>>>      URI serverUri = new URI (connect);
>>>      ContentSource cs = ContentSourceFactory.newContentSource (serverUri);
>>>
>>> for every file instead of reusing the ContentSource for all files.
>>> Although that may be a red herring ... when I do it that way (new
>>> ContentSource for each file) I'm not aborting the push operation if
>>> one file fails, so I may be missing these errors in that case.
>>> 
>>> --------- Stack Trace
>>> 
>>> 
>>> 
>>> 2010-03-13 16:17:13,748 12310138 ERROR [main] core.SimpleCommand - Exception running command: ml:put
>>> com.marklogic.xcc.exceptions.ServerConnectionException: An established connection was aborted by the software in your host machine
>>>  [Session: user=DLEE, cb={default} [ContentSource: user=DLEE, cb={none} [provider: address=home/192.168.1.10:8011, pool=0/64]]]
>>>     at com.marklogic.xcc.impl.handlers.AbstractRequestController.runRequest(AbstractRequestController.java:99)
>>>     at com.marklogic.xcc.impl.SessionImpl.insertContent(SessionImpl.java:204)
>>>     at org.xmlsh.marklogic.put.load(put.java:180)
>>>     at org.xmlsh.marklogic.put.load(put.java:171)
>>>     at org.xmlsh.marklogic.put.run(put.java:99)
>>>     at org.xmlsh.core.XCommand.run(XCommand.java:86)
>>>     at org.xmlsh.core.XCommand.run(XCommand.java:63)
>>>     at org.xmlsh.sh.core.SimpleCommand.exec(SimpleCommand.java:121)
>>>     at org.xmlsh.sh.shell.Shell.exec(Shell.java:560)
>>>     at org.xmlsh.sh.core.Pipeline.exec(Pipeline.java:124)
>>>     at org.xmlsh.sh.shell.Shell.exec(Shell.java:560)
>>>     at org.xmlsh.sh.shell.Shell.runScript(Shell.java:362)
>>>     at org.xmlsh.core.ScriptCommand.run(ScriptCommand.java:75)
>>>     at org.xmlsh.sh.core.SimpleCommand.exec(SimpleCommand.java:121)
>>>     at org.xmlsh.sh.shell.Shell.exec(Shell.java:560)
>>>     at org.xmlsh.sh.core.Pipeline.exec(Pipeline.java:124)
>>>     at org.xmlsh.sh.shell.Shell.exec(Shell.java:560)
>>>     at org.xmlsh.sh.shell.Shell.interactive(Shell.java:461)
>>>     at org.xmlsh.commands.builtin.xmlsh.run(xmlsh.java:82)
>>>     at org.xmlsh.core.BuiltinCommand.run(BuiltinCommand.java:54)
>>>     at org.xmlsh.sh.shell.Shell.main(Shell.java:690)
>>> Caused by: java.io.IOException: An established connection was aborted by the software in your host machine
>>>     at sun.nio.ch.SocketDispatcher.write0(Native Method)
>>>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:33)
>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:60)
>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
>>>     at com.marklogic.http.HttpChannel.writeBuffer(HttpChannel.java:373)
>>>     at com.marklogic.http.HttpChannel.writeBody(HttpChannel.java:353)
>>>     at com.marklogic.http.HttpChannel.flushRequest(HttpChannel.java:346)
>>>     at com.marklogic.http.HttpChannel.write(HttpChannel.java:134)
>>>     at com.marklogic.xcc.impl.handlers.ContentInsertController.writeChunkHeader(ContentInsertController.java:299)
>>>     at com.marklogic.xcc.impl.handlers.ContentInsertController.issueRequest(ContentInsertController.java:210)
>>>     at com.marklogic.xcc.impl.handlers.ContentInsertController.serverDialog(ContentInsertController.java:112)
>>>     at com.marklogic.xcc.impl.handlers.AbstractRequestController.runRequest(AbstractRequestController.java:72)
>>>     ... 20 more
>>> 
>>> 
>>> 
>>> 
>>> 
>>> From: general-boun...@developer.marklogic.com
>>> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Sam Neth
>>> Sent: Saturday, March 13, 2010 6:08 PM
>>> To: General Mark Logic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] ServerConnectionException - consistantly after about 20,000 files
>>> 
>>> Could you post a stack trace?
>>> 
>>> What version of XCC are you using?
>>> 
>>> What specifically are you referring to when you talk about "opening
>> connections"?
>>> 
>>> On Mar 13, 2010, at 2:33 PM, Lee, David wrote:
>>> 
>>> 
>>> If I use XCC to iteratively insert a large set of documents I
>>> consistently get this error
>>>
>>> com.marklogic.xcc.exceptions.ServerConnectionException: An established connection was aborted by the software in your host machine [Session: user=DLEE, cb={default} [ContentSource: user=DLEE, cb={none} [provider: address=home/192.168.1.10:8011, pool=0/64]]]
>>> 
>>> 
>>> This occurs after about 20,000 files and aborts the program.
>>> I'm thinking of implementing an exception handler to retry, but I don't
>>> want to be retrying after more serious errors.
>>> The server log doesn't show any problems, and this is on a dedicated
>>> 1 Gb wired LAN, so I don't think it's internet problems.
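
A retry handler of the kind described might look like the sketch below: it retries only when the failure is an instance of one chosen exception type and gives up after a fixed number of attempts, so anything else aborts immediately. The Retry class and its parameters are invented for illustration. In the real loader the retryable type would presumably be ServerConnectionException; whether re-sending a partly transmitted batch is safe is a separate question this sketch does not answer.

```java
import java.util.concurrent.Callable;

class Retry {
    /**
     * Calls work up to maxAttempts times. Only failures that are instances
     * of retryable are retried; any other exception, or the last retryable
     * one, is rethrown to the caller unchanged.
     */
    static <T> T withRetry(Callable<T> work, Class<? extends Exception> retryable,
                           int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (Exception e) {
                if (!retryable.isInstance(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(pauseMillis); // brief pause before the next attempt
            }
        }
    }
}
```

A call would wrap one batch insert, for example `Retry.withRetry(() -> sendBatch(batch), IOException.class, 3, 1000)`, where sendBatch is a hypothetical method returning some result.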
>>> 
>>> If instead of using the same connection I open the connection for
>>> each file, it often gets around this problem, but not always.
>>> I think it's getting around it because I'm not aborting on error in
>>> that case (just going to the next file).
>>>
>>> I'm using this code snippet to create the content in batches of 1-20
>>> (files in a directory)
>>> 
>>> Content content = ContentFactory.newContent (uri, file, mCreateOptions);
>>> contents.add(content);
>>> ...
>>>
>>> if( ! contents.isEmpty() )
>>>      session.insertContent (contents.toArray(new Content[contents.size()]));
>>> 
>>> 
>>> 
>>> Any suggestions ?
>>> 
>>> 
>>> 
>>> ----------------------------------------
>>> David A. Lee
>>> Senior Principal Software Engineer
>>> Epocrates, Inc.
>>> d...@epocrates.com
>>> 812-482-5224
>>> 
>>> _______________________________________________
>>> General mailing list
>>> General@developer.marklogic.com
>>> http://xqzone.com/mailman/listinfo/general
>>> 
>> 
>> ---
>> Ron Hitchens {mailto:r...@ronsoft.com}   Ronsoft Technologies
>>      (650) 766-2355 (Home Office)       http://www.ronsoft.com
>>      (707) 924-3878 (fax)               Bit Twiddling At Its Finest
>> "No amount of belief establishes any fact." -Unknown
>> 
>> 
>> 
>> 
>> 
> 

---
Ron Hitchens {mailto:r...@ronsoft.com}   Ronsoft Technologies
     (650) 766-2355 (Home Office)       http://www.ronsoft.com
     (707) 924-3878 (fax)               Bit Twiddling At Its Finest
"No amount of belief establishes any fact." -Unknown




