Yes, xqsync does retrieve all of the URIs using a dedicated thread. It isn't completely done up front, though: other threads start the actual reading and writing once enough URIs are queued. But reading and writing are much slower than retrieving URIs, so if you have a lot of URIs, they all end up stored in the JVM's memory and may overwhelm the garbage collector.
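
To get a feel for how large that in-memory backlog can become, here is a rough back-of-the-envelope sketch in Java. It is not xqsync's code: the class and method names are hypothetical, and the rates (50,000 URIs listed per second, 500 documents copied per second) are assumed purely for illustration. Only the figure of roughly 10 million URIs comes from this thread.

```java
public class UriBacklogSketch {

    /**
     * Simulates, one second at a time, a lister that queues URIs and
     * workers that drain them, and returns the largest number of URIs
     * that were waiting in the queue at any point.
     * All rates are hypothetical inputs, not measured xqsync figures.
     */
    public static long peakBacklog(long totalUris, long listedPerSec, long copiedPerSec) {
        if (totalUris < 0 || listedPerSec <= 0 || copiedPerSec <= 0) {
            throw new IllegalArgumentException("rates must be positive, total non-negative");
        }
        long listed = 0;  // URIs retrieved so far
        long copied = 0;  // documents read and written so far
        long peak = 0;    // largest backlog seen
        while (copied < totalUris) {
            listed = Math.min(totalUris, listed + listedPerSec);
            // workers can never copy more than has been listed
            copied = Math.min(listed, copied + copiedPerSec);
            peak = Math.max(peak, listed - copied);
        }
        return peak;
    }

    public static void main(String[] args) {
        // assumed rates: 50,000 URIs/s listed, 500 docs/s copied
        long peak = peakBacklog(10000000L, 50000L, 500L);
        System.out.println("peak queued URIs: " + peak); // prints 9900000
    }
}
```

Under these assumed rates almost the entire URI list is sitting in the queue by the time listing finishes, which is the situation where heap size and garbage collection start to matter.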

You can try a newer version of xqsync here: http://marklogic.github.com/xqsync/

Newer versions of xqsync store URIs in a temporary file (in TMP_DIR, or at a location specified via URI_QUEUE_FILE). This should help with memory pressure, if that's your bottleneck.

Hsiao "Shao" Su
Senior Performance Engineer
MarkLogic Corporation
hsiao...@marklogic.com
Phone: +1 650 287 2545
www.marklogic.com

-----Original Message-----
From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Mike Sokolov
Sent: Wednesday, March 14, 2012 1:27 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] xqsync throughput

Thanks for the suggestions, Mike. I discovered that -DINPUT_QUERY_CACHABLE wasn't set to true, so I am trying that now; the process kept failing to retrieve URIs, so maybe it will help if we fetch them all up front?

I looked at the networking a bit - pings are ~0.15 ms and I am seeing sustained transfer rates as high as 84MB/s using scp - I think I'd get more with larger files. Also the servers don't seem busy - I am running xqsync on the destination box, which I suppose might not be ideal, but uses less network anyway - it is maxing out one of the CPUs during the initial fetch of all the URIs (over 10m of them) now that cachable=true. Maybe there is a problem deep paging into the cts:uris query when it is not cached?

I'll report back once the data actually starts transferring.

-Mike

On 03/14/2012 10:14 AM, Michael Blakeley wrote:
> I would expect better than that. What is the document rate?
>
> You may not have enough client threads to keep the servers busy. What does
> the utilization look like on both sides?
>
> You may also be memory-limited in the JVM at some point, especially if the
> documents are big. If so, the JVM will spend a lot of time running the
> garbage collector. You can check that idea with the '-verbose:gc' option.
>
> Could there be a network limitation other than bandwidth? You might check
> that by exporting to packages instead, and see what that performance looks
> like. I have seen some cases where there was a slow hop on the network, or
> where a firewall was limiting performance.
>
> -- Mike
>
> On 14 Mar 2012, at 13:38 , Mike Sokolov wrote:
>
>> I wonder if anyone has a rough guide to what sort of transfer speeds can
>> be expected using xqsync to transfer a database from one node to
>> another. I have two quite beefy servers on the same LAN (at least
>> 100Mb/s ~ 12MB/s), and I'm only getting ~30kB/sec. I was hoping to get
>> a few orders of magnitude more, but am I smoking crack? Is there
>> something I could be doing or not doing that might be limiting the speed
>> somehow?
>>
>> This is my setup:
>>
>> java -cp ${BIN}/xqsync.jar:$BIN/xcc.jar:$BIN/xstream.jar:$BIN/xpp3.jar \
>> -Xmx1024m \
>> -DINPUT_CONNECTION_STRING=$SRCDB \
>> -DOUTPUT_CONNECTION_STRING=$DSTDB \
>> -DSKIP_EXISTING=true \
>> -DCOPY_COLLECTIONS=false \
>> -DCOPY_PERMISSIONS=false \
>> -DCOPY_PROPERTIES=true \
>> -DCOPY_QUALITY=false \
>> -DINPUT_BATCH_SIZE=10 \
>> -DINPUT_QUERY_CACHABLE \
>> -DTHREADS=8 \
>> com.marklogic.ps.xqsync.XQSync
>>
>> These are the startup messages from the log:
>>
>> INFO: XQSync starting: version 2009-03-10.1 on 1.6.0_26 (Java(TM) SE
>> Runtime Environment)
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSync main
>> INFO: XCC version = 3.2-7
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>> INFO: starting pool of 8 threads, queue size = 10000
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.Monitor run
>> INFO: starting
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>> INFO: output version info: client 3.2-7, server 4.1-11
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager run
>> INFO: input version info: client 3.2-7, server 4.1-11
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager
>> queueFromInputConnection
>> INFO: buffer size = 0, caching = false
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager getUrisRequest
>> INFO: listing all documents (with uri lexicon)
>> Mar 14, 2012 2:33:36 PM com.marklogic.ps.xqsync.XQSyncManager
>> queueFromInputConnection
>>
>> The connector is a bit old: Can I expect any substantial improvement
>> from updating that?
>>
>> --
>> Michael Sokolov
>> Engineering Director
>> www.ifactory.com
>>
>> _______________________________________________
>> General mailing list
>> General@developer.marklogic.com
>> http://developer.marklogic.com/mailman/listinfo/general