Hi Shiv, I answered a similar query earlier. It will surely work for you as well. Please check below.
Sent with Good Work (www.blackberry.com)

On 23 Sep 2016 8:45 pm, Stuart Myles <[email protected]> wrote:

Thanks! This helped me prevent the errors from occurring and, as a bonus, significantly sped up my ingestion. I couldn't use exactly the mlcp command line you suggested, since in the version of mlcp I'm using -input_file_type xml isn't allowed; I had to use -input_file_type documents instead. Also, my input files don't need to be split. However, bumping up the thread count (to 30 in my case) made the transaction/timeout complaints go away, and now I'm ingesting 100,000 documents in 12 minutes rather than one hour. Much better!

Regards,
Stuart

On Fri, Sep 23, 2016 at 3:34 AM, Jain, Abhishek <[email protected]> wrote:

Hi Stuart,

mlcp comes with various options and can be used in various combinations depending on the file size, the memory available, the number of nodes and forests, and so on. If you want a quick solution, you can try this mlcp command:

mlcp import -host yourhost -port 8000 -username userName -password PASSWORD -input_file_type xml -input_file_path TempData -thread_count_per_split 3 -batch_size 200 -transaction_size 20 -max_split_size 33554432 -split_input true

Change the username, input file type, etc. accordingly. It's always good to use splits and threads when working with a huge dataset. Some performance pointers to consider while using the above mlcp command (a worked-out sketch follows this message):

1. In the app server settings, check that the connection timeout is set to 0.
2. The default split size is 32 MB; if your files are bigger, raise -max_split_size (the value is in bytes; 33554432 = 32 MB).
3. Keep the split-to-thread ratio at 1:2 or 1:3. For example, if your document is 10 MB and your split size is 1,000,000 bytes (roughly 1 MB), that gives 10/1 = 10 splits, so create 20 or 30 threads for best CPU utilization.
4. The above mlcp command has done well with 150 million rows, so it should work for you as well.
5. I assume you have a good amount of RAM, at least 4 GB.

Thanks and Regards,
Abhishek Jain
Associate Consultant
Capgemini India | Hyderabad
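As a concrete illustration of point 3, here is the same command with the arithmetic applied: a hypothetical 10 MB input file with a 1 MB split size yields 10 splits, so a thread count of 30 keeps the 1:3 ratio. This is a sketch only; the host, port, credentials, input path, and the 30-thread value are placeholder assumptions, not values from the thread.

# Sketch, not tested advice: every value below is a placeholder, and the
# -thread_count of 30 is derived from the 10-splits example in point 3.
mlcp import -host yourhost -port 8000 \
  -username userName -password PASSWORD \
  -input_file_type xml \
  -input_file_path TempData \
  -split_input true \
  -max_split_size 1048576 \
  -thread_count 30 \
  -thread_count_per_split 3 \
  -batch_size 200 \
  -transaction_size 20

Note Stuart's caveat above: some mlcp versions reject -input_file_type xml, in which case -input_file_type documents is the working substitute.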
From: [email protected] [mailto:[email protected]] On Behalf Of Stuart Myles
Sent: Thursday, September 22, 2016 11:52 PM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] mlcp Transaction Errors - SVC-EXTIME and XDMP-NOTXN

When I'm loading directories of slightly fewer than 100,000 XML files into a large MarkLogic instance, I often get timeout and transaction errors. If I re-run the same directory of files that got those errors, I typically don't get any errors. So, I have a few questions:

* Can I prevent the errors from happening in the first place, e.g. by tuning MarkLogic parameters or altering my use of mlcp?
* If I do get errors, what is the best way to get a report on the files that failed, so I can retry just those? Is my best option to write some code to pick the errors out of the log file? And, if so, am I guaranteed that all of the failed files are reported?

Some details: my command line template is

mlcp.sh import -username {1} -password {2} -host localhost -port {4} -input_file_path {5} -output_uri_replace "{6},'{7}'"

Sometimes the imports run just fine. However, I often get a large number of SVC-EXTIME errors followed by an XDMP-NOTXN error. For example:

16/09/22 17:54:03 ERROR mapreduce.ContentWriter: SVC-EXTIME: Time limit exceeded
16/09/22 17:54:03 WARN mapreduce.ContentWriter: Failed document 029ccd8ac3323658277ca28fead7a73d.0.xml in file:/mnt/ingestion/MarkLogicIngestion/smyles/todo/2014_0005.done/029ccd8ac3323658277ca28fead7a73d.0.xml
16/09/22 17:54:03 ERROR mapreduce.ContentWriter: SVC-EXTIME: Time limit exceeded
16/09/22 17:54:03 WARN mapreduce.ContentWriter: Failed document 02eb4562784255e249c4ec3ed472f9aa.1.xml in file:/mnt/ingestion/MarkLogicIngestion/smyles/todo/2014_0005.done/02eb4562784255e249c4ec3ed472f9aa.1.xml
16/09/22 17:54:04 INFO contentpump.LocalJobRunner: completed 33%
16/09/22 17:54:21 ERROR mapreduce.ContentWriter: XDMP-NOTXN: No transaction with identifier 9076269665213828952

So far, I'm just rerunning the entire directory, and most of the time it ingests fine on the second attempt. However, I have thousands of these directories to process, so I would prefer to avoid the errors in the first place. Failing that, I would like to capture the errors and retry just the files that failed.

Any help much appreciated.

Regards,
Stuart
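On the second question, one low-tech approach is to capture mlcp's console output and pull the failed source paths out of the "Failed document ... in file:/..." warnings shown above, then re-run mlcp against just those files. A rough sketch, assuming the log lines keep exactly that shape; ingest.log and retry_dir are made-up names, and there is no guarantee that every failure mode is logged this way:

# 1. Capture mlcp's console output while ingesting; todo_dir stands in
#    for the real input directory, and the flags mirror the template above.
mlcp.sh import -username user -password pass -host localhost -port 8000 \
  -input_file_path todo_dir 2>&1 | tee ingest.log

# 2. Extract the source path from each "Failed document ... in file:/..."
#    warning and copy the failed files into a fresh directory.
mkdir -p retry_dir
grep 'Failed document' ingest.log \
  | sed 's#.* in file:##' \
  | while read -r f; do cp "$f" retry_dir/; done

# 3. Re-run mlcp against only the files that failed.
mlcp.sh import -username user -password pass -host localhost -port 8000 \
  -input_file_path retry_dir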
_______________________________________________
General mailing list
[email protected]
Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
