Hi Shiv, I answered a similar query earlier. It will surely work for you as well. Please check below.
Sent with Good Work (www.blackberry.com)

On 23 Sep 2016 8:45 pm, Stuart Myles <[email protected]> wrote:

Thanks! This helped me prevent the errors from occurring and, as a bonus, significantly sped up my ingestion. I couldn't use exactly the mlcp command line you suggested, since in the version of mlcp I'm using -input_file_type xml isn't allowed; I had to use -input_file_type documents instead. Also, my input files don't need to be split. However, bumping up the thread count (to 30 in my case) made the transaction/timeout complaints go away, and now I'm ingesting 100,000 documents in 12 minutes rather than one hour. Much better!

Regards,
Stuart

On Fri, Sep 23, 2016 at 3:34 AM, Jain, Abhishek <[email protected]> wrote:

Hi Stuart,

mlcp comes with various options and can be used in various combinations depending on the file size, the memory available, the number of nodes and forests, and so on. If you want a quick solution, you can try this mlcp command:

mlcp import -host yourhost -port 8000 -username userName -password PASSWORD -input_file_type xml -input_file_path TempData -thread_count_per_split 3 -batch_size 200 -transaction_size 20 -max_split_size 33554432 -split_input true

Change the username, input file type, etc. accordingly. It's always good to use splits and threads when working with a huge dataset. Some performance pointers to consider while using the above mlcp command (a worked-out sketch follows this message):

1. In the app server settings, check that the connection timeout is set to 0.
2. The default split size is 32 MB; if your files are bigger, raise -max_split_size (the value is in bytes; 33554432 = 32 MB).
3. Keep the split-to-thread ratio at 1:2 or 1:3. For example, if your document is 10 MB and your split size is 1,000,000 bytes (roughly 1 MB), that gives 10/1 = 10 splits, so create 20 or 30 threads for best CPU utilization.
4. The above mlcp command has done well with 150 million rows, so it should work for you as well.
5. I assume you have a good amount of RAM, at least 4 GB.

Thanks and Regards,
Abhishek Jain
Associate Consultant
Capgemini India | Hyderabad
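As a concrete illustration of point 3, here is the same command with the arithmetic applied: a hypothetical 10 MB input file with a 1 MB split size yields 10 splits, so a thread count of 30 keeps the 1:3 ratio. This is a sketch only; the host, port, credentials, input path, and the 30-thread value are placeholder assumptions, not values from the thread.

# Sketch, not tested advice: every value below is a placeholder, and the
# -thread_count of 30 is derived from the 10-splits example in point 3.
mlcp import -host yourhost -port 8000 \
  -username userName -password PASSWORD \
  -input_file_type xml \
  -input_file_path TempData \
  -split_input true \
  -max_split_size 1048576 \
  -thread_count 30 \
  -thread_count_per_split 3 \
  -batch_size 200 \
  -transaction_size 20

Note Stuart's caveat above: some mlcp versions reject -input_file_type xml, in which case -input_file_type documents is the working substitute.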
From: [email protected] [mailto:[email protected]] On Behalf Of Stuart Myles
Sent: Thursday, September 22, 2016 11:52 PM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] mlcp Transaction Errors - SVC-EXTIME and XDMP-NOTXN

When I'm loading directories of slightly fewer than 100,000 XML files into a large MarkLogic instance, I often get timeout and transaction errors. If I re-run the same directory of files that got those errors, I typically don't get any errors. So, I have a few questions:

* Can I prevent the errors from happening in the first place, e.g. by tuning MarkLogic parameters or altering my use of mlcp?
* If I do get errors, what is the best way to get a report on the files that failed, so I can retry just those? Is my best option to write some code to pick the errors out of the log file? And, if so, am I guaranteed that all of the failed files are reported?

Some details: my command line template is

mlcp.sh import -username {1} -password {2} -host localhost -port {4} -input_file_path {5} -output_uri_replace "{6},'{7}'"

Sometimes the imports run just fine. However, I often get a large number of SVC-EXTIME errors followed by an XDMP-NOTXN error. For example:

16/09/22 17:54:03 ERROR mapreduce.ContentWriter: SVC-EXTIME: Time limit exceeded
16/09/22 17:54:03 WARN mapreduce.ContentWriter: Failed document 029ccd8ac3323658277ca28fead7a73d.0.xml in file:/mnt/ingestion/MarkLogicIngestion/smyles/todo/2014_0005.done/029ccd8ac3323658277ca28fead7a73d.0.xml
16/09/22 17:54:03 ERROR mapreduce.ContentWriter: SVC-EXTIME: Time limit exceeded
16/09/22 17:54:03 WARN mapreduce.ContentWriter: Failed document 02eb4562784255e249c4ec3ed472f9aa.1.xml in file:/mnt/ingestion/MarkLogicIngestion/smyles/todo/2014_0005.done/02eb4562784255e249c4ec3ed472f9aa.1.xml
16/09/22 17:54:04 INFO contentpump.LocalJobRunner: completed 33%
16/09/22 17:54:21 ERROR mapreduce.ContentWriter: XDMP-NOTXN: No transaction with identifier 9076269665213828952

So far, I'm just rerunning the entire directory, and most of the time it ingests fine on the second attempt. However, I have thousands of these directories to process, so I would prefer to avoid the errors in the first place. Failing that, I would like to capture the errors and retry just the files that failed.

Any help much appreciated.

Regards,
Stuart
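On the second question, one low-tech approach is to capture mlcp's console output and pull the failed source paths out of the "Failed document ... in file:/..." warnings shown above, then re-run mlcp against just those files. A rough sketch, assuming the log lines keep exactly that shape; ingest.log and retry_dir are made-up names, and there is no guarantee that every failure mode is logged this way:

# 1. Capture mlcp's console output while ingesting; todo_dir stands in
#    for the real input directory, and the flags mirror the template above.
mlcp.sh import -username user -password pass -host localhost -port 8000 \
  -input_file_path todo_dir 2>&1 | tee ingest.log

# 2. Extract the source path from each "Failed document ... in file:/..."
#    warning and copy the failed files into a fresh directory.
mkdir -p retry_dir
grep 'Failed document' ingest.log \
  | sed 's#.* in file:##' \
  | while read -r f; do cp "$f" retry_dir/; done

# 3. Re-run mlcp against only the files that failed.
mlcp.sh import -username user -password pass -host localhost -port 8000 \
  -input_file_path retry_dir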
_______________________________________________
General mailing list
[email protected]
Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
