Hi Abhishek,
I had this thread configuration on my list, but had not tried it.
Since it worked for you, let me try it.

Thanks for confirming.
Shan.

On Tue, Dec 6, 2016 at 11:00 AM, Jain, Abhishek <
abhishek.b.j...@capgemini.com> wrote:

> Hi Shiv,
>
> I have answered a similar query earlier. It will surely work for you as well.
> Please check below.
>
> On 23 Sep 2016 8:45 pm, Stuart Myles <stuart.my...@gmail.com> wrote:
>
> Thanks! This helped me prevent the errors from occurring and - as a bonus
> - significantly sped up my ingestion.
>
> I couldn't use exactly the mlcp command line you suggested, since - in the
> version of mlcp I'm using - -input_file_type xml isn't allowed; I had to
> use -input_file_type documents instead. Also, my input files don't need to
> be split. However, bumping up the threads used (to 30 in my case) made the
> transaction / timeout complaints go away. And now I'm ingesting 100,000
> documents in 12 minutes, rather than one hour. Much better!
>
> Regards,
>
> Stuart
>
>
>
> On Fri, Sep 23, 2016 at 3:34 AM, Jain, Abhishek <
> abhishek.b.j...@capgemini.com> wrote:
>
>> Hi Stuart,
>>
>>
>>
>> MLCP comes with various options and can be used in various combinations
>> depending on the file size, available memory, number of nodes, forests, etc.
>>
>>
>>
>> If you want to try a quick solution, you can try this mlcp command:
>>
>> *mlcp import -host yourhost -port 8000 -username userName -password
>> PASSWORD -input_file_type xml -input_file_path TempData -thread_count
>>  -thread_count_per_split 3 -batch_size 200  -transaction_size 20
>> -max_split_size 33554432 -split_input true*
>>
>> Change the username, input file type, etc. accordingly.
>>
>> It’s always good to use splits and threads when working with huge datasets.
>>
>> Some performance tips you can consider while using the above mlcp command:
>>
>> 1.       In the app server settings, check that the connection timeout is
>> set to 0.
>>
>> 2.       The default split size is 32 MB; you can change *-max_split_size
>> 33554432* (it takes a value in bytes) if your file is bigger.
>>
>> 3.       Make sure the split-to-thread ratio remains 1:2 or 1:3. For example:
>>
>> If your document size is 10 MB and your split size is 1,000,000 bytes (1 MB),
>> then 10/1 = 10 splits.
>>
>> Then you should create 20 or 30 threads for best CPU utilization.
>>
>> 4.       The above mlcp command does well with 150 million rows; it should
>> work for you as well.
>>
>> 5.       I assume you have a good amount of RAM, at least 4 GB.
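The split/thread arithmetic in point 3 can be sketched in a few lines of Python. This is an illustrative helper, not part of mlcp; the 1:2 multiplier is the rule of thumb described above:

```python
# Illustrative sizing helper (not part of mlcp): given the total input size
# and the -max_split_size value, estimate the number of splits and a thread
# count using the 1:2 split-to-thread rule of thumb.
def mlcp_sizing(total_bytes, max_split_size, threads_per_split=2):
    splits = max(1, total_bytes // max_split_size)
    threads = splits * threads_per_split
    return splits, threads

# 10 MB of input with 1 MB splits -> 10 splits, 20 threads
splits, threads = mlcp_sizing(10_000_000, 1_000_000)
print(splits, threads)  # 10 20
```

With `threads_per_split=3` the same input would suggest 30 threads, matching the 1:3 end of the rule.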
>>
>>
>>
>> Thanks and Regards,
>>
>> Abhishek Jain
>>
>> Associate Consultant
>>
>> Capgemini India | Hyderabad
>>
>>
>>
>> *From:* general-boun...@developer.marklogic.com [mailto:
>> general-boun...@developer.marklogic.com] *On Behalf Of *Stuart Myles
>> *Sent:* Thursday, September 22, 2016 11:52 PM
>> *To:* MarkLogic Developer Discussion
>> *Subject:* [MarkLogic Dev General] mlcp Transaction Errors - SVC-EXTIME
>> and XDMP-NOTXN
>>
>>
>>
>> When I'm loading directories of slightly fewer than 100,000 XML files
>> into a large MarkLogic instance, I often get timeout and transaction
>> errors. If I re-run the same directory of files which got those errors, I
>> typically don't get any errors.
>>
>>
>>
>> So, I have a few questions:
>>
>>
>>
>> * Can I prevent the errors from happening in the first place - e.g. by
>> tuning MarkLogic parameters or altering my use of mlcp?
>>
>> * If I do get errors, what is the best way to get a report on the files
>> which failed, so I can retry just those ones? Is the best option for me to
>> write some code to pick out the errors from the log file? And, if so, am I
>> guaranteed to get all of the files reported?
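One way to pick the failures out of the log is a small script along these lines. This is a minimal sketch, assuming the "WARN ... Failed document &lt;name&gt; in &lt;path&gt;" line format shown in the log excerpt further down; it makes no claim about whether every failed file is guaranteed to appear in the log:

```python
# Sketch: extract failed-document paths from an mlcp run log so that only
# those files are retried. Assumes log lines of the form
# "... WARN mapreduce.ContentWriter: Failed document <name> in file:<path>".
import re

FAILED_RE = re.compile(r"Failed document \S+ in (file:\S+)")

def failed_files(log_lines):
    """Return the file: URIs of documents reported as failed."""
    out = []
    for line in log_lines:
        m = FAILED_RE.search(line)
        if m:
            out.append(m.group(1))
    return out

log = [
    "16/09/22 17:54:03 ERROR mapreduce.ContentWriter: SVC-EXTIME: Time limit exceeded",
    "16/09/22 17:54:03 WARN mapreduce.ContentWriter: Failed document a.0.xml in file:/mnt/ingestion/todo/a.0.xml",
]
print(failed_files(log))  # ['file:/mnt/ingestion/todo/a.0.xml']
```

The resulting list could then be fed back to mlcp (e.g. via a retry directory) to reprocess just the failures.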
>>
>>
>>
>> Some Details
>>
>>
>>
>> The command line template is
>>
>>
>>
>> mlcp.sh import -username {1} -password {2} -host localhost -port {4}
>> -input_file_path {5} -output_uri_replace \"{6},'{7}'\"
>>
>>
>>
>> Sometimes, the imports run just fine. However, I often get a large number
>> of SVC-EXTIME errors followed by an XDMP-NOTXN error. For example:
>>
>>
>>
>> 16/09/22 17:54:03 ERROR mapreduce.ContentWriter: SVC-EXTIME: Time limit
>> exceeded
>>
>> 16/09/22 17:54:03 WARN mapreduce.ContentWriter: Failed document
>> 029ccd8ac3323658277ca28fead7a73d.0.xml in
>> file:/mnt/ingestion/MarkLogicIngestion/smyles/todo/2014_0005.done/029ccd8ac3323658277ca28fead7a73d.0.xml
>>
>> 16/09/22 17:54:03 ERROR mapreduce.ContentWriter: SVC-EXTIME: Time limit
>> exceeded
>>
>> 16/09/22 17:54:03 WARN mapreduce.ContentWriter: Failed document
>> 02eb4562784255e249c4ec3ed472f9aa.1.xml in
>> file:/mnt/ingestion/MarkLogicIngestion/smyles/todo/2014_0005.done/02eb4562784255e249c4ec3ed472f9aa.1.xml
>>
>> 16/09/22 17:54:04 INFO contentpump.LocalJobRunner:  completed 33%
>>
>> 16/09/22 17:54:21 ERROR mapreduce.ContentWriter: XDMP-NOTXN: No
>> transaction with identifier 9076269665213828952
>>
>>
>>
>> So far, I'm just rerunning the entire directory again. Most of the time,
>> it ingests fine on the second attempt. However, I have thousands of these
>> directories to process. So, I would prefer to avoid getting the errors in
>> the first place. Failing that, I would like to capture the errors and just
>> retry the files which failed.
>>
>>
>>
>> Any help much appreciated.
>>
>>
>> Regards,
>>
>> Stuart
>>
>>
>>
>>
>>
>> This message contains information that may be privileged or confidential
>> and is the property of the Capgemini Group. It is intended only for the
>> person to whom it is addressed. If you are not the intended recipient, you
>> are not authorized to read, print, retain, copy, disseminate, distribute,
>> or use this message or any part thereof. If you receive this message in
>> error, please notify the sender immediately and delete all copies of this
>> message.
>>
>> _______________________________________________
>> General mailing list
>> General@developer.marklogic.com
>> Manage your subscription at:
>> http://developer.marklogic.com/mailman/listinfo/general
>>
>>
>