The 90-minute run occurred with 12GB memory available to MarkLogic.
Increasing memory to 24GB reduced the run time to 13 minutes.
Increasing to 31GB shortened the run to 12 minutes, so it looks like
insufficient memory was the main cause.  All the work is done in a
single step within the pipeline, so I'll try your suggestion about
profiling that in cq to see where else it can be improved.

Bob

-----Original Message-----
From: Michael Blakeley [mailto:[email protected]] 
Sent: Thursday, November 19, 2009 1:19 PM
To: General Mark Logic Developer Discussion
Cc: Runstein, Robert E. (Contr) (IS)
Subject: Re: [MarkLogic Dev General] Avoiding long running transactions

The approach you outline could help, but I'd start by asking why the
document processing is taking so much time.

From the description, it isn't clear what part of document processing
is the bottleneck. It sounds like you have a few good test cases: you
might start by looking at the logs to see if you can isolate the slowest
step in your processing pipeline. Then you could try that step manually
in cq, where you can profile the query execution.
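
To profile a step in cq, one option is the prof:eval builtin, which runs
a query string and returns a report of elapsed time per expression. A
minimal sketch (the external variable name, sample URI, and the
placeholder body are illustrative, not from the original step):

xquery version "1.0-ml";
(: prof:eval runs the query string with profiling enabled and
   returns the profile report along with the result.
   Paste the body of the slow pipeline step in place of the
   placeholder expression below. :)
prof:eval('
  declare variable $uri external;
  fn:doc($uri)  (: placeholder for the real step body :)
',
(xs:QName("uri"), "/sample/parent-document.xml"))

The report shows shallow and deep times per expression, which should
point at the specific XPath or function call dominating the run.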

You have a test document that took 90 minutes to process into 60
documents. How large is the test document? How large are the 60 new
documents?

-- Mike

On 2009-11-19 09:23, Runstein, Robert E. (Contr) (IS) wrote:
> My application ingests documents that need to be broken up into
> subdocuments.  We want the process to be atomic, so our initial approach
> was to run it within a single CPF pipeline.
>
> While this works fine for small documents, we have encountered larger
> documents that time out because processing takes longer than the time
> limit set for the task server.  Increasing the time limit works, but
> this does not seem to be an optimal solution, since an example document
> took over 1.5 hours to process into 60 subdocuments.  In addition, the
> parent documents are sent to us by an external provider, and our
> interface allows them to send an unlimited number of elements for
> processing into subdocuments.  They will not change their data, and
> there is no guarantee that any chosen time limit would be sufficiently
> long to allow processing to complete.
>
> One solution could be to process each subdocument in a separate
> transaction, but write them to a temporary collection.  If all
> subdocuments are processed successfully, they could be moved to the
> destination collection in a single transaction.  If any failed
> processing, all of them would be deleted and an error logged.
>
> Is this a reasonable approach to avoiding a single long-running
> transaction?  Can you recommend alternatives?  Thanks.
>
> Bob
>
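
For reference, the finalize step of the temporary-collection approach
quoted above could be sketched in XQuery roughly as follows. The
collection names and the expected count are assumptions for
illustration only:

xquery version "1.0-ml";
(: Hypothetical finalize step: if every subdocument reached the
   temporary collection, retag them all with the destination
   collection in one transaction; otherwise delete them and log. :)
let $temp := "temp-split"            (: assumed collection name :)
let $dest := "split-docs"            (: assumed collection name :)
let $expected-count := 60            (: assumed; derive from parent doc :)
let $docs := fn:collection($temp)
return
  if (fn:count($docs) eq $expected-count) then
    for $doc in $docs
    return xdmp:document-set-collections(xdmp:node-uri($doc), $dest)
  else (
    for $doc in $docs
    return xdmp:document-delete(xdmp:node-uri($doc)),
    xdmp:log("split failed; temporary subdocuments deleted")
  )

Because the whole expression runs in one transaction, readers never see
a partially promoted set of subdocuments.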

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
