The approach you outline could help, but I'd start by asking why the document processing is taking so much time.

From the description, it isn't clear what part of document processing is the bottleneck. It sounds like you have a few good test cases: you might start by looking at the logs to see if you can isolate the slowest step in your processing pipeline. Then you could try that step manually in cq, where you can profile the query execution.

You have a test document that took 90 minutes to process into 60 documents. How large is the test document? How large are the 60 new documents?

-- Mike

On 2009-11-19 09:23, Runstein, Robert E. (Contr) (IS) wrote:
My application ingests documents that need to be broken up into subdocuments.  
We want the process to be atomic, so our initial approach was to run it within 
a single CPF pipeline.

While this works fine for small documents, we have encountered larger documents 
that time out because processing takes longer than the time limit set for the 
task server.  Increasing the time limit works, but this does not seem to be an 
optimal solution, since an example document took over 1.5 hours to process into 
60 subdocuments.  In addition, the parent documents are sent to us by an 
external provider, and our interface allows them to send an unlimited number of 
elements for processing into subdocuments.  They will not change their data, 
and there is no guarantee that any chosen time limit would be sufficiently 
long to allow processing to complete.

One solution could be to process each subdocument in a separate transaction, 
but write them to a temporary collection.  If all subdocuments are processed 
successfully, they could be moved to the destination collection in a single 
transaction.  If any subdocument failed processing, all of them would be 
deleted and an error logged.
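The stage-then-promote pattern described above can be sketched in outline. This is a minimal, platform-neutral sketch in Python, not MarkLogic code: split() and the list-like staging and destination stand-ins are hypothetical placeholders for the real CPF splitting step and the temporary and destination collections, and each append stands in for a separate small transaction.

```python
# Sketch of the staging pattern: process each subdocument
# independently, stage the results, and promote them all only
# if every one succeeded.  split(), staging, and destination
# are hypothetical stand-ins for the real system's pieces.

def ingest(parent, split, staging, destination):
    staging.clear()
    try:
        # Each subdocument is produced and staged on its own;
        # in the real system each write is its own transaction.
        for sub in split(parent):
            staging.append(sub)
    except Exception:
        # On any failure, discard everything staged so far
        # (and log an error), then re-raise.
        staging.clear()
        raise
    # All subdocuments succeeded: promote them to the
    # destination collection in a single final step.
    destination.extend(staging)
    staging.clear()
```

The key property is that the destination collection is only ever touched in the final step, so readers never see a partially processed parent document.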

Is this a reasonable approach to avoiding a single long-running transaction?  
Can you recommend alternatives?  Thanks.

Bob


_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general