The approach you outline might help, or it might not. I'd start by
asking why the document processing takes so long.
From the description, it isn't clear what part of document processing
is the bottleneck. It sounds like you have a few good test cases: you
might start by looking at the logs to see if you can isolate the slowest
step in your processing pipeline. Then you could try that step manually
in cq, where you can profile the query execution.
You have a test document that took 90 minutes to process into 60 documents.
How large is the test document? How large are the 60 new documents?
-- Mike
On 2009-11-19 09:23, Runstein, Robert E. (Contr) (IS) wrote:
My application ingests documents that need to be broken up into subdocuments.
We want the process to be atomic, so our initial approach was to run it within a
single CPF pipeline.
While this works fine for small documents we have encountered larger documents
that time out because processing takes longer than the time limit set for the
task server. Increasing the time limit works, but this does not seem to be
an optimal solution since an example document took over 1.5 hours to process
into 60 sub documents. In addition, the parent documents are sent to us by an
external provider and our interface allows them to send an unlimited number of
elements for processing into sub documents. They will not change their data
and there is no guarantee that any chosen time limit would be sufficiently
long to allow processing to complete.
One solution could be to process each subdocument in a separate transaction,
but write them to a temporary collection. If all subdocuments are processed
successfully they could be moved to the destination collection in a single
transaction. If any failed processing, all of them would be deleted and an
error logged.
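The stage-then-promote idea described above can be sketched roughly as follows. This is only an illustration in Python against a hypothetical in-memory `Store` class, not MarkLogic or CPF code: `Store`, `split_with_staging`, the URI scheme, and the collection names are all assumptions made up for the example. In a real system each `insert` would run as its own short transaction, and the promote or delete step would run as one final transaction.

```python
import logging
from typing import Callable, Dict, Iterable, Set


class Store:
    """Hypothetical in-memory stand-in for a document store with collections."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        self.collections: Dict[str, Set[str]] = {}

    def insert(self, uri: str, content: str, collection: str) -> None:
        # Stands in for one short transaction that writes one subdocument.
        self.docs[uri] = content
        self.collections.setdefault(collection, set()).add(uri)

    def move_collection(self, source: str, target: str) -> None:
        # Stands in for the single transaction that promotes the whole batch.
        uris = self.collections.pop(source, set())
        self.collections.setdefault(target, set()).update(uris)

    def delete_collection(self, collection: str) -> None:
        # Stands in for the single transaction that discards a failed batch.
        for uri in self.collections.pop(collection, set()):
            self.docs.pop(uri, None)


def split_with_staging(
    store: Store,
    parts: Iterable[str],
    process: Callable[[str], str],
    batch_id: str,
    destination: str,
) -> bool:
    """Stage every subdocument, then promote all of them or none of them."""
    staging = f"staging/{batch_id}"
    try:
        for index, part in enumerate(parts):
            store.insert(f"/{batch_id}/{index}.xml", process(part), staging)
    except Exception as error:
        # Any failure discards everything staged so far and logs the error.
        store.delete_collection(staging)
        logging.error("batch %s failed: %s", batch_id, error)
        return False
    store.move_collection(staging, destination)
    return True
```

The sketch only gives the appearance of atomicity if readers query the destination collection alone, so that staged subdocuments stay invisible until the batch is promoted.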
Is this a reasonable approach to avoiding a single long running transaction?
Can you recommend alternatives? Thanks.
Bob