The 90-minute run occurred with 12 GB of memory available to MarkLogic. Increasing memory to 24 GB reduced the run time to 13 minutes, and increasing to 31 GB shortened it to 12 minutes, so it looks like insufficient memory was the main cause. All the work is done in a single step within the pipeline, so I'll try your suggestion about profiling that step in cq to see where else it can be improved.
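(For reference, besides the profile view in cq, the same measurement can be taken programmatically. This is only a sketch, assuming MarkLogic's prof:enable/prof:report functions; the slow step's body is a placeholder you would paste in yourself.)

```xquery
xquery version "1.0-ml";

(: Turn on profiling for the current request, run the suspect
   pipeline step, then emit the profile report. The report lists
   each evaluated expression with its shallow and deep times,
   which points at the actual hot spot. :)
prof:enable(xdmp:request()),

(: ... paste the body of the slow pipeline step here ... :)

prof:report(xdmp:request())
```

Running this in cq (or later, Query Console) against the same database the pipeline uses should show whether the time goes to parsing, node construction, or the document inserts.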
Bob

-----Original Message-----
From: Michael Blakeley [mailto:[email protected]]
Sent: Thursday, November 19, 2009 1:19 PM
To: General Mark Logic Developer Discussion
Cc: Runstein, Robert E. (Contr) (IS)
Subject: Re: [MarkLogic Dev General] Avoiding long running transactions

The approach you outline could help, or it might not. I'd start by asking why the document processing is taking so much time. From the description, it isn't clear which part of document processing is the bottleneck.

It sounds like you have a few good test cases: you might start by looking at the logs to see if you can isolate the slowest step in your processing pipeline. Then you could try that step manually in cq, where you can profile the query execution.

You have a test document that took 90 minutes to process into 60 documents. How large is the test document? How large are the 60 new documents?

-- Mike

On 2009-11-19 09:23, Runstein, Robert E. (Contr) (IS) wrote:
> My application ingests documents that need to be broken up into subdocuments. We want the process to be atomic, so our initial approach was to run it within a single CPF pipeline.
>
> While this works fine for small documents, we have encountered larger documents that time out because processing takes longer than the time limit set for the task server. Increasing the time limit works, but this does not seem to be an optimal solution, since an example document took over 1.5 hours to process into 60 subdocuments. In addition, the parent documents are sent to us by an external provider, and our interface allows them to send an unlimited number of elements for processing into subdocuments. They will not change their data, and there is no guarantee that any chosen time limit would be sufficiently long to allow processing to complete.
>
> One solution could be to process each subdocument in a separate transaction, but write them to a temporary collection.
> If all subdocuments are processed successfully, they could be moved to the destination collection in a single transaction. If any failed processing, all of them would be deleted and an error logged.
>
> Is this a reasonable approach to avoiding a single long-running transaction? Can you recommend alternatives? Thanks.
>
> Bob

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
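(The stage-then-promote step described above could look roughly like the following. This is a sketch, not the poster's actual code: the collection names, the failure check, and the error-marker element are all illustrative assumptions. The per-subdocument processing itself would run in separate transactions, e.g. spawned tasks writing into the staging collection; only the promote/rollback below runs as one transaction.)

```xquery
xquery version "1.0-ml";

(: Illustrative collection names :)
declare variable $STAGING := "staging";
declare variable $FINAL   := "published";

let $staged := fn:collection($STAGING)
return
  if (fn:exists($staged//error))  (: hypothetical failure marker :)
  then (
    (: Rollback: log the failure and delete everything staged. :)
    xdmp:log("Subdocument processing failed; deleting staged documents"),
    for $doc in $staged
    return xdmp:document-delete(xdmp:node-uri($doc))
  )
  else
    (: Promote: atomically move every staged document to the
       destination collection. Because this runs in a single
       transaction, readers see either all subdocuments or none. :)
    for $doc in $staged
    let $uri := xdmp:node-uri($doc)
    return (
      xdmp:document-add-collections($uri, $FINAL),
      xdmp:document-remove-collections($uri, $STAGING)
    )
```

Swapping collection membership rather than re-inserting the documents keeps the final transaction small, which is the point of the approach: the expensive work happens in many short transactions, and only the cheap visibility change is atomic.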
