davsclaus opened a new pull request, #23908:
URL: https://github.com/apache/camel/pull/23908

   ## Backport of #21689
   
   Cherry-pick of #21689 onto `camel-4.18.x`.
   
   **Original PR:** #21689 - CAMEL-23120 - camel-docling - Implement batchSize sub-batch partitioning in batch processing
   **Original author:** @oscerd
   **Target branch:** `camel-4.18.x`
   
   ### Original description
   
   The batchSize configuration parameter (default 10) was declared and read 
from headers in processBatchConversion() and processBatchStructuredData(), but 
the value was never actually applied. Both convertDocumentsBatch() and 
convertStructuredDataBatch() submitted all documents to the executor at once 
regardless of batchSize, making the parameter a no-op.
   
   This change makes batchSize control how many documents are submitted per 
sub-batch. Documents are partitioned into chunks of batchSize and each 
sub-batch is processed to completion before starting the next one. Within each 
sub-batch, up to batchParallelism threads run concurrently. The overall 
batchTimeout is tracked across sub-batches so remaining time decreases as 
sub-batches complete, and failOnFirstError stops processing across sub-batch 
boundaries.
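   
   The behaviour described above can be illustrated with a minimal sketch. This is not the camel-docling code: the class `SubBatchSketch`, the helper `processInSubBatches`, and its signature are hypothetical, and only the parameter names (batchSize, batchParallelism, batchTimeout, failOnFirstError) come from the description. It assumes one shared deadline for the whole batch and records `null` for a failed item when failOnFirstError is off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical illustration only; names and signature are assumptions, not the component's API.
class SubBatchSketch {

    static <T, R> List<R> processInSubBatches(
            List<T> items, int batchSize, int batchParallelism, long batchTimeoutMillis,
            boolean failOnFirstError, Function<T, R> task) throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // At most batchParallelism tasks run concurrently.
        ExecutorService executor = Executors.newFixedThreadPool(batchParallelism);
        List<R> results = new ArrayList<>();
        // One deadline for the whole batch, so remaining time shrinks as sub-batches finish.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(batchTimeoutMillis);
        try {
            for (int start = 0; start < items.size(); start += batchSize) {
                List<T> chunk = items.subList(start, Math.min(start + batchSize, items.size()));
                // Only the futures of the current sub-batch exist at any time.
                List<CompletableFuture<R>> futures = new ArrayList<>();
                for (T item : chunk) {
                    futures.add(CompletableFuture.supplyAsync(() -> task.apply(item), executor));
                }
                // Wait for the whole sub-batch before submitting the next one.
                for (CompletableFuture<R> future : futures) {
                    long remaining = deadline - System.nanoTime();
                    if (remaining <= 0) {
                        throw new TimeoutException("batch timeout exceeded");
                    }
                    try {
                        results.add(future.get(remaining, TimeUnit.NANOSECONDS));
                    } catch (ExecutionException e) {
                        if (failOnFirstError) {
                            throw e; // later sub-batches are never submitted
                        }
                        results.add(null); // assumption: record the failure and continue
                    }
                }
            }
        } finally {
            executor.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> out = processInSubBatches(List.of(1, 2, 3, 4, 5), 2, 2, 5000L, true, x -> x * 10);
        System.out.println(out);
    }
}
```

   With five items and a batchSize of 2, the sketch submits three sub-batches of 2, 2, and 1 items, and results come back in input order because futures are awaited in submission order.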
   
   This provides back-pressure and bounds memory usage when processing large 
document sets: only the CompletableFutures of the current sub-batch exist at 
any time, instead of one per document created up front.


