gnodet commented on PR #22159:
URL: https://github.com/apache/camel/pull/22159#issuecomment-4144168967

   _Claude Code on behalf of Guillaume Nodet_
   
   ## Analysis: Bulk Component vs. Splitter EIP Enhancements
   
   After reviewing the Bulk component in depth and discussing the design, we've 
concluded that the features provided by the Bulk component would be better 
served as enhancements to the existing Splitter EIP rather than as a standalone 
component. Here's the full analysis.
   
   ### The Core Question
   
   The Bulk component provides: chunking, error thresholds, failure tracking, 
watermark-based resume, multi-step pipelines, and transaction support. But most 
of these features are either natural extensions of the Splitter EIP or 
composable utilities — not justification for a separate component.
   
   ### Feature-by-Feature Mapping to Splitter
   
   #### 1. Chunking → `group(int)` on Splitter
   
   The Splitter already supports `group` for tokenized strings, but not for 
collections of objects. Adding a `group(int)` option that wraps any iterator 
with a chunking iterator is a small, natural enhancement:
   
   ```java
   // Today: manual pre-partitioning required
   .process(e -> { /* partition list into sub-lists */ })
   .split(body())
       .to("direct:processChunk")
   .end();
   
   // Enhanced: built-in chunking
   .split(body()).group(100)
       .to("direct:processChunk")  // body is List of up to 100 items
   .end();
   ```
   
   #### 2. Error Threshold → `errorThreshold(double)` / `maxFailedRecords(int)` on Splitter
   
   This is a direct generalization of `stopOnException()`. Instead of a boolean 
"stop on first failure", the Splitter would count failures and abort mid-stream 
when the threshold is exceeded:
   
   ```java
   // Today: all-or-nothing
   .split(body()).stopOnException()
   
   // Enhanced: ratio/count-based abort
   .split(body()).errorThreshold(0.1).maxFailedRecords(50)
   ```
   
   #### 3. Failure Tracking → Built-in `SplitResult`
   
   When error threshold is configured, the Splitter would internally track 
failures `(index, item, exception)`. After completion, if no custom 
`AggregationStrategy` is set, the exchange body becomes a `SplitResult` with 
`totalItems`, `successCount`, `failureCount`, `failures` list, `duration`, and 
`aborted` flag. Output headers (`CamelSplitSuccess`, `CamelSplitFailed`, etc.) 
provide quick access.
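   
   As a sketch of how a route might consume this (the `SplitResult` body and the `CamelSplitFailed` header are part of the proposal, not existing API):
   
   ```java
   .split(body()).errorThreshold(0.1)
       .to("direct:process")
   .end()
   // Proposed: with no custom AggregationStrategy, body is now a SplitResult
   .choice()
       .when(header("CamelSplitFailed").isGreaterThan(0))
           .log("Import finished with ${header.CamelSplitFailed} failures")
       .otherwise()
           .log("Import finished cleanly")
   .end();
   ```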
   
   #### 4. Watermark Tracking → On the Splitter
   
   In practice, watermarking is always paired with splitting — you split a 
collection and want to remember where you left off. Adding `watermarkStore`, 
`watermarkKey`, and `watermarkExpression` to the Splitter keeps the pattern 
self-contained:
   
   ```java
   .split(body())
       .watermarkStore("#myStore").watermarkKey("importJob")
       .to("direct:process")
   .end()
   ```
   
   Index-based watermarks skip already-processed items; value-based watermarks 
set a header and update the store after completion.
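   
   A value-based watermark might look like this (the `watermarkExpression` option and the `#myStore` bean are part of the proposal / illustrative):
   
   ```java
   // Proposed: remember the highest processed id across runs
   .split(body())
       .watermarkStore("#myStore").watermarkKey("importJob")
       .watermarkExpression(simple("${body.id}"))
       .to("direct:process")
   .end()
   ```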
   
   #### 5. Multi-Step with Accept Policy → Not needed
   
   The Bulk component's multi-step model (all items through step 1, then 
eligible items through step 2) is a fundamentally different execution model 
from the Splitter's per-item route. But the common use cases are already 
covered:
   
   - **Per-item error recovery**: `doTry/doCatch` inside the split route
   - **ETL validate-then-load**: if validation throws, the item stops; 
successful items continue
   
   The rare case of "step 2 depends on collective outcome of step 1" can use 
two sequential splits with filtering.
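   
   For reference, the validate-then-load case is expressible today with `doTry/doCatch` inside the split route (the `direct:validate` / `direct:load` endpoints are illustrative):
   
   ```java
   .split(body())
       .doTry()
           .to("direct:validate")   // throws for invalid items
           .to("direct:load")       // only reached by items that passed validation
       .doCatch(Exception.class)
           .log("Skipping invalid item: ${exception.message}")
       .end()
   .end();
   ```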
   
   #### 6. Transaction Support → Already on Splitter
   
   The Splitter already supports `shareUnitOfWork()` and transaction context 
propagation — these were the patterns we replicated in the Bulk component.
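   
   For example, running an entire split under a single transaction already works with today's DSL (the `PROPAGATION_REQUIRED` policy bean is assumed to be registered):
   
   ```java
   from("direct:bulkImport")
       .transacted("PROPAGATION_REQUIRED")
       .split(body()).shareUnitOfWork().stopOnException()
           .to("jdbc:myDataSource")
       .end();
   ```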
   
   #### 7. onComplete Callback → Not needed
   
   After `.end()`, the `SplitResult` is on the exchange. Just 
`.to("direct:notify")` or `.wireTap()`.
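   
   In other words, a completion notification is just an ordinary step after the split (endpoint names illustrative):
   
   ```java
   .split(body())
       .to("direct:process")
   .end()
   // the summary is on the exchange here; notify without blocking the route
   .wireTap("direct:notify")
   .to("direct:continue");
   ```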
   
   ### Summary
   
   | Bulk Feature | Splitter Enhancement | Effort |
   |---|---|---|
   | Chunking | `group(int)` | Small |
   | Error threshold | `errorThreshold(double)` / `maxFailedRecords(int)` | Medium |
   | Failure tracking | `SplitResult` + failure list | Medium |
   | Watermark tracking | `watermarkStore` / `watermarkKey` / `watermarkExpression` | Medium |
   | Multi-step accept policy | Drop — use `doTry/doCatch` | — |
   | Transaction support | Already exists on Splitter | — |
   | onComplete | Not needed — just `.to()` after `.end()` | — |
   
   ### Conclusion
   
   Enhancing the Splitter EIP is the better path:
   - Users don't need to learn a new component — they extend what they already 
know
   - The features are independently useful (chunking alone, error threshold 
alone, etc.)
   - It composes with everything the Splitter already supports 
(`parallelProcessing`, `streaming`, `shareUnitOfWork`, etc.)
   
   Closing this PR in favor of a Splitter enhancement approach. A new JIRA 
ticket will track the Splitter improvements.

