gnodet commented on PR #22159:
URL: https://github.com/apache/camel/pull/22159#issuecomment-4144168967
_Claude Code on behalf of Guillaume Nodet_
## Analysis: Bulk Component vs. Splitter EIP Enhancements
After reviewing the Bulk component in depth and discussing the design, we've
concluded that the features provided by the Bulk component would be better
served as enhancements to the existing Splitter EIP rather than as a standalone
component. Here's the full analysis.
### The Core Question
The Bulk component provides: chunking, error thresholds, failure tracking,
watermark-based resume, multi-step pipelines, and transaction support. But most
of these features are either natural extensions of the Splitter EIP or
composable utilities — not justification for a separate component.
### Feature-by-Feature Mapping to Splitter
#### 1. Chunking → `group(int)` on Splitter
The Splitter already supports `group` for tokenized strings, but not for
collections of objects. Adding a `group(int)` option that wraps any iterator
with a chunking iterator is a small, natural enhancement:
```java
// Today: manual pre-partitioning required
.process(e -> { /* partition list into sub-lists */ })
.split(body())
.to("direct:processChunk")
.end();
// Enhanced: built-in chunking
.split(body()).group(100)
.to("direct:processChunk") // body is List of up to 100 items
.end();
```
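To make the semantics concrete, here is a minimal plain-Java sketch of what a chunking iterator wrapped by `group(int)` could look like. The class name `GroupIterator` and its shape are assumptions for illustration, not the actual Camel implementation:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Wraps any Iterator<T> so each next() yields a List<T> of at most `size` items.
public class GroupIterator<T> implements Iterator<List<T>> {
    private final Iterator<T> delegate;
    private final int size;

    public GroupIterator(Iterator<T> delegate, int size) {
        this.delegate = delegate;
        this.size = size;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public List<T> next() {
        if (!delegate.hasNext()) {
            throw new NoSuchElementException();
        }
        List<T> chunk = new ArrayList<>(size);
        // Drain up to `size` items from the underlying iterator.
        while (delegate.hasNext() && chunk.size() < size) {
            chunk.add(delegate.next());
        }
        return chunk;
    }
}
```

Because it only wraps the underlying iterator, this composes naturally with `streaming()`: no chunk is materialized before it is needed.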

#### 2. Error Threshold → `errorThreshold(double)` / `maxFailedRecords(int)` on Splitter
This is a direct generalization of `stopOnException()`. Instead of a boolean
"stop on first failure", the Splitter would count failures and abort mid-stream
when the threshold is exceeded:
```java
// Today: all-or-nothing
.split(body()).stopOnException()
// Enhanced: ratio/count-based abort
.split(body()).errorThreshold(0.1).maxFailedRecords(50)
```
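The abort decision itself is simple bookkeeping. A plain-Java sketch of the counting logic, with hypothetical names (`FailureTracker`, `shouldAbort`) standing in for whatever the Splitter would use internally:

```java
// Tracks per-item outcomes during a split and decides when to abort:
// either the failure ratio or the absolute failure count is exceeded.
public class FailureTracker {
    private final double errorThreshold;  // e.g. 0.1 = abort above 10% failures
    private final int maxFailedRecords;   // absolute cap, e.g. 50
    private int processed;
    private int failed;

    public FailureTracker(double errorThreshold, int maxFailedRecords) {
        this.errorThreshold = errorThreshold;
        this.maxFailedRecords = maxFailedRecords;
    }

    public void record(boolean success) {
        processed++;
        if (!success) {
            failed++;
        }
    }

    public boolean shouldAbort() {
        if (failed > maxFailedRecords) {
            return true;
        }
        return processed > 0 && (double) failed / processed > errorThreshold;
    }
}
```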
#### 3. Failure Tracking → Built-in `SplitResult`
When error threshold is configured, the Splitter would internally track
failures `(index, item, exception)`. After completion, if no custom
`AggregationStrategy` is set, the exchange body becomes a `SplitResult` with
`totalItems`, `successCount`, `failureCount`, `failures` list, `duration`, and
`aborted` flag. Output headers (`CamelSplitSuccess`, `CamelSplitFailed`, etc.)
provide quick access.
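A sketch of the `SplitResult` shape as a Java record; the field names follow the list above, while the record form and `failureCount()` accessor are assumptions:

```java
import java.time.Duration;
import java.util.List;

// Value object placed on the exchange body after a split with
// error-threshold tracking, when no custom AggregationStrategy is set.
public record SplitResult(int totalItems,
                          int successCount,
                          List<Failure> failures,
                          Duration duration,
                          boolean aborted) {

    // One entry per failed item: position, the item itself, and the cause.
    public record Failure(int index, Object item, Exception exception) {}

    public int failureCount() {
        return failures.size();
    }
}
```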
#### 4. Watermark Tracking → On the Splitter
In practice, watermarking is always paired with splitting — you split a
collection and want to remember where you left off. Adding `watermarkStore`,
`watermarkKey`, and `watermarkExpression` to the Splitter keeps the pattern
self-contained:
```java
.split(body())
.watermarkStore("#myStore").watermarkKey("importJob")
.to("direct:process")
.end()
```
Index-based watermarks skip already-processed items; value-based watermarks
set a header and update the store after completion.
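The index-based variant can be sketched in a few lines of plain Java. `IndexWatermark`, `remaining`, and `commit` are hypothetical names; the real `watermarkStore` would be pluggable (e.g. backed by a database) rather than an in-memory map:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Index-based watermark resume: skip items at or below the stored index,
// then advance the watermark after a successful run.
public class IndexWatermark {
    private final Map<String, Integer> store = new ConcurrentHashMap<>();

    // Returns the view of items not yet processed under this key.
    public <T> List<T> remaining(String key, List<T> items) {
        int last = store.getOrDefault(key, -1);
        return items.subList(last + 1, items.size());
    }

    // Called after completion to record the last processed index.
    public void commit(String key, int lastProcessedIndex) {
        store.put(key, lastProcessedIndex);
    }
}
```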
#### 5. Multi-Step with Accept Policy → Not needed
The Bulk component's multi-step model (all items through step 1, then
eligible items through step 2) is a fundamentally different execution model
from the Splitter's per-item route. But the common use cases are already
covered:
- **Per-item error recovery**: `doTry/doCatch` inside the split route
- **ETL validate-then-load**: if validation throws, the item stops;
successful items continue
The rare case of "step 2 depends on collective outcome of step 1" can use
two sequential splits with filtering.
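The rare collective-outcome case can be modeled in plain Java to show why two passes suffice: step 1 runs over every item, a collective accept policy gates the survivors, and only then does step 2 run. All names here (`TwoPhase`, `minSuccessRatio`) are illustrative, not a proposed API:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// "Two sequential splits with filtering": validate all items first, then
// load the survivors only if enough of the batch validated successfully.
public class TwoPhase {
    public static <T> List<T> run(List<T> items,
                                  Predicate<T> step1,      // per-item validation
                                  double minSuccessRatio,  // collective accept policy
                                  UnaryOperator<T> step2) {
        List<T> passed = items.stream().filter(step1).toList();
        if ((double) passed.size() / items.size() < minSuccessRatio) {
            return List.of(); // collective outcome rejected; step 2 skipped
        }
        return passed.stream().map(step2).toList();
    }
}
```

In route terms this is one split for validation, a `filter()` (or `choice()`) on the aggregated result, and a second split for the load step.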
#### 6. Transaction Support → Already on Splitter
The Splitter already supports `shareUnitOfWork()` and transaction context
propagation — these were the patterns we replicated in the Bulk component.
#### 7. onComplete Callback → Not needed
After `.end()`, the `SplitResult` is on the exchange. Just
`.to("direct:notify")` or `.wireTap()`.
### Summary
| Bulk Feature | Splitter Enhancement | Effort |
|---|---|---|
| Chunking | `group(int)` | Small |
| Error threshold | `errorThreshold(double)` / `maxFailedRecords(int)` | Medium |
| Failure tracking | `SplitResult` + failure list | Medium |
| Watermark tracking | `watermarkStore` / `watermarkKey` / `watermarkExpression` | Medium |
| Multi-step accept policy | Drop — use `doTry/doCatch` | — |
| Transaction support | Already exists on Splitter | — |
| onComplete | Not needed — just `.to()` after `.end()` | — |
### Conclusion
Enhancing the Splitter EIP is the better path:
- Users don't need to learn a new component — they extend what they already
know
- The features are independently useful (chunking alone, error threshold
alone, etc.)
- It composes with everything the Splitter already supports
(`parallelProcessing`, `streaming`, `shareUnitOfWork`, etc.)
Closing this PR in favor of a Splitter enhancement approach. A new JIRA
ticket will track the Splitter improvements.