DanielCarter-stack commented on issue #10398: URL: https://github.com/apache/seatunnel/issues/10398#issuecomment-3798641942
<!-- code-pr-reviewer -->
Thanks for reporting. The NPE at `SourceReaderBase.java:210` occurs when `splitStates.remove(finishedSplitId)` returns null, meaning a split ID in `finishedSplits` is not present in the `splitStates` ConcurrentHashMap: either it was never registered or it was already removed.

**Evidence:**
- Fault line: `SourceReaderBase.java:210` in `finishCurrentFetch()`: `splitStates.remove(finishedSplitId).state`
- Registration point: `SourceReaderBase.java:127-137` (`addSplits` method)
- Finished splits generation: `IncrementalSourceSplitReader.java:155-160` uses `currentSplitId`
- MongoDB CDC inherits this behavior because `MongodbIncrementalSource` extends `IncrementalSource`

This suggests a state inconsistency between the split enumerator and the reader, likely a race condition or a split lifecycle management issue in concurrent scenarios.

**To help diagnose:**
1. Does this error occur at approximately the same data volume (~1M records) each time?
2. Is checkpointing enabled? Do you see checkpoint failures around the same time?
3. Are there any log entries about split reassignment or recovery before the crash?

**Temporary workarounds:**
- Adjust `chunk.size` or `batch.size` to reduce the split count
- Increase `parallelism` to change the split allocation pattern
- Try the latest 2.3.x release, in case this is a known issue that has already been fixed

Please share your job configuration and full logs (especially the "Adding split(s) to reader" messages).
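
The failure mode described under **Evidence** can be reproduced in isolation. The following is a minimal standalone sketch, not SeaTunnel's actual code: `SplitStateSketch`, `SplitContext`, `finishUnsafe`, and `finishSafe` are hypothetical names chosen for illustration, and only the `splitStates.remove(finishedSplitId).state` pattern is taken from the report. It shows why dereferencing the result of `remove()` throws when the split ID is absent, and what a null check would look like.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SplitStateSketch {
    // Hypothetical stand-in for the per-split context held by the reader.
    static final class SplitContext {
        final String state;

        SplitContext(String state) {
            this.state = state;
        }
    }

    // Plays the role of the reader's splitStates map (assumed shape).
    static final Map<String, SplitContext> splitStates = new ConcurrentHashMap<>();

    // Same pattern as the reported fault line: remove() returns null for an
    // absent key, so the field access throws NullPointerException.
    static String finishUnsafe(String finishedSplitId) {
        return splitStates.remove(finishedSplitId).state;
    }

    // Defensive variant: returns null for an unknown split instead of throwing.
    static String finishSafe(String finishedSplitId) {
        SplitContext ctx = splitStates.remove(finishedSplitId);
        if (ctx == null) {
            return null; // split was never registered or was already removed
        }
        return ctx.state;
    }

    public static void main(String[] args) {
        splitStates.put("split-0", new SplitContext("offset=42"));
        System.out.println(finishSafe("split-0")); // registered split
        System.out.println(finishSafe("split-0")); // second removal finds nothing
        try {
            finishUnsafe("split-1"); // never registered
        } catch (NullPointerException e) {
            System.out.println("NPE for unregistered split");
        }
    }
}
```

A guard like `finishSafe` would only hide the symptom; the underlying question of why a finished split ID is missing from the map still needs the logs requested above.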
