DanielCarter-stack commented on issue #10398:
URL: https://github.com/apache/seatunnel/issues/10398#issuecomment-3798641942

   Thanks for reporting. The NPE at `SourceReaderBase.java:210` occurs when 
`splitStates.remove(finishedSplitId)` returns null and the result is then 
dereferenced. In other words, a split ID in `finishedSplits` has no entry in 
the `splitStates` ConcurrentHashMap, either because it was never registered or 
because it was already removed.
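
The failure mode can be reproduced in isolation. The sketch below is not SeaTunnel code: `SplitStateNpeDemo`, `SplitContext`, and the split IDs are hypothetical stand-ins, and only the `remove(...)` followed by a field access mirrors the pattern quoted from `finishCurrentFetch()`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; it only mimics the remove-then-dereference pattern.
class SplitStateNpeDemo {
    // Stand-in for the per-split state holder (assumption, not the real type).
    static final class SplitContext {
        final String state;

        SplitContext(String state) {
            this.state = state;
        }
    }

    // Same shape as the fault line: no null check between remove() and .state.
    static String removeAndReadState(Map<String, SplitContext> splitStates, String finishedSplitId) {
        return splitStates.remove(finishedSplitId).state;
    }

    // Returns true if the lookup above throws a NullPointerException.
    static boolean throwsNpe(Map<String, SplitContext> splitStates, String finishedSplitId) {
        try {
            removeAndReadState(splitStates, finishedSplitId);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, SplitContext> splitStates = new ConcurrentHashMap<>();
        splitStates.put("split-0", new SplitContext("snapshot"));

        System.out.println(throwsNpe(splitStates, "split-0")); // registered split
        System.out.println(throwsNpe(splitStates, "split-1")); // never registered
        System.out.println(throwsNpe(splitStates, "split-0")); // already removed above
    }
}
```

Running `main` prints `false`, `true`, `true`: the lookup only succeeds the first time a registered ID is finished.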
   
   **Evidence:**
   - Fault line: `SourceReaderBase.java:210` in `finishCurrentFetch()` - 
`splitStates.remove(finishedSplitId).state`
   - Registration point: `SourceReaderBase.java:127-137` (`addSplits` method)
   - Finished splits generation: `IncrementalSourceSplitReader.java:155-160` 
uses `currentSplitId`
   - MongoDB CDC inherits this behavior because `MongodbIncrementalSource` 
extends `IncrementalSource`
   
   This suggests a state inconsistency between the split enumerator and reader, 
likely a race condition or split lifecycle management issue in concurrent 
scenarios.
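
To narrow down which split IDs are out of sync, a null-tolerant variant of the removal loop can collect the unknown IDs instead of throwing. This is a diagnostic sketch only, not a confirmed fix and not how `SourceReaderBase` is implemented: `NullSafeFinishSketch`, `finishSplits`, and the use of `String` as the state type are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names and types are illustrative assumptions.
class NullSafeFinishSketch {
    /**
     * Removes each finished split from splitStates.
     * States that were found are appended to finishedStates;
     * IDs with no registered state are returned so the caller can log them.
     */
    static List<String> finishSplits(
            Map<String, String> splitStates,
            List<String> finishedSplitIds,
            List<String> finishedStates) {
        List<String> unknownIds = new ArrayList<>();
        for (String id : finishedSplitIds) {
            String state = splitStates.remove(id);
            if (state == null) {
                // Split was never registered or was already removed.
                unknownIds.add(id);
                continue;
            }
            finishedStates.add(state);
        }
        return unknownIds;
    }
}
```

Logging the returned IDs next to the "Adding split(s) to reader" messages would show whether the offending split was ever assigned to this reader.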
   
   **To help diagnose:**
   1. Does this error occur at approximately the same data volume (~1M records) 
each time?
   2. Is checkpoint enabled? Do you see checkpoint failures around the same 
time?
   3. Are there any logs about split reassignment/recovery before the crash?
   
   **Temporary workarounds:**
   - Adjust `chunk.size` or `batch.size` to reduce split count
   - Increase `parallelism` to change split allocation patterns
   - Try the latest 2.3.x release, in case this has already been fixed there
   
   Please share your job configuration and full logs (especially "Adding 
split(s) to reader" messages).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.