ashulin commented on issue #2127:
URL: 
https://github.com/apache/incubator-seatunnel/issues/2127#issuecomment-1175749813

   > > > org.apache.seatunnel.api.source.SourceReader#snapshotState looks 
similar to 
org.apache.seatunnel.api.source.SourceSplitEnumerator#snapshotState, but 
they return different types (List and StateT). The comment says it is the split 
checkpoint state, while it actually returns a List; they are not the same in my 
mind. Could we unify the snapshot behavior?
   > > 
   > > 
   > > `SourceSplitEnumerator#snapshotState` and `SourceReader#snapshotState` 
are different. `SourceSplitEnumerator` assumes the role of coordinator, which 
may require information beyond the snapshot split. `SourceReader` is designed 
to only need splits to run, so the snapshot returns `List<SplitT>`.
   > 
   > In my opinion, `SplitT` is the type of `SourceSplit`, not the state of a 
`checkpoint`. I wonder if `List<StateT>` is more appropriate?
   
   The `List<SplitT>` returned by `SourceReader#snapshotState` will be passed to 
`SourceSplitEnumerator#addSplitsBack`. This lets the source keep running 
correctly when its parallelism is changed, so the state of the reader must be 
convertible to splits.
   
   To avoid ambiguity, we can add 
   ```java
   public interface SourceSplitState {
       SourceSplit toSplit();
   }
   ```
   and have `SourceReader#snapshotState` return `List<SourceSplitState>`.
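
   A minimal sketch of how such a state could be turned back into splits. 
Everything here except the `SourceSplitState` shape is an assumption for 
illustration: `SourceSplit` is a simplified stand-in for the real interface, 
and `FileSplit`, `FileSplitState` and `toSplits` are hypothetical names, not 
existing SeaTunnel classes.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class SnapshotSketch {

       /** Simplified stand-in for the real SourceSplit interface (assumption). */
       interface SourceSplit {
           String splitId();
       }

       /** The interface proposed in this comment. */
       interface SourceSplitState {
           SourceSplit toSplit();
       }

       /** Hypothetical split: a file plus the offset to start reading from. */
       static final class FileSplit implements SourceSplit {
           final String path;
           final long startOffset;

           FileSplit(String path, long startOffset) {
               this.path = path;
               this.startOffset = startOffset;
           }

           @Override
           public String splitId() {
               return path;
           }
       }

       /** Hypothetical reader-side state: tracks read progress within one split. */
       static final class FileSplitState implements SourceSplitState {
           private final String path;
           private long currentOffset;

           FileSplitState(FileSplit split) {
               this.path = split.path;
               this.currentOffset = split.startOffset;
           }

           void advance(long bytes) {
               currentOffset += bytes;
           }

           /** Converts the state back to a split that resumes at the current offset. */
           @Override
           public SourceSplit toSplit() {
               return new FileSplit(path, currentOffset);
           }
       }

       /** Converts snapshotted states to splits, e.g. before handing them to addSplitsBack. */
       static List<SourceSplit> toSplits(List<SourceSplitState> states) {
           List<SourceSplit> splits = new ArrayList<>();
           for (SourceSplitState state : states) {
               splits.add(state.toSplit());
           }
           return splits;
       }

       public static void main(String[] args) {
           FileSplitState state = new FileSplitState(new FileSplit("a.log", 0L));
           state.advance(128L);

           List<SourceSplitState> snapshot = new ArrayList<>();
           snapshot.add(state);

           FileSplit restored = (FileSplit) toSplits(snapshot).get(0);
           System.out.println(restored.splitId() + " @ " + restored.startOffset); // prints "a.log @ 128"
       }
   }
   ```

   With this shape the snapshot type is clearly a state, while the enumerator 
still only ever receives plain splits.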


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
