boyuanzz commented on pull request #13592:
URL: https://github.com/apache/beam/pull/13592#issuecomment-749721610


   > One more concern - the current implementation relies on 
`CheckpointMark#hashCode` and `CheckpointMark#equals`. It is likely that these 
will not have these two correctly implemented. We should stick to 
`Coder#structuralValue` for that.
   > The same holds true for UnboundedSource#hashCode and equals, we should 
probably not use the source in the cache key, because it should be impossible 
for single DoFn instance to read from multiple readers.
   
   I was thinking about using `Coder#structuralValue` but I'm concerning about 
the additional overhead from encoding. The cacheKey is created very 
frequently(at least twice per element) and it's not cheap for coder to encode a 
value. 
   
   As you mentioned, a DoFn instance could process multiple sources especially 
the source allows initial split(and we cannot assume that CheckpointMark 
contains the source info, although in most case it does), that's why I decided 
to use `UnboundedSourceRestriction` to locate a reader. DirectRunner is using 
timers(InMemoryTimerInternals) and states(in memory as well) to reschedule 
checkpoints. It should be a reference instead of a deep copy?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to