klion26 commented on issue #8751: [FLINK-11937][StateBackend]Resolve small file 
problem in RocksDB incremental checkpoint
URL: https://github.com/apache/flink/pull/8751#issuecomment-507920049
 
 
   @StephanEwen thanks for the comments. I'll try to answer the questions 
below:
   - As a high-level description: this change introduces a new state 
handle (`FsSegmentStateHandle`), modifies the checkpoint metadata (**the layout is 
not changed**; it only adds a type for `FsSegmentStateHandle` in 
`SavepointV2Serializer`), adds some information to `SharedStateRegistry` to track 
references to the underlying file, and makes other necessary modifications.
   - This is a new option that users need to activate explicitly.
   - I think there are no compatibility problems. **For the checkpoint metadata we 
don't change the layout**; we just add a new type for the new state handle. When 
restoring from an existing `FileStateHandle`, we delegate to 
`RocksDBStateDownloader#downloadDataForStateHandle`, which reads both 
`FileStateHandle` and `FsSegmentStateHandle` correctly. For 
`SharedStateRegistry`, all the modifications affect only the newly introduced 
state handle.
   - In my opinion, we can't make this change in the state backend alone. I'll try 
to give the reasons below:
      - First, we need to track the position (start and end) of the current 
state handle within the file, because after this change each 
state handle maps to a block of one file.
      - Second, we need to track the reference count of the underlying file so 
that we can delete it at the right time in the future: deleting the file too early 
causes `IOException`s, and deleting it too late 
consumes too much disk space.
   
   Please let me know if I need to give more information, sir.
