carp84 commented on pull request #8751: URL: https://github.com/apache/flink/pull/8751#issuecomment-804158779
@StephanEwen please correct me if I'm wrong, but I think the issue FLIP-158 is trying to resolve is orthogonal to this one (FLINK-11937).

On one hand, IIUC, the snapshot interval in the FLIP-158 design (the interval at which SST files are generated, taking RocksDB as an example, so that change logs can be truncated) would be configurable. If it's set to a value similar to the old checkpoint interval, e.g. 10min, then we will have a similar small-file problem to the one observed now. Actually, I don't think this snapshot interval should be too long, since it determines how much of the log has to be replayed during restore and thus affects recovery speed.

OTOH, I'm not sure about the value of using change-log based checkpoints with a long checkpoint interval (like more than 10min). Saving change logs consumes additional network bandwidth and disk space (since the SST uploading process is still needed for log truncation) and increases the latency of routine record processing (because of the "double-writing"). That is a good trade-off for very short checkpoint intervals, but not that effective with long ones, IMHO.

Not sure whether I'm missing anything, so please let me know your thoughts. Thanks.
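
To make the trade-off above a bit more concrete, here is a rough back-of-envelope sketch in Python. It is not Flink code: the function names and the 1 MB/s change rate are made up for illustration, and it assumes that the log to replay on restore is at most everything written since the last snapshot, and that every snapshot triggers one round of SST uploads.

```python
def worst_case_replay_bytes(change_rate_bytes_per_s: float,
                            snapshot_interval_s: float) -> float:
    """Upper bound on change-log bytes to replay on restore.

    Assumption: the log is truncated at each snapshot, so in the worst
    case everything written during one full interval must be replayed.
    """
    return change_rate_bytes_per_s * snapshot_interval_s


def snapshots_per_day(snapshot_interval_s: float) -> float:
    """Number of snapshot rounds (and thus SST upload rounds) per day."""
    return 86_400 / snapshot_interval_s


if __name__ == "__main__":
    change_rate = 1_000_000  # hypothetical: 1 MB/s of state changes
    for minutes in (1, 10, 60):
        interval_s = minutes * 60
        replay_mb = worst_case_replay_bytes(change_rate, interval_s) / 1e6
        rounds = snapshots_per_day(interval_s)
        print(f"interval={minutes:>2}min  "
              f"worst-case replay={replay_mb:.0f} MB  "
              f"upload rounds/day={rounds:.0f}")
```

Under these assumptions a longer snapshot interval means fewer upload rounds but proportionally more log to replay, while an interval close to the old checkpoint interval gives roughly the same number of upload rounds as today.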
