[
https://issues.apache.org/jira/browse/FLINK-19008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-19008:
-----------------------------------
Labels: auto-deprioritized-major auto-deprioritized-minor perfomance
usability (was: auto-deprioritized-major perfomance stale-minor usability)
Priority: Not a Priority (was: Minor)
This issue was labeled "stale-minor" 7 days ago and has not received any
updates, so it is being deprioritized. If this ticket is actually Minor, please
raise the priority and ask a committer to assign you the issue or revive the
public discussion.
> Flink Job runs slow after restore + downscale from an incremental checkpoint
> (rocksdb)
> --------------------------------------------------------------------------------------
>
> Key: FLINK-19008
> URL: https://issues.apache.org/jira/browse/FLINK-19008
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Reporter: Jun Qin
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-deprioritized-minor,
> perfomance, usability
>
> A customer runs a Flink job with the RocksDB state backend. Checkpoints are
> retained and taken incrementally. The state size is several TB. When they
> restored + downscaled from a retained checkpoint, downloading the checkpoint
> files took only ~20 min, but the job throughput returned to the expected
> level only after 3 hours.
> I do not have the RocksDB logs. My suspicion is that those 3 hours were spent
> on heavy RocksDB compactions and/or flushes, as it was observed that
> checkpoints could not finish fast enough due to a long {{checkpoint duration
> (sync)}}. How can we make this restore phase shorter?
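>
> For future incidents it may help to capture the RocksDB LOG to confirm the
> compaction/flush theory. A minimal sketch of how to enable it, assuming
> Flink's {{RocksDBOptionsFactory}} interface (Flink 1.10+) and the RocksJava
> API; the log directory path is only an example:
> {code:java}
> import java.util.Collection;
> import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
> import org.rocksdb.ColumnFamilyOptions;
> import org.rocksdb.DBOptions;
> import org.rocksdb.InfoLogLevel;
>
> /** Turns on the RocksDB LOG so compaction/flush activity after a restore can be inspected. */
> public class LoggingOptionsFactory implements RocksDBOptionsFactory {
>
>     @Override
>     public DBOptions createDBOptions(DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
>         return currentOptions
>                 // write the LOG to a directory that survives task manager restarts
>                 // (example path, adjust to your environment)
>                 .setDbLogDir("/var/log/flink/rocksdb")
>                 .setInfoLogLevel(InfoLogLevel.INFO_LEVEL)
>                 // dump compaction/flush statistics into the LOG every 5 minutes
>                 .setStatsDumpPeriodSec(300);
>     }
>
>     @Override
>     public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
>         return currentOptions;
>     }
> }
> {code}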
> For compaction, I think it is worth checking the improvement from setting:
> {code:c}
> CompactionPri compaction_pri = kMinOverlappingRatio;{code}
> which has been made the default in RocksDB 6.x:
> {code:c}
> // In level-based compaction, it determines which file from a level gets
> // picked to merge into the next level. We suggest trying
> // kMinOverlappingRatio first when you tune your database.
> enum CompactionPri : char {
>   // Slightly prioritize larger files by size compensated by #deletes
>   kByCompensatedSize = 0x0,
>   // First compact files whose data's latest update time is oldest.
>   // Try this if you only update some hot keys in small ranges.
>   kOldestLargestSeqFirst = 0x1,
>   // First compact files whose range hasn't been compacted to the next level
>   // for the longest. If your updates are random across the key space,
>   // write amplification is slightly better with this option.
>   kOldestSmallestSeqFirst = 0x2,
>   // First compact files whose ratio between overlapping size in next level
>   // and its size is the smallest. In many cases it can optimize write
>   // amplification.
>   kMinOverlappingRatio = 0x3,
> };
> ...
> // Default: kMinOverlappingRatio
> CompactionPri compaction_pri = kMinOverlappingRatio;{code}
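>
> On the Flink side, this could be set through a custom options factory. A
> minimal sketch, assuming the same {{RocksDBOptionsFactory}} interface and
> RocksJava's {{CompactionPriority}} enum:
> {code:java}
> import java.util.Collection;
> import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
> import org.rocksdb.ColumnFamilyOptions;
> import org.rocksdb.CompactionPriority;
> import org.rocksdb.DBOptions;
>
> /** Sets kMinOverlappingRatio explicitly, for RocksDB versions where it is not the default yet. */
> public class CompactionPriorityOptionsFactory implements RocksDBOptionsFactory {
>
>     @Override
>     public DBOptions createDBOptions(DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
>         return currentOptions;
>     }
>
>     @Override
>     public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
>         // kMinOverlappingRatio tends to lower write amplification, which should
>         // shorten the compaction-heavy phase right after a restore + downscale
>         return currentOptions.setCompactionPriority(CompactionPriority.MIN_OVERLAPPING_RATIO);
>     }
> }
> {code}
> The factory would then be registered on the state backend, e.g.
> {{backend.setRocksDBOptions(new CompactionPriorityOptionsFactory())}} on a
> {{RocksDBStateBackend}}.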
--
This message was sent by Atlassian Jira
(v8.20.1#820001)