Yue Ma created FLINK-33946:
------------------------------
Summary: RocksDb sets setAvoidFlushDuringShutdown to true to speed
up Task Cancel
Key: FLINK-33946
URL: https://issues.apache.org/jira/browse/FLINK-33946
Project: Flink
Issue Type: Improvement
Components: Runtime / State Backends
Affects Versions: 1.19.0
Reporter: Yue Ma
Fix For: 1.19.0
When a job fails, its tasks need to be canceled and re-deployed. While
disposing, RocksDBStateBackend calls RocksDB.close(), which runs the following
shutdown logic inside RocksDB (DBImpl::CancelAllBackgroundWork):
{code:java}
if (!shutting_down_.load(std::memory_order_acquire) &&
    has_unpersisted_data_.load(std::memory_order_relaxed) &&
    !mutable_db_options_.avoid_flush_during_shutdown) {
  if (immutable_db_options_.atomic_flush) {
    autovector<ColumnFamilyData*> cfds;
    SelectColumnFamiliesForAtomicFlush(&cfds);
    mutex_.Unlock();
    Status s =
        AtomicFlushMemTables(cfds, FlushOptions(), FlushReason::kShutDown);
    s.PermitUncheckedError();  //**TODO: What to do on error?
    mutex_.Lock();
  } else {
    for (auto cfd : *versions_->GetColumnFamilySet()) {
      if (!cfd->IsDropped() && cfd->initialized() && !cfd->mem()->IsEmpty()) {
        cfd->Ref();
        mutex_.Unlock();
        Status s = FlushMemTable(cfd, FlushOptions(), FlushReason::kShutDown);
        s.PermitUncheckedError();  //**TODO: What to do on error?
        mutex_.Lock();
        cfd->UnrefAndTryDelete();
      }
    }
  }
}
{code}
By default (avoid_flush_during_shutdown=false), RocksDB flushes every
non-empty memtable when the DB is closed. When disk pressure is high or the
memtables are large, this flush can take a long time, which leaves the task
stuck in the CANCELING state and slows down job failover.
This flush is unnecessary when a Flink task is closed: after a failover the
state is restored from the latest checkpoint and the data is replayed, so the
unflushed memtable contents are not needed. We can therefore set
avoid_flush_during_shutdown to true to speed up task cancellation and failover.
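
To make the effect of the flag concrete, here is a minimal toy model of the
shutdown condition quoted above. It is a sketch, not RocksDB or Flink code:
the class ShutdownFlushModel and its method are hypothetical names, memtables
are reduced to their sizes in bytes, and the shutting_down_ and atomic_flush
branches are ignored.

```java
// Toy model (NOT RocksDB code) of the flush-on-close decision.
// All names here are hypothetical and exist only for illustration.
final class ShutdownFlushModel {

    private ShutdownFlushModel() {}

    /**
     * Returns how many memtables a close() would flush.
     *
     * @param memtableBytes size of each column family's active memtable
     * @param avoidFlushDuringShutdown models avoid_flush_during_shutdown
     */
    static int memtablesFlushedOnClose(long[] memtableBytes,
                                       boolean avoidFlushDuringShutdown) {
        // Models has_unpersisted_data_: is any memtable non-empty?
        boolean hasUnpersistedData = false;
        for (long bytes : memtableBytes) {
            if (bytes > 0) {
                hasUnpersistedData = true;
                break;
            }
        }

        // With the flag set, close() skips the flush loop entirely.
        if (!hasUnpersistedData || avoidFlushDuringShutdown) {
            return 0;
        }

        // Otherwise every non-empty memtable is flushed, one by one.
        int flushed = 0;
        for (long bytes : memtableBytes) {
            if (bytes > 0) {
                flushed++;
            }
        }
        return flushed;
    }

    public static void main(String[] args) {
        // Three column families; the second has an empty memtable.
        long[] memtables = {64L << 20, 0L, 8L << 20};

        // Default behavior: two flushes happen while the task is canceling.
        System.out.println(memtablesFlushedOnClose(memtables, false));

        // Proposed behavior: no flush, close returns without disk writes.
        System.out.println(memtablesFlushedOnClose(memtables, true));
    }
}
```

In the real Java API the corresponding switch is the setAvoidFlushDuringShutdown
setter on RocksDB's DBOptions named in the summary; the model only shows why
enabling it removes the flush work from the cancel path.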