[ https://issues.apache.org/jira/browse/FLINK-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564890#comment-17564890 ]
ming li commented on FLINK-28390:
---------------------------------
Hi, [~masteryhx], [~Zhanghao Chen]
Yes. Although a FIFO compaction option already exists, it is effectively
unusable because the TTL and MAX_SIZE of FIFO cannot be configured. We also
do not recommend it to users at present, because it can lose data. So I think
the following work is needed:
1. Add the FIFO-related JNI bindings; see
https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style;
2. Add documentation and precautions for using FIFO.
In addition, while using RocksDB's FIFO compaction internally, we found a
potential bug that also needs to be fixed on Flink's RocksDB branch; see
https://github.com/facebook/rocksdb/issues/10133
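
To make the data-loss risk and the role of TTL and MAX_SIZE concrete, here is
a toy Java model of the two FIFO drop rules. It is only a sketch under
simplifying assumptions: it is not RocksDB code, it does not use the RocksJava
API, and every name in it (FifoCompactionSketch, fifoTtlSeconds, flush,
liveBytes) is hypothetical. It assumes files are flushed in time order, that
whole files older than the TTL are dropped, and that the oldest files are then
dropped until the total size fits under the limit.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: this is NOT RocksDB code and does not use the RocksJava API.
// All class and method names are hypothetical.
final class FifoCompactionSketch {
    private final long maxTotalBytes; // stands in for FIFO MAX_SIZE
    private final long ttlSeconds;    // stands in for FIFO TTL; 0 disables it
    // Each entry is {sizeBytes, createdAtSeconds}; the oldest file is first.
    private final Deque<long[]> files = new ArrayDeque<>();
    private long totalBytes = 0;

    FifoCompactionSketch(long maxTotalBytes, long ttlSeconds) {
        this.maxTotalBytes = maxTotalBytes;
        this.ttlSeconds = ttlSeconds;
    }

    // "Slightly longer than the state TTL": state TTL plus a chosen margin.
    static long fifoTtlSeconds(Duration stateTtl, Duration margin) {
        return stateTtl.plus(margin).getSeconds();
    }

    // Registers a newly flushed file, then applies both drop rules.
    // Returns the creation times of the files that were dropped.
    List<Long> flush(long sizeBytes, long nowSeconds) {
        files.addLast(new long[] {sizeBytes, nowSeconds});
        totalBytes += sizeBytes;
        List<Long> dropped = new ArrayList<>();
        // TTL rule: whole files older than the TTL are deleted.
        while (ttlSeconds > 0 && !files.isEmpty()
                && nowSeconds - files.peekFirst()[1] > ttlSeconds) {
            dropped.add(dropOldest());
        }
        // Size rule: the oldest files are deleted until the total fits,
        // regardless of whether their entries are still within the state TTL.
        while (totalBytes > maxTotalBytes && !files.isEmpty()) {
            dropped.add(dropOldest());
        }
        return dropped;
    }

    long liveBytes() {
        return totalBytes;
    }

    private long dropOldest() {
        long[] oldest = files.removeFirst();
        totalBytes -= oldest[0];
        return oldest[1];
    }

    public static void main(String[] args) {
        long ttl = fifoTtlSeconds(Duration.ofHours(1), Duration.ofMinutes(10));
        FifoCompactionSketch fifo = new FifoCompactionSketch(100, ttl);
        System.out.println("dropped: " + fifo.flush(60, 0));
        System.out.println("dropped: " + fifo.flush(60, 100));
        System.out.println("live bytes: " + fifo.liveBytes());
    }
}
```

In this model the second flush pushes the total over the size limit, so the
first file is dropped although it is only 100 seconds old and well inside the
TTL. That is the silent data loss users would need to be warned about, and it
is why both limits need to be configurable together.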
> Allows RocksDB to configure FIFO Compaction to reduce CPU overhead.
> -------------------------------------------------------------------
>
> Key: FLINK-28390
> URL: https://issues.apache.org/jira/browse/FLINK-28390
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Reporter: ming li
> Priority: Major
>
> We know that the FIFO compaction strategy may silently delete data, which
> can mean data loss for the business. But in some scenarios, FIFO compaction
> can be a very effective way to reduce CPU usage.
>
> Flink TaskManagers are usually small processes, for example allocated
> 4 CPUs and 16G of memory. When the state is small, the CPU overhead of
> RocksDB is low, but as the state grows, RocksDB may compact frequently,
> which consumes a large amount of CPU and affects the computation.
>
> We usually configure a TTL for the state, so when using FIFO we can set the
> FIFO TTL slightly longer than the state TTL, so that the behavior seen by
> the upper layer is the same as before.
>
> Although the FIFO compaction strategy may increase space amplification,
> disk is cheaper than CPU, so the overall cost is reduced.
>
>