[ https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609277#comment-17609277 ]
Yanfei Lei edited comment on FLINK-29402 at 9/26/22 3:05 AM:
-------------------------------------------------------------
This is a very interesting proposal, and I don't think it would be hard to
implement in Flink. According to the
[wiki|https://github.com/facebook/rocksdb/wiki/Direct-IO], there are two options
that control DirectIO: {{use_direct_reads}} and
{{use_direct_io_for_flush_and_compaction}}, and both are supported by the
current {{frocksdb-jni}} (6.20.3).
BTW, do you have quantitative benchmark results comparing DirectIO *ON* vs.
DirectIO *OFF*?
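Both options are plain booleans on RocksDB's {{DBOptions}}; RocksJava exposes them as {{DBOptions#setUseDirectReads(boolean)}} and {{DBOptions#setUseDirectIoForFlushAndCompaction(boolean)}}, which a Flink job could call from a custom {{RocksDBOptionsFactory}}. So that it runs without the RocksDB or Flink jars, the minimal sketch below only models the two independent flags and renders them in RocksDB's {{name=value;name=value}} option-string format. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical model of RocksDB's two DirectIO switches. In real code these
 * would be applied via RocksJava's DBOptions#setUseDirectReads(boolean) and
 * DBOptions#setUseDirectIoForFlushAndCompaction(boolean).
 */
public final class DirectIoFlags {
    private final boolean useDirectReads;
    private final boolean useDirectIoForFlushAndCompaction;

    public DirectIoFlags(boolean useDirectReads, boolean useDirectIoForFlushAndCompaction) {
        this.useDirectReads = useDirectReads;
        this.useDirectIoForFlushAndCompaction = useDirectIoForFlushAndCompaction;
    }

    /** Renders both flags as a RocksDB-style option string, in a fixed order. */
    public String toOptionString() {
        // LinkedHashMap keeps insertion order so the output is deterministic.
        Map<String, Boolean> options = new LinkedHashMap<>();
        options.put("use_direct_reads", useDirectReads);
        options.put("use_direct_io_for_flush_and_compaction", useDirectIoForFlushAndCompaction);

        StringJoiner joiner = new StringJoiner(";");
        options.forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Reads bypass the page cache; flush/compaction I/O is left unchanged.
        System.out.println(new DirectIoFlags(true, false).toOptionString());
    }
}
```

For example, {{new DirectIoFlags(true, false)}} enables DirectIO only for reads, which is the scope of this ticket; the flush/compaction flag stays an independent choice.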
> Add USE_DIRECT_READ configuration parameter for RocksDB
> -------------------------------------------------------
>
> Key: FLINK-29402
> URL: https://issues.apache.org/jira/browse/FLINK-29402
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Affects Versions: 1.15.2
> Reporter: Donatien
> Priority: Not a Priority
> Labels: Enhancement, rocksdb
> Fix For: 1.15.2
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux
> page cache. To understand the impact of the Linux page cache on performance,
> one can run a heavy workload on a single-task Task Manager (TM) whose
> container memory limit equals the TM process memory. Running the same
> workload on a TM with no container memory limit yields better performance,
> but the host memory used then exceeds the TM's memory requirement, because
> the page cache grows beyond the TM's budget.
> The Linux page cache is of course useful, but it can skew results when
> benchmarking the Managed Memory used by RocksDB. DirectIO is typically
> enabled in benchmarks for working set estimation [Zwaenepoel et
> al.|https://arxiv.org/abs/1702.04323].
> I propose adding a configuration key that lets users enable DirectIO for
> reads through the RocksDB API. This option would be disabled by default.
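The proposed key could be sketched as follows, assuming a plain string map as the configuration source. The key name {{state.backend.rocksdb.use-direct-reads}} and the class are hypothetical (this is not an existing Flink option). Inside the state backend, the parsed value would then be passed to RocksJava's {{DBOptions#setUseDirectReads(boolean)}}.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical parser for the proposed "use direct reads" switch. */
public final class DirectReadsConfig {
    /** Hypothetical key name, for illustration only. */
    static final String USE_DIRECT_READS_KEY = "state.backend.rocksdb.use-direct-reads";

    /** Disabled by default, as proposed in the ticket. */
    static final boolean USE_DIRECT_READS_DEFAULT = false;

    /** Returns the configured value, or the default when the key is absent. */
    static boolean useDirectReads(Map<String, String> conf) {
        String raw = conf.get(USE_DIRECT_READS_KEY);
        if (raw == null) {
            return USE_DIRECT_READS_DEFAULT;
        }
        // Accept only explicit booleans so that typos fail loudly instead of
        // silently falling back to the page-cache path.
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "true":
                return true;
            case "false":
                return false;
            default:
                throw new IllegalArgumentException(
                        "Invalid boolean for " + USE_DIRECT_READS_KEY + ": " + raw);
        }
    }

    public static void main(String[] args) {
        System.out.println(useDirectReads(Map.of()));
        System.out.println(useDirectReads(Map.of(USE_DIRECT_READS_KEY, "true")));
    }
}
```

Rejecting values other than {{true}}/{{false}} is a design choice of this sketch: a misspelled value would otherwise leave DirectIO off and quietly reintroduce the page-cache effect the benchmark is trying to exclude.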
--
This message was sent by Atlassian Jira
(v8.20.10#820010)