Yun Tang created FLINK-19710:
--------------------------------
Summary: Avoid performance regression introduced by thread-local
keyword of FRocksDB
Key: FLINK-19710
URL: https://issues.apache.org/jira/browse/FLINK-19710
Project: Flink
Issue Type: Improvement
Reporter: Yun Tang
Assignee: Yun Tang
Fix For: 1.12.0
We planned to bump the base RocksDB version from 5.17.2 to 6.11.x. However, we
observed a performance regression between 5.17.2 and 5.18.3 in our own
flink-benchmarks, and reported it to the RocksDB community in
[rocksdb#5774|https://github.com/facebook/rocksdb/issues/5774]. Since
RocksDB 5.18.3 is fairly old for the RocksDB community, and RocksDB's built-in
db_bench tool cannot easily reproduce this regression, we did not get any
effective help from the RocksDB community.
Since the code freeze of Flink release-1.12 is close, we had to figure it out
ourselves. We first tried RocksDB's built-in db_bench tool to binary search
the 160 commits between RocksDB 5.17.2 and 5.18.3. However, the performance
regression was not clear with that tool. After switching to our own
flink-benchmarks, we finally identified the commit that introduced the nearly
10% performance regression: [replaced __thread with thread_local keyword
|https://github.com/facebook/rocksdb/commit/d6ec288703c8fc53b54be9e3e3f3ffd6a7487c63]
.
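The binary search described above can be sketched as follows. This is only an
illustration, not the tooling actually used: the function name
first_regressed_commit and the is_regressed callback (standing in for "build
this commit and run the benchmark") are hypothetical, and the sketch assumes
the regression persists in every commit after the one that introduced it.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: commits are indexed 0..n-1 in chronological order.
// is_regressed(i) stands in for "build commit i, run the benchmark, and
// compare against the baseline".
// Returns the index of the first regressed commit, or n if none regressed.
std::size_t first_regressed_commit(
    std::size_t n, const std::function<bool(std::size_t)>& is_regressed) {
  std::size_t lo = 0;  // every commit before lo is known to be good
  std::size_t hi = n;  // every commit from hi onward is known to be bad
  while (lo < hi) {
    const std::size_t mid = lo + (hi - lo) / 2;
    if (is_regressed(mid)) {
      hi = mid;      // regression is at mid or earlier
    } else {
      lo = mid + 1;  // regression is after mid
    }
  }
  return lo;
}
```

With 160 candidate commits this needs at most eight benchmark runs, provided
the benchmark separates good from bad commits reliably.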
From existing knowledge, the performance regression of {{thread_local}} is
known from [gcc-4.8 changes|https://gcc.gnu.org/gcc-4.8/changes.html#cxx] and
becomes more serious in [dynamic modules usage
|http://david-grs.github.io/tls_performance_overhead_cost_linux/] ([tls
benchmark|https://testbit.eu/2015/thread-local-storage-benchmark]). That
could explain why RocksDB's built-in db_bench tool cannot reproduce this
regression, as it is compiled in static mode by recommendation.
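For readers unfamiliar with the two spellings, here is a minimal sketch of the
difference, assuming GCC or Clang. The macro DEMO_TLS, the switch
DEMO_FORCE_STANDARD_TLS and the counter are hypothetical names for
illustration and are not taken from RocksDB. Because everything lives in one
translation unit and is usually linked statically, this snippet is not
expected to reproduce the regression itself; it only shows what is being
swapped.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro: choose the thread-local storage spelling at compile time.
// __thread is a GNU extension that only allows static initialization.
// thread_local is the C++11 keyword; it also allows dynamic initialization
// and destruction, which is why GCC may route accesses to variables defined
// in another translation unit through an extra wrapper call.
#if defined(__GNUC__) && !defined(DEMO_FORCE_STANDARD_TLS)
#define DEMO_TLS __thread
#else
#define DEMO_TLS thread_local
#endif

// Hypothetical per-thread counter with a constant initializer, so both
// spellings are valid for it.
static DEMO_TLS std::uint64_t demo_perf_counter = 0;

// Adds delta to the calling thread's counter and returns the new value.
inline std::uint64_t demo_add(std::uint64_t delta) {
  demo_perf_counter += delta;
  return demo_perf_counter;
}

// Small self-check: the counter behaves the same under either spelling.
inline void demo_self_check() {
  const std::uint64_t before = demo_perf_counter;
  assert(demo_add(2) == before + 2);
}
```

Defining DEMO_FORCE_STANDARD_TLS switches the same code to thread_local, which
makes it easy to compare the two variants when the code is built into a
shared library.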
We plan to fix this in our FRocksDB branch first by reverting the related
changes. In my current local experiments, that revert proved effective in
avoiding the performance regression.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)