[ 
https://issues.apache.org/jira/browse/FLINK-19710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298763#comment-17298763
 ] 

Yun Tang commented on FLINK-19710:
----------------------------------

Current performance check results with flink-benchmarks. The optimized RocksDB 
build differs only by the commit "FLINK-19710 Revert implementation of 
PerfContext back to __thread" 
([https://github.com/ververica/frocksdb/commit/bf229ed5cf0e0ecce4a09ef2f9184ad0382f204c]).
 
|operations|master (5.17) performance|original 6.15 performance|change vs. master|optimized 6.15.5 performance|change vs. master|
|ListStateBenchmark.listAdd          |680.403|614.161|-9.74%|643.839|-5.37%|
|ListStateBenchmark.listAddAll       |393.332|410.147|4.28%|361.208|-8.17%|
|ListStateBenchmark.listAppend       |657.895|603.058|-8.34%|622.743|-5.34%|
|ListStateBenchmark.listGet          |196.412|173.898|-11.46%|181.853|-7.41%|
|ListStateBenchmark.listGetAndIterate|197.593|174.241|-11.82%|181.54|-8.12%|
|ListStateBenchmark.listUpdate       |675.227|616.765|-8.66%|642.998|-4.77%|
|MapStateBenchmark.mapAdd            |570.215|529.624|-7.12%|543.944|-4.61%|
|MapStateBenchmark.mapContains       |72.364|67.271|-7.04%|70.228|-2.95%|
|MapStateBenchmark.mapEntries        |501.85|453.633|-9.61%|468.034|-6.74%|
|MapStateBenchmark.mapGet            |72.08|67.175|-6.80%|70.601|-2.05%|
|MapStateBenchmark.mapIsEmpty        |65.138|59.04|-9.36%|60.993|-6.36%|
|MapStateBenchmark.mapIterator       |501.34|454.393|-9.36%|470.373|-6.18%|
|MapStateBenchmark.mapKeys           |507.73|461.156|-9.17%|475.138|-6.42%|
|MapStateBenchmark.mapPutAll         |167.464|164.098|-2.01%|169.962|1.49%|
|MapStateBenchmark.mapRemove         |583.622|538|-7.82%|550.319|-5.71%|
|MapStateBenchmark.mapUpdate         |568.775|526.051|-7.51%|544.789|-4.22%|
|MapStateBenchmark.mapValues         |502.498|454.133|-9.62%|469.765|-6.51%|
|ValueStateBenchmark.valueAdd        |577.934|527.072|-8.80%|551.323|-4.60%|
|ValueStateBenchmark.valueGet        |936.219|826.148|-11.76%|914.24|-2.35%|
|ValueStateBenchmark.valueUpdate     |582.02|526.942|-9.46%|554.962|-4.65%|

 

As we can see, a performance regression of about 2%~8% still exists. 
Unfortunately, there are too many commits between 5.18 and 6.15 to find the 
root cause of the regression: performance goes up and down across commits 
instead of dropping suddenly after one specific commit. 
[https://github.com/facebook/rocksdb/pull/5797] might be one contributor, but 
it is the correct way to calculate memory usage. 
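
The percentage columns in the table above are consistent with the relative 
change of each score against the master (5.17) score. A minimal C++ sketch of 
that arithmetic follows; the function names are hypothetical and are not part 
of flink-benchmarks, and it assumes that higher scores are better.

```cpp
#include <cassert>
#include <cmath>

// Relative change of a benchmark score against a baseline, in percent.
// Negative values mean the candidate scored lower than the baseline.
// Hypothetical helper, not part of flink-benchmarks.
double relative_change_percent(double baseline, double candidate) {
    assert(baseline > 0.0);  // all scores in the table are positive
    return (candidate - baseline) / baseline * 100.0;
}

// Round to two decimals, the precision used in the table.
double round2(double value) {
    return std::round(value * 100.0) / 100.0;
}
```

For example, relative_change_percent(680.403, 614.161) gives about -9.74, 
which matches the listAdd row for the original 6.15 build.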

I am also trying to adopt {{ByteBuffer}} to help reduce the performance regression.


> Fix performance regression to rebase FRocksDB with higher version RocksDB
> -------------------------------------------------------------------------
>
>                 Key: FLINK-19710
>                 URL: https://issues.apache.org/jira/browse/FLINK-19710
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>            Reporter: Yun Tang
>            Assignee: Yun Tang
>            Priority: Major
>             Fix For: 1.13.0
>
>
> We planned to bump the base RocksDB version from 5.17.2 to 6.11.x. However, 
> we observed a performance regression compared with 5.17.2 and 5.18.3 via our 
> own flink-benchmarks, and reported it to the RocksDB community in 
> [rocksdb#5774|https://github.com/facebook/rocksdb/issues/5774]. Since 
> RocksDB 5.18.3 is a bit old for the RocksDB community, and RocksDB's built-in 
> db_bench tool cannot easily reproduce this regression, we did not get any 
> effective help from the RocksDB community.
> Since the code freeze of Flink-release-1.12 is close, we have to figure it 
> out ourselves. We first tried to use RocksDB's built-in db_bench tool to 
> binary search the 160 different commits between RocksDB 5.17.2 and 5.18.3. 
> However, the performance regression was not clear with that tool. Only after 
> switching to our own flink-benchmarks did we finally detect the commit which 
> introduced the nearly-10% performance regression: [replaced __thread with 
> thread_local 
> keyword|https://github.com/facebook/rocksdb/commit/d6ec288703c8fc53b54be9e3e3f3ffd6a7487c63].
> From existing knowledge, the performance regression of {{thread_local}} is 
> known from [gcc-4.8 changes|https://gcc.gnu.org/gcc-4.8/changes.html#cxx] and 
> becomes more serious with [dynamic modules 
> usage|http://david-grs.github.io/tls_performance_overhead_cost_linux/] [tls 
> benchmark|https://testbit.eu/2015/thread-local-storage-benchmark]. That 
> could explain why RocksDB's built-in db_bench tool cannot reproduce this 
> regression, as it is compiled in static mode by recommendation.
>  
> We plan to fix this in our FRocksDB branch first by reverting the related 
> changes. From my current local experimental results, that revert proved to 
> be effective in avoiding the performance regression.
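
To illustrate the {{__thread}} versus {{thread_local}} distinction described in 
the issue above, here is a minimal C++ sketch. It is not RocksDB's PerfContext 
code: the struct, macro and function names are hypothetical. It only shows the 
declaration pattern. According to the gcc-4.8 release notes, references to a 
non-function-local {{thread_local}} variable defined in another translation 
unit carry a run-time penalty even when no dynamic initialization is needed, 
while {{__thread}} (a GCC/Clang extension) keeps static initialization 
semantics. The sketch is a single translation unit, so it is not a benchmark 
and does not reproduce the regression by itself.

```cpp
#include <cstdint>

// Hypothetical stand-in for a per-thread counter struct.
// This is NOT RocksDB's PerfContext; it only mirrors the general idea.
struct PerfCountersSketch {
    std::uint64_t get_count;
    std::uint64_t put_count;
};

// Pick the thread-local storage specifier (hypothetical macro name).
#if defined(__GNUC__) || defined(__clang__)
// GCC/Clang extension: only constant initialization is allowed.
#define SKETCH_TLS __thread
#else
// Standard C++11 keyword: also allows dynamic initialization.
#define SKETCH_TLS thread_local
#endif

// One instance per thread, constant-initialized to zero.
SKETCH_TLS PerfCountersSketch sketch_counters = {0, 0};

// Increment the calling thread's read counter.
void record_get() { ++sketch_counters.get_count; }

// Return how many reads the calling thread has recorded.
std::uint64_t gets_on_this_thread() { return sketch_counters.get_count; }
```

Each thread sees its own copy of sketch_counters, so calling record_get() twice 
on one thread makes gets_on_this_thread() return 2 on that thread only.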



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
