For the details of what causes this regression, I would add @Yun Tang <myas...@live.com> to this discussion.
On Wed, Aug 4, 2021 at 2:36 PM Yuval Itzchakov <yuva...@gmail.com> wrote:

> We are heavy users of RocksDB and have had several issues with memory
> managed in Kubernetes, most of them actually went away when we upgraded
> from Flink 1.9 to 1.13.
>
> Do we know why there's such a huge performance regression? Can we improve
> this somehow with some flag tweaking? It would be great if we see a more in
> depth explanation of the gains vs losses of upgrading.
>
> On Wed, Aug 4, 2021 at 3:08 PM Stephan Ewen <se...@apache.org> wrote:
>
>> Hi all!
>>
>> *!!! If you are a big user of the Embedded RocksDB State Backend and
>> have performance sensitive workloads, please read this !!!*
>>
>> I want to quickly raise some awareness for a RocksDB version upgrade we
>> plan to do, and some possible impact on application performance.
>>
>> *We plan to upgrade RocksDB to version 6.20.* That version of RocksDB
>> unfortunately introduces some non-trivial performance regression. In our
>> Nexmark Benchmark, at least one query is up to 13% slower.
>> With some fixes, this can be improved, but even then there is an overall
>> *regression up to 6% in some queries*. (See attached table for results
>> from relevant Nexmark Benchmark queries).
>>
>> We would do this update nonetheless, because we need to get new features
>> and bugfixes from RocksDB in.
>>
>> Please respond to this mail thread if you have major concerns about this.
>>
>>
>> *### Fallback Plan*
>>
>> Optionally, we could fall back to Plan B, which is to upgrade RocksDB
>> only to version 5.18.4.
>> Which has no performance regression (after applying a custom patch).
>>
>> While this spares us the performance degradation of RocksDB 6.20.x, this
>> has multiple disadvantages:
>>   - Does not include the better memory stability (strict cache control)
>>   - Misses out on some new features which some users asked about
>>   - Does not have the latest RocksDB bugfixes
>>
>> The latest point is especially bad in my opinion. While we can
>> cherry-pick some bugfixes back (and have done this in the past), users
>> typically run into an issue first and need to trace it back to RocksDB,
>> then one of the committers can find the relevant patch from RocksDB master
>> and backport it. That isn't the greatest user experience.
>>
>> Because of those disadvantages, we would prefer to do the upgrade to the
>> newer RocksDB version despite the unfortunate performance regression.
>>
>> Best,
>> Stephan
>
> --
> Best Regards,
> Yuval Itzchakov.