[ https://issues.apache.org/jira/browse/FLINK-17971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280737#comment-17280737 ]

Joey Pereira commented on FLINK-17971:
--------------------------------------

From the linked PR:

 

---

Sorry for letting progress on these changes stall (life got the better of me 
for quite a while). I did not end up getting a good production benchmark for 
the changes here.

I recently had the time to rebase these changes onto the latest master. I also 
carved out the change to use {{deleteRange}} into a separate 
[PR|https://github.com/apache/flink/pull/14893], along with an early benchmark for it.

While I was doing a pass on that PR, I stumbled on an earlier ticket and 
discussion where @sihuazhou previously investigated using the external ingest 
API to speed these paths up. This is documented in FLINK-8845.

Summarizing the details from there: sihuazhou's initial work and investigation 
found that RocksDB's Java API for the SST writer had some key performance 
issues. Specifically, the interface was limited to {{put(byte[] key, byte[] 
value)}} and internally copied memory to construct the RocksDB 
{{DirectSlice}}. This added non-trivial overhead and gave the Java 
{{SstFileWriter}} poor performance. As a result, they implemented the 
{{RocksDBWriteBatchWrapper}} for bulk writes rather than using SST file ingestion.
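
For context on that alternative: the idea behind a write-batch wrapper is to buffer individual puts and hand them to the database in groups instead of one by one. The following is a simplified, hypothetical sketch of count-based batching only. {{BatchingWriterSketch}}, its {{Consumer}} sink and the {{capacity}} parameter are invented for illustration and stand in for a RocksDB {{WriteBatch}} plus {{db.write(...)}}; this is not Flink's actual {{RocksDBWriteBatchWrapper}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of count-based write batching. The "database" is an
 * arbitrary Consumer that receives each flushed batch (an assumption standing
 * in for a RocksDB WriteBatch being written).
 */
public final class BatchingWriterSketch implements AutoCloseable {

    /** A single key/value pair waiting to be written. */
    public static final class Entry {
        public final byte[] key;
        public final byte[] value;

        public Entry(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    private final int capacity;                 // max buffered entries before a flush
    private final Consumer<List<Entry>> sink;   // hypothetical stand-in for db.write(batch)
    private List<Entry> buffer = new ArrayList<>();

    public BatchingWriterSketch(int capacity, Consumer<List<Entry>> sink) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.sink = sink;
    }

    /** Buffers one put; hands the whole batch to the sink once capacity is reached. */
    public void put(byte[] key, byte[] value) {
        buffer.add(new Entry(key, value));
        if (buffer.size() >= capacity) {
            flush();
        }
    }

    /** Passes any buffered entries to the sink as one batch and starts a fresh buffer. */
    public void flush() {
        if (!buffer.isEmpty()) {
            sink.accept(buffer);
            buffer = new ArrayList<>();
        }
    }

    @Override
    public void close() {
        flush(); // do not lose a partially filled batch
    }
}
```

With a capacity of 3, seven puts reach the sink as batches of 3, 3 and 1, the last one flushed by {{close()}}.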

I also found a RocksDB issue with a detailed write-up of this 
{{SstFileWriter}} performance problem: https://github.com/facebook/rocksdb/issues/2668.

A PR, [https://github.com/facebook/rocksdb/pull/2283], was made to address this 
issue, but it sat open from 2017 until it was finally merged in Feb 2020! The 
change was released as part of the Java API in RocksDB 6.8.0; see the 
[6.8.0 changelog|https://github.com/facebook/rocksdb/blob/master/HISTORY.md#680-02242020].

Given sihuazhou's earlier investigation, I suspect this branch will not yield 
much improvement without upgrading RocksDB to 6.8.0. Given the state of the 
RocksDB upgrade in FLINK-14482, I suspect it'll be quite some time (& work) 
before we get there.

> Speed up RocksDB bulk loading with SST generation and ingestion
> ---------------------------------------------------------------
>
>                 Key: FLINK-17971
>                 URL: https://issues.apache.org/jira/browse/FLINK-17971
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>            Reporter: Joey Pereira
>            Priority: Major
>              Labels: pull-request-available
>
> RocksDB provides an API for creating SST files and ingesting them directly 
> into RocksDB: 
> [https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files]
> Using this method for bulk loading data into RocksDB may provide a 
> significant performance increase, specifically for paths doing inserts such 
> as full savepoint recovery and state migrations. This is one method of 
> optimizing bulk loads, as described in 
> https://issues.apache.org/jira/browse/FLINK-17288
> This was discussed on the user mailing list: 
> [http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/RocksDB-savepoint-recovery-performance-improvements-td35238.html]
> A draft PR is here: [https://github.com/apache/flink/pull/12345/]
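
One detail of the SST generation approach described above: RocksDB's {{SstFileWriter}} only accepts keys in strictly ascending comparator order, so input that is not already ordered has to be sorted (and any duplicate keys resolved) before it is written. A minimal sketch of that preparation step, assuming RocksDB's default bytewise comparator (unsigned lexicographic byte order). {{SstKeyOrderSketch}}, {{KeyValue}}, {{sortForSst}} and {{strictlyIncreasing}} are hypothetical names, not Flink or RocksDB APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical helper that prepares bulk-load input for SST generation.
 * Assumes RocksDB's default bytewise comparator, i.e. keys ordered as
 * unsigned bytes, lexicographically (a shorter prefix sorts first).
 */
public final class SstKeyOrderSketch {

    /** One key/value pair to be written into an SST file. */
    public static final class KeyValue {
        public final byte[] key;
        public final byte[] value;

        public KeyValue(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Unsigned lexicographic order, matching the assumed bytewise comparator. */
    public static final Comparator<byte[]> BYTEWISE = Arrays::compareUnsigned;

    private SstKeyOrderSketch() {}

    /** Returns a copy of the entries sorted by key, ready to be written in order. */
    public static List<KeyValue> sortForSst(List<KeyValue> entries) {
        List<KeyValue> sorted = new ArrayList<>(entries);
        sorted.sort((a, b) -> BYTEWISE.compare(a.key, b.key));
        return sorted;
    }

    /** Checks the writer's precondition: every key strictly greater than the previous one. */
    public static boolean strictlyIncreasing(List<KeyValue> entries) {
        for (int i = 1; i < entries.size(); i++) {
            if (BYTEWISE.compare(entries.get(i - 1).key, entries.get(i).key) >= 0) {
                return false; // out of order, or a duplicate key
            }
        }
        return true;
    }
}
```

Signed {{byte}} comparison would place a key starting with {{0x80}} first; unsigned order puts it after {{0x01}}, as the bytewise comparator does. In RocksJava, the sorted entries would then be passed in order to {{SstFileWriter#put}}, followed by {{finish()}} and {{RocksDB#ingestExternalFile}}; that part needs the RocksDB library and is not shown here.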



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
