[jira] [Updated] (FLINK-17288) Speedup loading from savepoints into RocksDB by bulk load

Flink Jira Bot (Jira) Sun, 07 Nov 2021 02:46:30 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-17288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Flink Jira Bot updated FLINK-17288:
-----------------------------------
    Labels: auto-deprioritized-major pull-request-available stale-minor  (was: 
auto-deprioritized-major pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue has been marked as 
Minor but is unassigned and neither itself nor its Sub-Tasks have been updated 
for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is 
still Minor, please either assign yourself or give an update. Afterwards, 
please remove the label or in 7 days the issue will be deprioritized.


> Speedup loading from savepoints into RocksDB by bulk load
> ---------------------------------------------------------
>
>                 Key: FLINK-17288
>                 URL: https://issues.apache.org/jira/browse/FLINK-17288
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>            Reporter: Jun Qin
>            Priority: Minor
>              Labels: auto-deprioritized-major, pull-request-available, 
> stale-minor
>
> When resources are constrained, loading a large savepoint into RocksDB can 
> take a long time. This can also lengthen job recovery time when the savepoint 
> is used for recovery.
> Bulk load from savepoint should help in this regard. Here is an excerpt from 
> the RocksDB FAQ (https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ):
> {quote}*Q: What's the fastest way to load data into RocksDB?*
> A: A fast way to direct insert data to the DB:
>  # using single writer thread and insert in sorted order
>  # batch hundreds of keys into one write batch
>  # use vector memtable
>  # make sure options.max_background_flushes is at least 4
>  # before inserting the data, disable automatic compaction, set 
> options.level0_file_num_compaction_trigger, 
> options.level0_slowdown_writes_trigger and options.level0_stop_writes_trigger 
> to very large. After inserting all the data, issue a manual compaction.
> 3-5 will be automatically done if you call Options::PrepareForBulkLoad() to 
> your option
> If you can pre-process the data offline before inserting. There is a faster 
> way: you can sort the data, generate SST files with non-overlapping ranges in 
> parallel and bulkload the SST files. See 
> [https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files]
> {quote}
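
The last suggestion in the quoted FAQ (sort the data, then generate SST files with non-overlapping ranges) can be illustrated with a minimal sketch of the planning step only. This is not Flink's or RocksDB's implementation: the names `plan_sst_batches` and `key_ranges` and the `max_entries_per_file` parameter are hypothetical, and nothing is written to disk. It assumes keys compare as raw bytes, which matches RocksDB's default bytewise comparator.

```python
from typing import Iterable, List, Tuple

KV = Tuple[bytes, bytes]


def plan_sst_batches(entries: Iterable[KV], max_entries_per_file: int) -> List[List[KV]]:
    """Sort entries by key and cut them into consecutive batches.

    Hypothetical helper: each batch stands in for one SST file. Because the
    batches are slices of a single sorted sequence of unique keys, their key
    ranges cannot overlap.
    """
    if max_entries_per_file < 1:
        raise ValueError("max_entries_per_file must be at least 1")
    # Keep the last value seen for each key so every key appears once.
    latest = dict(entries)
    # Python compares bytes lexicographically by unsigned byte value.
    ordered = sorted(latest.items(), key=lambda kv: kv[0])
    return [
        ordered[i:i + max_entries_per_file]
        for i in range(0, len(ordered), max_entries_per_file)
    ]


def key_ranges(batches: List[List[KV]]) -> List[Tuple[bytes, bytes]]:
    """Return the (smallest key, largest key) pair covered by each batch."""
    return [(batch[0][0], batch[-1][0]) for batch in batches]


if __name__ == "__main__":
    sample = [(b"k3", b"c"), (b"k1", b"a"), (b"k2", b"b"), (b"k5", b"e"), (b"k4", b"d")]
    for low, high in key_ranges(plan_sst_batches(sample, 2)):
        print(low, high)
```

In a real restore path, each batch would presumably be written with RocksDB's SST file writer and then ingested as an external file, as described on the wiki page linked in the quote. Since the ranges do not overlap, the batches could be written in parallel.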



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
