[jira] [Updated] (FLINK-17288) Speedup loading from savepoints into RocksDB by bulk load

Flink Jira Bot (Jira) Tue, 16 Nov 2021 02:44:26 -0800


     [ https://issues.apache.org/jira/browse/FLINK-17288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Flink Jira Bot updated FLINK-17288:
-----------------------------------
      Labels: auto-deprioritized-major auto-deprioritized-minor 
pull-request-available  (was: auto-deprioritized-major pull-request-available 
stale-minor)
    Priority: Not a Priority  (was: Minor)

This issue was labeled "stale-minor" 7 days ago and has not received any 
updates so it is being deprioritized. If this ticket is actually Minor, please 
raise the priority and ask a committer to assign you the issue or revive the 
public discussion.


> Speedup loading from savepoints into RocksDB by bulk load
> ---------------------------------------------------------
>
>                 Key: FLINK-17288
>                 URL: https://issues.apache.org/jira/browse/FLINK-17288
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>            Reporter: Jun Qin
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> pull-request-available
>
> When resources are constrained, loading a large savepoint into RocksDB can 
> take a long time. This also lengthens job recovery time when the savepoint 
> is used for recovery.
> Bulk loading from the savepoint should help in this regard. Here is an 
> excerpt from the RocksDB FAQ 
> (https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ):
> {quote}*Q: What's the fastest way to load data into RocksDB?*
> A: A fast way to direct insert data to the DB:
>  # using single writer thread and insert in sorted order
>  # batch hundreds of keys into one write batch
>  # use vector memtable
>  # make sure options.max_background_flushes is at least 4
>  # before inserting the data, disable automatic compaction, set 
> options.level0_file_num_compaction_trigger, 
> options.level0_slowdown_writes_trigger and options.level0_stop_writes_trigger 
> to very large. After inserting all the data, issue a manual compaction.
> 3-5 will be automatically done if you call Options::PrepareForBulkLoad() to 
> your option
> If you can pre-process the data offline before inserting. There is a faster 
> way: you can sort the data, generate SST files with non-overlapping ranges in 
> parallel and bulkload the SST files. See 
> [https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files]
> {quote}
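
The pre-processing step the FAQ describes (sort the data, then cut it into
non-overlapping ranges that can be written out in parallel) can be sketched
in plain Java as below. This is only an illustration under stated
assumptions, not the approach taken in the pull request: the class name
BulkLoadPlanner and its methods are hypothetical, keys are assumed to be
distinct, and the ordering assumed is unsigned bytewise order. Writing each
chunk to an SST file and ingesting it is left out; in RocksJava that part
would presumably go through SstFileWriter and ingestExternalFile.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: plans non-overlapping key ranges for a bulk load.
class BulkLoadPlanner {

    /**
     * Sorts the keys in unsigned bytewise order and splits them into at most
     * `parts` contiguous chunks. Because the input is sorted first and keys
     * are assumed distinct, the chunks cover non-overlapping key ranges.
     */
    static List<List<byte[]>> planRanges(List<byte[]> keys, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        List<byte[]> sorted = new ArrayList<>(keys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(a, b));

        List<List<byte[]>> chunks = new ArrayList<>();
        int n = sorted.size();
        int chunkSize = (n + parts - 1) / parts; // ceiling division
        for (int start = 0; start < n; start += chunkSize) {
            int end = Math.min(n, start + chunkSize);
            chunks.add(new ArrayList<>(sorted.subList(start, end)));
        }
        return chunks;
    }

    /** Returns true if every chunk's first key is strictly after the previous chunk's last key. */
    static boolean nonOverlapping(List<List<byte[]>> chunks) {
        for (int i = 1; i < chunks.size(); i++) {
            List<byte[]> prev = chunks.get(i - 1);
            byte[] prevLast = prev.get(prev.size() - 1);
            byte[] curFirst = chunks.get(i).get(0);
            if (Arrays.compareUnsigned(prevLast, curFirst) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (String s : new String[] {"d", "a", "c", "b", "e"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        List<List<byte[]>> chunks = planRanges(keys, 2);
        for (List<byte[]> chunk : chunks) {
            String first = new String(chunk.get(0), StandardCharsets.UTF_8);
            String last = new String(chunk.get(chunk.size() - 1), StandardCharsets.UTF_8);
            System.out.println("range [" + first + ", " + last + "] with " + chunk.size() + " keys");
        }
        System.out.println("non-overlapping: " + nonOverlapping(chunks));
    }
}
```

Each chunk could then be handed to its own worker, since no two chunks share
a key range. If the input may contain duplicate keys, they would need to be
resolved before splitting, otherwise two adjacent chunks could meet at the
same key.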



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
