[ https://issues.apache.org/jira/browse/FLINK-17288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-17288:
-----------------------------------
Labels: auto-deprioritized-major auto-deprioritized-minor
pull-request-available (was: auto-deprioritized-major pull-request-available
stale-minor)
Priority: Not a Priority (was: Minor)
This issue was labeled "stale-minor" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Minor, please
raise the priority and ask a committer to assign you the issue or revive the
public discussion.
> Speedup loading from savepoints into RocksDB by bulk load
> ---------------------------------------------------------
>
> Key: FLINK-17288
> URL: https://issues.apache.org/jira/browse/FLINK-17288
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Reporter: Jun Qin
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-deprioritized-minor,
> pull-request-available
>
> When resources are constrained, loading a large savepoint into RocksDB can take
> a long time. This also lengthens job recovery when the savepoint is used for
> recovery.
> Bulk loading from the savepoint should help here. The following is an excerpt
> from the RocksDB FAQ (https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ):
> {quote}*Q: What's the fastest way to load data into RocksDB?*
> A: A fast way to direct insert data to the DB:
> # using single writer thread and insert in sorted order
> # batch hundreds of keys into one write batch
> # use vector memtable
> # make sure options.max_background_flushes is at least 4
> # before inserting the data, disable automatic compaction, set options.level0_file_num_compaction_trigger, options.level0_slowdown_writes_trigger and options.level0_stop_writes_trigger to very large. After inserting all the data, issue a manual compaction.
> 3-5 will be automatically done if you call Options::PrepareForBulkLoad() to
> your option
> If you can pre-process the data offline before inserting. There is a faster
> way: you can sort the data, generate SST files with non-overlapping ranges in
> parallel and bulkload the SST files. See
> [https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files]
> {quote}
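The first two FAQ steps (a single writer that inserts in sorted key order, with hundreds of keys per write batch) can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not Flink or RocksDB code: `sorted_batches`, `bulk_insert` and the `write_batch` callback are hypothetical names, and a plain dict stands in for the database. In RocksDB itself, each batch would roughly correspond to one `WriteBatch` applied with a single write call.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Entry = Tuple[bytes, bytes]  # (key, value) as raw bytes


def sorted_batches(entries: Iterable[Entry], batch_size: int = 500) -> Iterator[List[Entry]]:
    """Yield entries in ascending key order, grouped into batches of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Entry] = []
    for entry in sorted(entries, key=lambda kv: kv[0]):
        batch.append(entry)
        if len(batch) == batch_size:
            yield batch
            batch = []  # start a fresh list so yielded batches stay intact
    if batch:  # flush the final, possibly smaller batch
        yield batch


def bulk_insert(entries: Iterable[Entry],
                write_batch: Callable[[List[Entry]], None],
                batch_size: int = 500) -> int:
    """Single-writer loop: hand each sorted batch to write_batch and count the writes."""
    writes = 0
    for batch in sorted_batches(entries, batch_size):
        write_batch(batch)  # hypothetical stand-in for one batched write to the DB
        writes += 1
    return writes


if __name__ == "__main__":
    store = {}  # a dict stands in for the database
    data = [(f"key{i:05d}".encode(), b"v") for i in reversed(range(1200))]
    print(bulk_insert(data, store.update, batch_size=500))  # 3
```

With 1,200 entries and `batch_size=500`, the writer issues three writes of 500, 500 and 200 keys, always in ascending key order.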
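The faster offline path from the FAQ (sort the data, split it into non-overlapping key ranges, and produce one file per range in parallel) might be sketched as follows. This is again a hypothetical Python illustration that assumes unique keys: `partition_sorted`, `write_fake_sst` and `build_files_in_parallel` are made-up names, and `write_fake_sst` only records each chunk's smallest key, largest key and entry count instead of writing a real file. In RocksDB the corresponding pieces are `SstFileWriter` for creating the files and `IngestExternalFile` for ingesting them, as described on the linked wiki page.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, NamedTuple, Tuple

Entry = Tuple[bytes, bytes]  # (key, value); keys are assumed to be unique


class FileMeta(NamedTuple):
    smallest: bytes
    largest: bytes
    entries: int


def partition_sorted(entries: List[Entry], num_files: int) -> List[List[Entry]]:
    """Sort by key, then cut into contiguous chunks whose key ranges cannot overlap."""
    if num_files <= 0:
        raise ValueError("num_files must be positive")
    ordered = sorted(entries, key=lambda kv: kv[0])
    if not ordered:
        return []
    size = -(-len(ordered) // num_files)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def write_fake_sst(chunk: List[Entry]) -> FileMeta:
    """Hypothetical stand-in for an SST file writer: checks key order, returns metadata."""
    keys = [k for k, _ in chunk]
    if any(a >= b for a, b in zip(keys, keys[1:])):
        raise ValueError("keys within one file must be strictly increasing")
    return FileMeta(keys[0], keys[-1], len(chunk))


def build_files_in_parallel(entries: List[Entry], num_files: int,
                            workers: int = 4) -> List[FileMeta]:
    chunks = partition_sorted(entries, num_files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        files = list(pool.map(write_fake_sst, chunks))  # map preserves chunk order
    # Sorting before splitting means each file starts after the previous one ends.
    for prev, cur in zip(files, files[1:]):
        if not prev.largest < cur.smallest:
            raise ValueError("overlapping key ranges")
    return files
```

Doing the global sort before splitting is what makes the ranges disjoint, so each chunk can be written independently. Python threads only illustrate the structure here; a real implementation would parallelize the file writing itself.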
--
This message was sent by Atlassian Jira
(v8.20.1#820001)