[
https://issues.apache.org/jira/browse/FLINK-17288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-17288:
-----------------------------------
Labels: auto-deprioritized-major pull-request-available stale-minor (was:
auto-deprioritized-major pull-request-available)
I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help
the community manage its development. I see this issue has been marked as
Minor but is unassigned and neither itself nor its Sub-Tasks have been updated
for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is
still Minor, please either assign yourself or give an update. Afterwards,
please remove the label or in 7 days the issue will be deprioritized.
> Speedup loading from savepoints into RocksDB by bulk load
> ---------------------------------------------------------
>
> Key: FLINK-17288
> URL: https://issues.apache.org/jira/browse/FLINK-17288
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Reporter: Jun Qin
> Priority: Minor
> Labels: auto-deprioritized-major, pull-request-available,
> stale-minor
>
> When resources are constrained, loading a large savepoint into RocksDB may take
> some time. This may also increase job recovery time when the savepoint is
> used for recovery.
> Bulk loading from the savepoint should help in this regard. Here is an excerpt from
> the RocksDB FAQ (https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ):
> {quote}*Q: What's the fastest way to load data into RocksDB?*
> A: A fast way to direct insert data to the DB:
> # using single writer thread and insert in sorted order
> # batch hundreds of keys into one write batch
> # use vector memtable
> # make sure options.max_background_flushes is at least 4
> # before inserting the data, disable automatic compaction, set
> options.level0_file_num_compaction_trigger,
> options.level0_slowdown_writes_trigger and options.level0_stop_writes_trigger
> to very large. After inserting all the data, issue a manual compaction.
> 3-5 will be automatically done if you call Options::PrepareForBulkLoad() to
> your option
> If you can pre-process the data offline before inserting. There is a faster
> way: you can sort the data, generate SST files with non-overlapping ranges in
> parallel and bulkload the SST files. See
> [https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files]
> {quote}
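The first two FAQ items (insert in sorted order, batch hundreds of keys per write) can be sketched in plain Java as below. This is only an illustration under stated assumptions, not code from Flink or from the linked pull request: `BulkLoadSketch`, `KV`, `compareBytewise` and `sortedBatches` are hypothetical names, no RocksDB API is called, and keys are assumed to be unique and ordered bytewise as unsigned bytes, which matches RocksDB's default comparator.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BulkLoadSketch {

    /** Hypothetical key/value pair standing in for one state entry read from a savepoint. */
    static final class KV {
        final byte[] key;
        final byte[] value;

        KV(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Lexicographic comparison that treats each byte as unsigned. */
    static int compareBytewise(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // A shorter key that is a prefix of a longer key sorts first.
        return a.length - b.length;
    }

    /**
     * Returns the entries sorted by key and cut into consecutive batches of at
     * most batchSize entries. The input list is not modified.
     */
    static List<List<KV>> sortedBatches(List<KV> entries, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<KV> sorted = new ArrayList<>(entries);
        sorted.sort((x, y) -> compareBytewise(x.key, y.key));

        List<List<KV>> batches = new ArrayList<>();
        for (int start = 0; start < sorted.size(); start += batchSize) {
            int end = Math.min(start + batchSize, sorted.size());
            batches.add(new ArrayList<>(sorted.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<KV> entries = new ArrayList<>();
        for (String k : new String[] {"user-3", "user-1", "user-2"}) {
            entries.add(new KV(k.getBytes(StandardCharsets.UTF_8), new byte[] {1}));
        }
        // A real loader would hand each batch to a single writer thread, in order.
        for (List<KV> batch : sortedBatches(entries, 2)) {
            String first = new String(batch.get(0).key, StandardCharsets.UTF_8);
            System.out.println("batch of " + batch.size() + " starting at " + first);
        }
    }
}
```

Because each batch is a contiguous slice of the sorted sequence, the batches cover non-overlapping key ranges as long as keys are unique. The same split could therefore serve as the partitioning step for the SST-file approach mentioned at the end of the FAQ excerpt, with one file generated per slice.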