[
https://issues.apache.org/jira/browse/FLINK-17288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jun Qin updated FLINK-17288:
----------------------------
Description:
When resources are constrained, loading a big savepoint into RocksDB may take
a long time. This also impacts the job recovery time when a savepoint is used
for recovery.
Bulk loading from the savepoint should help in this regard. Here is an excerpt
from the RocksDB FAQ (https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ):
{quote}*Q: What's the fastest way to load data into RocksDB?*
A: A fast way to insert data directly into the DB:
# use a single writer thread and insert in sorted order
# batch hundreds of keys into one write batch
# use the vector memtable
# make sure options.max_background_flushes is at least 4
# before inserting the data, disable automatic compaction and set
options.level0_file_num_compaction_trigger,
options.level0_slowdown_writes_trigger and options.level0_stop_writes_trigger
to very large values. After inserting all the data, issue a manual compaction.
Steps 3-5 are done automatically if you call Options::PrepareForBulkLoad() on
your options.
If you can pre-process the data offline before inserting, there is a faster
way: sort the data, generate SST files with non-overlapping key ranges in
parallel, and bulk load the SST files. See
[https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files]
{quote}
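For illustration only (not a concrete design for this ticket): a minimal RocksJava sketch of the write-batch approach quoted above, assuming the restored key/value pairs are already available in sorted order and are written by a single thread. The class name, the helper signature, the batch size of 500, and disabling the WAL are assumptions made up for the example; only the org.rocksdb calls are real API.
{code:java}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

import java.util.Map;

public class BulkLoadSketch {

    // Hypothetical helper: dbPath and sortedEntries are placeholders; the entries
    // are assumed to be pre-sorted by key and written by a single thread.
    public static void bulkLoad(String dbPath,
                                Iterable<Map.Entry<byte[], byte[]>> sortedEntries)
            throws RocksDBException {
        RocksDB.loadLibrary();

        try (Options options = new Options()
                     .setCreateIfMissing(true)
                     // Per the FAQ, this covers steps 3-5 (vector memtable,
                     // background flushes, relaxed compaction triggers).
                     .prepareForBulkLoad();
             RocksDB db = RocksDB.open(options, dbPath);
             // Assumption: the WAL is skipped because the savepoint stays the
             // source of truth if the load has to be repeated.
             WriteOptions writeOptions = new WriteOptions().setDisableWAL(true)) {

            // Steps 1-2: single writer, sorted order, a few hundred keys per batch.
            WriteBatch batch = new WriteBatch();
            int inBatch = 0;
            for (Map.Entry<byte[], byte[]> entry : sortedEntries) {
                batch.put(entry.getKey(), entry.getValue());
                if (++inBatch == 500) {
                    db.write(writeOptions, batch);
                    batch.close();
                    batch = new WriteBatch();
                    inBatch = 0;
                }
            }
            if (inBatch > 0) {
                db.write(writeOptions, batch);
            }
            batch.close();

            // As the FAQ suggests, run a manual compaction after the bulk insert.
            db.compactRange();
        }
    }
}
{code}
Whether the WAL can safely be skipped depends on how recovery is retried; that is a design choice outside this sketch.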
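The second, faster approach from the FAQ (pre-building SST files and ingesting them) could look roughly like this with RocksJava's SstFileWriter and ingestExternalFile. Generating the files in parallel over non-overlapping key ranges is left out, and the class and method names are again made up:
{code:java}
import org.rocksdb.EnvOptions;
import org.rocksdb.IngestExternalFileOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.SstFileWriter;

import java.util.List;
import java.util.Map;

public class SstIngestSketch {

    // Write one SST file from entries already sorted by key. Files produced in
    // parallel must cover non-overlapping key ranges.
    static void writeSstFile(String sstPath,
                             Iterable<Map.Entry<byte[], byte[]>> sortedEntries)
            throws RocksDBException {
        try (Options options = new Options();
             EnvOptions envOptions = new EnvOptions();
             SstFileWriter writer = new SstFileWriter(envOptions, options)) {
            writer.open(sstPath);
            for (Map.Entry<byte[], byte[]> entry : sortedEntries) {
                writer.put(entry.getKey(), entry.getValue());
            }
            writer.finish();
        }
    }

    // Ingest the pre-built SST files directly into an open RocksDB instance.
    static void ingest(RocksDB db, List<String> sstFiles) throws RocksDBException {
        try (IngestExternalFileOptions ingestOptions = new IngestExternalFileOptions()) {
            // Move rather than copy the files into the DB directory.
            ingestOptions.setMoveFiles(true);
            db.ingestExternalFile(sstFiles, ingestOptions);
        }
    }
}
{code}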
> Speed up loading from savepoints into RocksDB by bulk load
> ---------------------------------------------------------
>
> Key: FLINK-17288
> URL: https://issues.apache.org/jira/browse/FLINK-17288
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Reporter: Jun Qin
> Priority: Major
>