[
https://issues.apache.org/jira/browse/FLINK-17288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jun Qin updated FLINK-17288:
----------------------------
Description:
When resources are constrained, loading a big savepoint into RocksDB may take
a long time. This also impacts the job recovery time when a savepoint is used
for recovery.
Bulk loading from the savepoint should help in this regard. Here is an excerpt
from the RocksDB FAQ (https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ):
{quote}*Q: What's the fastest way to load data into RocksDB?*
A: A fast way to insert data directly into the DB:
# use a single writer thread and insert in sorted order
# batch hundreds of keys into one write batch
# use the vector memtable
# make sure options.max_background_flushes is at least 4
# before inserting the data, disable automatic compaction and set
options.level0_file_num_compaction_trigger,
options.level0_slowdown_writes_trigger and options.level0_stop_writes_trigger
to very large values. After inserting all the data, issue a manual compaction.
Steps 3-5 are done automatically if you call Options::PrepareForBulkLoad() on
your options.
If you can pre-process the data offline before inserting, there is a faster
way: sort the data, generate SST files with non-overlapping key ranges in
parallel, and bulk load the SST files. See
[https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files]
{quote}
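For illustration only (not a concrete design for this ticket): a minimal RocksJava sketch of the write-batch approach quoted above, assuming the restored key/value pairs are already available in sorted order and are written by a single thread. The class name, the helper signature, the batch size of 500, and disabling the WAL are assumptions made up for the example; only the org.rocksdb calls are real API.
{code:java}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

import java.util.Map;

public class BulkLoadSketch {

    // Hypothetical helper: dbPath and sortedEntries are placeholders; the entries
    // are assumed to be pre-sorted by key and written by a single thread.
    public static void bulkLoad(String dbPath,
                                Iterable<Map.Entry<byte[], byte[]>> sortedEntries)
            throws RocksDBException {
        RocksDB.loadLibrary();

        try (Options options = new Options()
                     .setCreateIfMissing(true)
                     // Per the FAQ, this covers steps 3-5 (vector memtable,
                     // background flushes, relaxed compaction triggers).
                     .prepareForBulkLoad();
             RocksDB db = RocksDB.open(options, dbPath);
             // Assumption: the WAL is skipped because the savepoint stays the
             // source of truth if the load has to be repeated.
             WriteOptions writeOptions = new WriteOptions().setDisableWAL(true)) {

            // Steps 1-2: single writer, sorted order, a few hundred keys per batch.
            WriteBatch batch = new WriteBatch();
            int inBatch = 0;
            for (Map.Entry<byte[], byte[]> entry : sortedEntries) {
                batch.put(entry.getKey(), entry.getValue());
                if (++inBatch == 500) {
                    db.write(writeOptions, batch);
                    batch.close();
                    batch = new WriteBatch();
                    inBatch = 0;
                }
            }
            if (inBatch > 0) {
                db.write(writeOptions, batch);
            }
            batch.close();

            // As the FAQ suggests, run a manual compaction after the bulk insert.
            db.compactRange();
        }
    }
}
{code}
Whether the WAL can safely be skipped depends on how recovery is retried; that is a design choice outside this sketch.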
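The second, faster approach from the FAQ (pre-building SST files and ingesting them) could look roughly like this with RocksJava's SstFileWriter and ingestExternalFile. Generating the files in parallel over non-overlapping key ranges is left out, and the class and method names are again made up:
{code:java}
import org.rocksdb.EnvOptions;
import org.rocksdb.IngestExternalFileOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.SstFileWriter;

import java.util.List;
import java.util.Map;

public class SstIngestSketch {

    // Write one SST file from entries already sorted by key. Files produced in
    // parallel must cover non-overlapping key ranges.
    static void writeSstFile(String sstPath,
                             Iterable<Map.Entry<byte[], byte[]>> sortedEntries)
            throws RocksDBException {
        try (Options options = new Options();
             EnvOptions envOptions = new EnvOptions();
             SstFileWriter writer = new SstFileWriter(envOptions, options)) {
            writer.open(sstPath);
            for (Map.Entry<byte[], byte[]> entry : sortedEntries) {
                writer.put(entry.getKey(), entry.getValue());
            }
            writer.finish();
        }
    }

    // Ingest the pre-built SST files directly into an open RocksDB instance.
    static void ingest(RocksDB db, List<String> sstFiles) throws RocksDBException {
        try (IngestExternalFileOptions ingestOptions = new IngestExternalFileOptions()) {
            // Move rather than copy the files into the DB directory.
            ingestOptions.setMoveFiles(true);
            db.ingestExternalFile(sstFiles, ingestOptions);
        }
    }
}
{code}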
> Speed up loading from savepoints into RocksDB by bulk load
> ---------------------------------------------------------
>
> Key: FLINK-17288
> URL: https://issues.apache.org/jira/browse/FLINK-17288
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Reporter: Jun Qin
> Priority: Major
>