[ 
https://issues.apache.org/jira/browse/FLINK-29577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620889#comment-17620889
 ] 

Cai Liuyang commented on FLINK-29577:
-------------------------------------

[~ym] 

This is my test case:

1. Disable RocksDB managed memory.
2. Use the RocksDB full snapshot strategy.
3. KeyProcessor has one subtask and five states (state1, state2, state3, 
state4, state5); the total state size is 2 GB.
4. state1 is small: its total size is less than the RocksDB default memtable 
size (64 MB), for example state1 has just one record.

After restoring, the RocksDB WAL cannot be garbage-collected (see png1). My 
guess at the reason is that state1's memtable has not been flushed, so the WAL 
can only be GCed once state1's memtable is flushed (a RocksDB full snapshot 
only takes a snapshot, it does not flush the memtable).
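
The guess above can be illustrated with a toy model. This is only a sketch, not RocksDB code: the class and method names are hypothetical, and it assumes (as I understand RocksDB's behavior) that every memtable flush starts a new WAL file and that all WAL files from the oldest one still holding unflushed data onward are kept.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of WAL retention across column families (hypothetical, not RocksDB code). */
class WalRetentionModel {
    // Number of the WAL file that currently receives writes.
    private int currentWal = 1;
    // For each column family with unflushed data: the oldest WAL file holding that data.
    private final Map<String, Integer> oldestUnflushedWal = new HashMap<>();

    /** A write lands in the current WAL file and in the family's memtable. */
    void write(String family) {
        oldestUnflushedWal.putIfAbsent(family, currentWal);
    }

    /** Flushing a family's memtable persists its data and starts a new WAL file. */
    void flush(String family) {
        oldestUnflushedWal.remove(family);
        currentWal++;
    }

    /** WAL files kept on disk: everything from the oldest still-needed file to the current one. */
    int liveWalCount() {
        int oldestNeeded = currentWal;
        for (int wal : oldestUnflushedWal.values()) {
            oldestNeeded = Math.min(oldestNeeded, wal);
        }
        return currentWal - oldestNeeded + 1;
    }
}
```

In this model, writing one record to state1 first and then flushing a large state four times leaves five live WAL files, because state1's unflushed record pins the first one. Writing the small state last, after the large states have flushed, leaves a single WAL file, which matches the shape of the two observations in this comment.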

During my test, I found that if state1 ~ state4 are big but state5 is small 
(only one record), there is only one WAL file (see png2).


In my test, disabling the WAL doesn't improve restore speed; the two cases 
(WAL disabled / enabled during restore) are almost the same.

 

!image-2022-10-20-16-08-15-746.png!!image-2022-10-20-16-08-54-359.png!

> Disable rocksdb wal when restore from full snapshot
> ---------------------------------------------------
>
>                 Key: FLINK-29577
>                 URL: https://issues.apache.org/jira/browse/FLINK-29577
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / State Backends
>            Reporter: Cai Liuyang
>            Assignee: Cai Liuyang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2022-10-20-16-08-15-746.png
>
>
> For now, RocksDBFullRestoreOperation and 
> RocksDBHeapTimersFullRestoreOperation don't pass RocksDB::WriteOptions to 
> RocksDBWriteBatchWrapper when restoring kv-data, so the wrapper uses its 
> default WriteOptions (which don't disable the RocksDB WAL explicitly, see 
> code below). As a result, the WAL is enabled while restoring from a full 
> snapshot (this uses more disk and may affect RocksDB write performance 
> during restore).
>  
> {code:java}
> // First: RocksDBHeapTimersFullRestoreOperation::restoreKVStateData() doesn't
> // pass WriteOptions to RocksDBWriteBatchWrapper (null by default).
> private void restoreKVStateData(
>         ThrowingIterator<KeyGroup> keyGroups,
>         Map<Integer, ColumnFamilyHandle> columnFamilies,
>         Map<Integer, HeapPriorityQueueSnapshotRestoreWrapper<?>> restoredPQStates)
>         throws IOException, RocksDBException, StateMigrationException {
>     // for all key-groups in the current state handle...
>     try (RocksDBWriteBatchWrapper writeBatchWrapper =
>             new RocksDBWriteBatchWrapper(this.rocksHandle.getDb(), writeBatchSize)) {
>         HeapPriorityQueueSnapshotRestoreWrapper<HeapPriorityQueueElement> restoredPQ = null;
>         ColumnFamilyHandle handle = null;
>    ......
> }
> // Second: RocksDBWriteBatchWrapper::flush() doesn't disable the WAL explicitly
> // when the caller doesn't pass WriteOptions to RocksDBWriteBatchWrapper.
> public void flush() throws RocksDBException {
>     if (options != null) {
>         db.write(options, batch);
>     } else {
>         // use the default WriteOptions, if wasn't provided.
>         try (WriteOptions writeOptions = new WriteOptions()) {
>             db.write(writeOptions, batch);
>         }
>     }
>     batch.clear();
> }
> {code}
>  
>  
> As we know, RocksDB's WAL is useless for Flink, so I think we can disable 
> the WAL in RocksDBWriteBatchWrapper's default WriteOptions.
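
A minimal sketch of the proposed default, for illustration only. It is not the actual patch: StubWriteOptions, StubDb and BatchWrapperSketch are hypothetical stand-ins so that the example runs without the RocksDB library. In RocksJava the corresponding call would be org.rocksdb.WriteOptions#setDisableWAL(true).

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for org.rocksdb.WriteOptions; only the WAL flag is modeled. */
class StubWriteOptions {
    private boolean disableWal;

    StubWriteOptions setDisableWAL(boolean flag) {
        this.disableWal = flag;
        return this;
    }

    boolean disableWAL() {
        return disableWal;
    }
}

/** Stand-in for the database: records whether each batch write skipped the WAL. */
class StubDb {
    final List<Boolean> walDisabledPerWrite = new ArrayList<>();

    void write(StubWriteOptions options, List<String> batch) {
        walDisabledPerWrite.add(options.disableWAL());
    }
}

/** Simplified wrapper whose default write options disable the WAL. */
class BatchWrapperSketch {
    private final StubDb db;
    private final StubWriteOptions options; // null means "use the default"
    private final List<String> batch = new ArrayList<>();

    BatchWrapperSketch(StubDb db, StubWriteOptions options) {
        this.db = db;
        this.options = options;
    }

    void put(String record) {
        batch.add(record);
    }

    void flush() {
        // Proposed change: when no options were provided, default to WAL disabled.
        StubWriteOptions effective =
                options != null ? options : new StubWriteOptions().setDisableWAL(true);
        db.write(effective, batch);
        batch.clear();
    }
}
```

With this shape, callers that pass no options (such as the restore operations quoted above) write without the WAL, while callers that pass their own options keep whatever they configured.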



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
