azagrebin commented on a change in pull request #7351: [FLINK-11008][State
Backends, Checkpointing]SpeedUp upload state files using multithread
URL: https://github.com/apache/flink/pull/7351#discussion_r244010700
##########
File path:
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDbStateDataTransfer.java
##########
@@ -61,6 +74,144 @@ static void transferAllStateDataToDirectory(
downloadDataForAllStateHandles(miscFiles, dest,
restoringThreadNum, closeableRegistry);
}
+ public static void uploadStateFiles(
+ CheckpointStreamFactory checkpointStreamFactory,
+ SnapshotDirectory localBackupDirectory,
+ Set<StateHandleID> baseSstFiles,
+ int uploadingThreadNum,
+ CloseableRegistry snapshotCloseableRegistry,
+ @Nonnull ConcurrentHashMap<StateHandleID, StreamStateHandle>
sstFiles,
+ @Nonnull ConcurrentHashMap<StateHandleID, StreamStateHandle>
miscFiles) throws Exception {
+
+ Preconditions.checkState(localBackupDirectory.exists());
+
+ FileStatus[] fileStatuses = localBackupDirectory.listStatus();
+ if (fileStatuses != null) {
+ ExecutorService executorService =
createExecutorService(uploadingThreadNum);
+
+ try {
+ List<Runnable> runnables =
createUploadRunnables(
Review comment:
I think using `Callable<StreamStateHandle>` (path -> StreamStateHandle, as
it was in `uploadLocalFileToCheckpointFs`) is simpler and stateless compared
to using `Runnable` and implicitly adding results to the concurrent map.
What if we implement `createUploadCallables` and use
`CompletableFuture.supplyAsync` to create a `Map<StateHandleID,
CompletableFuture<StreamStateHandle>>`? We could then wait for the futures
(`map.values()`) and convert the map into the result
`Map<StateHandleID, StreamStateHandle>`.
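A rough sketch of what I have in mind, not tied to the actual Flink types:
`UploadSketch`, `uploadAll`, and the generic parameters `K`, `P`, `V` are
hypothetical stand-ins for `StateHandleID`, the local file path, and
`StreamStateHandle`. It assumes the per-file upload can be expressed as a
plain function; `CompletableFuture.supplyAsync` takes a `Supplier`, so a
`Callable` that throws checked exceptions would need a small wrapper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper, for illustration only.
class UploadSketch {

    // Runs one upload per entry on a fixed-size pool and returns the
    // results keyed the same way as the input.
    static <K, P, V> Map<K, V> uploadAll(
            Map<K, P> files, Function<P, V> upload, int threadNum) {

        ExecutorService executor = Executors.newFixedThreadPool(threadNum);
        try {
            // Each task is stateless: path in, handle out.
            Map<K, CompletableFuture<V>> futures = new HashMap<>();
            for (Map.Entry<K, P> entry : files.entrySet()) {
                P path = entry.getValue();
                futures.put(
                    entry.getKey(),
                    CompletableFuture.supplyAsync(() -> upload.apply(path), executor));
            }

            // Wait for all uploads; join() throws CompletionException
            // if any of them failed.
            CompletableFuture
                .allOf(futures.values().toArray(new CompletableFuture[0]))
                .join();

            // Convert the map of futures into the map of results.
            Map<K, V> result = new HashMap<>();
            for (Map.Entry<K, CompletableFuture<V>> entry : futures.entrySet()) {
                result.put(entry.getKey(), entry.getValue().join());
            }
            return result;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With this shape the caller owns the result map, so no concurrent map has to
be passed in and mutated from the worker threads.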
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services