StefanRRichter commented on a change in pull request #7351: [FLINK-11008][State Backends, Checkpointing]SpeedUp upload state files using multithread
URL: https://github.com/apache/flink/pull/7351#discussion_r248267921
 
 

 ##########
 File path: flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDbStateDataTransfer.java
 ##########
 @@ -61,6 +80,116 @@ static void transferAllStateDataToDirectory(
                downloadDataForAllStateHandles(miscFiles, dest, restoringThreadNum, closeableRegistry);
        }
 
+       /**
+        * Upload all the files to checkpoint fileSystem using specified number of threads.
+        *
+        * @param files The files will be uploaded to checkpoint filesystem.
+        * @param numberOfSnapshottingThreads The number of threads used to upload the files.
+        * @param checkpointStreamFactory The checkpoint streamFactory used to create outputstream.
+        * @param closeableRegistry
+        *
+        * @throws Exception Thrown if can not upload all the files.
+        */
+       public static Map<StateHandleID, StreamStateHandle> uploadFilesToCheckpointFs(
+               @Nonnull Map<StateHandleID, Path> files,
+               int numberOfSnapshottingThreads,
+               CheckpointStreamFactory checkpointStreamFactory,
+               CloseableRegistry closeableRegistry) throws Exception {
+
+               Map<StateHandleID, StreamStateHandle> handles = new HashMap<>();
+
+               ExecutorService executorService = createExecutorService(numberOfSnapshottingThreads);
 
 Review comment:
   This line makes me wonder whether it is a good idea for this to be just a static utils class instead of a proper class. One advantage of executor services is that they don't need to recreate threads all the time. I agree that this method is not called in any hot loops, so the overhead might not be dramatic, but I still wonder why not make this a proper class that creates a thread pool once and only shuts it down at the end of the object lifecycle. Any reason why you chose this approach?
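   A minimal sketch of the alternative suggested above, using only the JDK: the class name `ReusableUploadPool`, its `uploadAll` method, and the `Function<Path, R>` that stands in for writing one file to a checkpoint output stream are all hypothetical and are not Flink API. The point it illustrates is that the fixed thread pool is created once in the constructor, reused by every `uploadAll` call, and shut down only in `close()`.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/**
 * Hypothetical sketch: owns one fixed-size thread pool for its whole lifecycle
 * instead of creating a new ExecutorService on every upload call.
 */
final class ReusableUploadPool implements AutoCloseable {

    private final ExecutorService executor;

    ReusableUploadPool(int numberOfThreads) {
        // Threads are created once here and reused by every uploadAll call.
        this.executor = Executors.newFixedThreadPool(numberOfThreads);
    }

    /**
     * Runs uploadFn for every file in parallel on the shared pool and collects
     * the results by key. uploadFn is a placeholder for the real stream write.
     */
    <K, R> Map<K, R> uploadAll(Map<K, Path> files, Function<Path, R> uploadFn) throws Exception {
        Map<K, CompletableFuture<R>> futures = new HashMap<>();
        for (Map.Entry<K, Path> entry : files.entrySet()) {
            Path path = entry.getValue();
            futures.put(entry.getKey(), CompletableFuture.supplyAsync(() -> uploadFn.apply(path), executor));
        }
        Map<K, R> results = new HashMap<>();
        for (Map.Entry<K, CompletableFuture<R>> entry : futures.entrySet()) {
            // get() rethrows a failed upload wrapped in an ExecutionException.
            // This sketch does not clean up uploads that already succeeded.
            results.put(entry.getKey(), entry.getValue().get());
        }
        return results;
    }

    @Override
    public void close() throws InterruptedException {
        // The pool is shut down once, at the end of the object's lifecycle.
        executor.shutdown();
        if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
            executor.shutdownNow();
        }
    }
}
```

   Used with try-with-resources, two consecutive `uploadAll` calls run on the same threads, so only the first call pays the thread-creation cost; whether that saving matters for checkpointing is exactly the open question in the comment.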

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
