Hi All, I would like to fix this issue: https://issues.apache.org/jira/browse/BEAM-206
Could you please review my design proposal?

I would copy and optionally remove the temporary files one by one, each as an atomic operation, rather than copying all of the temporary files first and then removing them (if removal is needed). This has the following benefits:

* If the file system supports a move operation and the temporary files are to be removed, we can use the native move (or rename) operation, which could be significantly faster than a copy followed by a remove.
* By moving the remove operation close to the copy operation, a failure is less likely to force us to copy a file again. For example, if one of two files has been moved and the other one failed, a replay moves only the one that failed rather than starting from scratch.

Regarding concurrency, I would use an ExecutorService to run the aforementioned per-file operations in parallel. The first exception would stop (interrupt) all remaining operations. The level of concurrency (number of threads) would be file system specific and configurable. I can imagine that 10+ threads give good performance on GCS but bad performance on a local file system.

Best regards,
Roland Harangozo
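
P.S. To make the proposal a bit more concrete, here is a rough Java sketch of the per-file move/copy running on an ExecutorService. It is only an illustration under a few assumptions: it uses java.nio.file on local paths as a stand-in for the file-system-specific operations, the names TempFileFinalizer, finalizeFiles and transfer are hypothetical and not existing Beam APIs, and the "skip if the source is gone and the destination exists" check is just one possible way to make a replay touch only the files that failed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper, not part of Beam: finalizes temporary files one by one.
final class TempFileFinalizer {

    // Runs one transfer task per file on a fixed-size pool.
    // 'threads' is the configurable, file-system-specific concurrency level.
    static void finalizeFiles(Map<Path, Path> tempToFinal, boolean removeTemp, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<Path> completed = new ExecutorCompletionService<>(pool);
        try {
            for (Map.Entry<Path, Path> e : tempToFinal.entrySet()) {
                completed.submit(() -> transfer(e.getKey(), e.getValue(), removeTemp));
            }
            // Collect results in completion order so the first failure surfaces early.
            for (int i = 0; i < tempToFinal.size(); i++) {
                try {
                    completed.take().get();
                } catch (ExecutionException ex) {
                    throw new IOException("Finalizing temporary files failed", ex.getCause());
                }
            }
        } finally {
            // Interrupts running tasks and discards queued ones; harmless after success.
            pool.shutdownNow();
        }
    }

    // Moves the file when the temporary copy should be removed, otherwise copies it.
    static Path transfer(Path src, Path dst, boolean removeTemp) throws IOException {
        if (!Files.exists(src)) {
            // Assumption: a missing source with an existing destination means this
            // file was already moved by an earlier attempt, so a replay skips it.
            if (removeTemp && Files.exists(dst)) {
                return dst;
            }
            throw new NoSuchFileException(src.toString());
        }
        if (removeTemp) {
            // Files.move can use a native rename when source and target share a file store.
            Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
        } else {
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        }
        return dst;
    }
}
```

With this shape, a call such as TempFileFinalizer.finalizeFiles(tempToFinal, true, 10) would move every temporary file with up to 10 threads, and calling it again after a partial failure would only redo the files whose source still exists.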
