Hi All,

I would like to fix this issue:
https://issues.apache.org/jira/browse/BEAM-206

Could you please review my design proposal?

I would copy and optionally remove the temporary files one by one, each as
an atomic operation, rather than copying all of the temporary files and then
removing them (if removal is needed). This has the following benefits:
* If the file system supports a move (or rename) operation and the file
retention is "remove", we can use the native move operation. This could be
significantly faster than a copy followed by a remove.
* Moving the remove operation close to the copy operation lowers the
probability of copying a file again after a failure. For example, if one
file of two is moved but the other one fails, a replay moves only the one
that failed rather than starting from scratch.
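
To make the per-file step concrete, here is a minimal sketch using
java.nio.file on a local file system. The class and method names
(TempFileFinalizer, finalizeOne) are hypothetical and are not existing Beam
APIs, and the rule "source missing and destination present means already
done" is my assumption about how a replay could be detected.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not a Beam class): finalizes one temporary file at a time.
final class TempFileFinalizer {

  /**
   * Places src at dst. When removeTemp is true, a native atomic move is tried
   * first, with copy + delete as the fallback. When it is false, the file is
   * only copied and the temporary file is kept.
   */
  static void finalizeOne(Path src, Path dst, boolean removeTemp) throws IOException {
    // Assumption: on replay, a missing source with an existing destination
    // means this file was already moved, so there is nothing left to do.
    if (removeTemp && Files.notExists(src) && Files.exists(dst)) {
      return;
    }
    if (removeTemp) {
      try {
        Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
        return;
      } catch (AtomicMoveNotSupportedException e) {
        // No native atomic move here; fall back to copy + delete below.
      }
    }
    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
    if (removeTemp) {
      Files.delete(src);
    }
  }
}
```

Calling finalizeOne a second time for a file that was already moved is a
no-op in this sketch, which is what lets a replay skip finished files.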

Regarding concurrency, I would use an ExecutorService to run the
aforementioned per-file operations in parallel. The first exception would
stop (interrupt) all remaining operations.
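
A rough sketch of the fail-fast behaviour, assuming a fixed thread pool and
an ExecutorCompletionService so that the first failure is seen as soon as it
happens. FailFastRunner and runAll are hypothetical names used only for
illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper (not a Beam class): runs tasks in parallel, fails fast.
final class FailFastRunner {

  /** Runs all tasks on `threads` threads; the first failure cancels the rest. */
  static void runAll(List<Callable<Void>> tasks, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      ExecutorCompletionService<Void> ecs = new ExecutorCompletionService<>(pool);
      List<Future<Void>> futures = new ArrayList<>();
      for (Callable<Void> task : tasks) {
        futures.add(ecs.submit(task));
      }
      try {
        // take() returns futures in completion order, so a failed task is
        // noticed without waiting for the tasks submitted before it.
        for (int i = 0; i < tasks.size(); i++) {
          ecs.take().get();
        }
      } catch (ExecutionException e) {
        // Interrupt running tasks and drop the ones still queued.
        for (Future<Void> f : futures) {
          f.cancel(true);
        }
        Throwable cause = e.getCause();
        throw cause instanceof Exception ? (Exception) cause : e;
      }
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Note that interruption is cooperative: a task blocked in an I/O call that
ignores interrupts would still run to completion.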

The level of concurrency (number of threads) would be file-system specific
and configurable. I can imagine that 10+ threads gives good performance on
GCS but poor performance on a local file system.
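
One possible shape for the configuration is a per-scheme lookup with a
fallback. This is only a sketch: CopyConcurrency is a hypothetical class, and
the scheme strings and thread counts in the usage note are placeholders, not
measured values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical configuration holder (not a Beam class).
final class CopyConcurrency {
  private final Map<String, Integer> threadsByScheme = new HashMap<>();
  private final int fallbackThreads;

  CopyConcurrency(int fallbackThreads) {
    this.fallbackThreads = requirePositive(fallbackThreads);
  }

  /** Overrides the thread count for one file system scheme. */
  CopyConcurrency set(String scheme, int threads) {
    threadsByScheme.put(scheme, requirePositive(threads));
    return this;
  }

  /** Returns the configured thread count, or the fallback if none is set. */
  int threadsFor(String scheme) {
    return threadsByScheme.getOrDefault(scheme, fallbackThreads);
  }

  private static int requirePositive(int n) {
    if (n < 1) {
      throw new IllegalArgumentException("thread count must be >= 1, got " + n);
    }
    return n;
  }
}
```

For example, new CopyConcurrency(1).set("gs", 10) would use 10 threads for
"gs" paths and a single thread for everything else.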

Best regards,
Roland Harangozo
