+Daniel Halperin <[email protected]>

On Thu, May 12, 2016 at 10:20 AM Roland Harangozo <[email protected]> wrote:

> Hi All,
>
> I would like to fix this issue:
> https://issues.apache.org/jira/browse/BEAM-206
>
> Could you please review my design proposal?
>
> I would copy and optionally remove the temporary files one by one, each as
> an atomic operation, rather than copying all of the temporary files and then
> removing them (if removal is required). This has the following benefits:
> * If the move operation is supported by the file system and the file
> retention is remove, we can use the native file move (or rename) operation,
> which could be significantly faster than copy and remove.
> * By moving the remove operation close to the copy operation, we lower the
> probability of copying a file again after a failure (if one file of two is
> moved but the other one fails, a replay moves only the one that failed
> rather than starting from scratch).
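
The per-file step described in the two bullets above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses java.nio.file on a local file system as a stand-in for whatever file system abstraction the fix would actually target, and the names FinalizeOne, finalizeFile, and removeTemp are hypothetical, not taken from the Beam code base.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Beam: finalizes one temporary file.
final class FinalizeOne {
    public static void finalizeFile(Path temp, Path dest, boolean removeTemp)
            throws IOException {
        // Replay case: an earlier attempt already moved this file, so skip it.
        if (!Files.exists(temp) && Files.exists(dest)) {
            return;
        }
        if (removeTemp) {
            try {
                // Native move (rename) when the file system supports it.
                Files.move(temp, dest, StandardCopyOption.ATOMIC_MOVE);
                return;
            } catch (AtomicMoveNotSupportedException e) {
                // No atomic move available: fall back to copy + remove below.
            }
        }
        Files.copy(temp, dest, StandardCopyOption.REPLACE_EXISTING);
        if (removeTemp) {
            // The remove happens right after the copy of the same file.
            Files.delete(temp);
        }
    }
}
```

With this shape, replaying after a partial failure only redoes the files whose temporary copy still exists.
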
>
> Regarding concurrency, I would use an ExecutorService to run the
> aforementioned operations simultaneously. The first exception would stop
> (interrupt) all operations.
>
> The level of concurrency (number of threads) would be file system
> specific and configurable. I can imagine that 10+ threads give good
> performance on GCS but bad performance on a local file system.
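
One possible reading of the concurrency part, as a minimal sketch rather than a definitive design: a fixed-size pool whose size is the configurable, file-system-specific thread count, where the first failed task causes the remaining ones to be interrupted. ParallelFinalizer and runAll are hypothetical names; only the java.util.concurrent classes are real.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper, not part of Beam.
final class ParallelFinalizer {
    // Runs every task on `threads` workers; `threads` is the configurable,
    // file-system-specific concurrency level from the proposal.
    public static void runAll(List<Callable<Void>> tasks, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        ExecutorCompletionService<Void> done = new ExecutorCompletionService<>(pool);
        try {
            for (Callable<Void> task : tasks) {
                done.submit(task);
            }
            for (int i = 0; i < tasks.size(); i++) {
                // Futures arrive in completion order; get() rethrows the
                // first failure wrapped in an ExecutionException.
                done.take().get();
            }
        } finally {
            // On failure this interrupts running tasks and drops queued ones;
            // on success it simply shuts the idle pool down.
            pool.shutdownNow();
        }
    }
}
```

Each task would wrap the per-file copy/move for one temporary file. Note that interruption only takes effect if the underlying I/O call responds to it.
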
>
> Best regards,
> Roland Harangozo
>
