+Daniel Halperin <[email protected]>

On Thu, May 12, 2016 at 10:20 AM Roland Harangozo <[email protected]> wrote:
> Hi All,
>
> I would like to fix this issue:
> https://issues.apache.org/jira/browse/BEAM-206
>
> Could you please review my design proposal?
>
> Rather than copying all of the temporary files and then removing them (if we
> need to remove them), I would copy and optionally remove the temporary files
> one by one, treating each copy-and-remove as a single atomic operation. This
> has the following benefits:
> * If the file system supports a move operation and the file retention policy
>   is "remove", we can use the native move (or rename) operation, which could
>   be significantly faster than copy and remove.
> * Because the remove happens right after the copy, a failure is less likely
>   to force files to be copied again (if one of two files is moved and the
>   other one fails, a replay moves only the one that failed rather than
>   starting from scratch).
>
> Regarding concurrency, I would use an ExecutorService to run these
> operations in parallel. The first exception would stop (interrupt) all
> operations.
>
> The level of concurrency (number of threads) would be file-system specific
> and configurable. I imagine 10+ threads would give good performance on GCS
> but poor performance on a local file system.
>
> Best regards,
> Roland Harangozo
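A minimal sketch of the proposal above, assuming a local file system accessed through `java.nio.file`; a GCS-backed implementation would need its own copy/rename calls. The names `TempFileMover`, `Retention`, `moveOne` and `moveAll` are hypothetical and are not part of Beam's API. Each file is handled as one copy-and-optionally-remove unit, a replay skips files that were already moved, the thread count is a parameter, and the first failure cancels the remaining tasks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper illustrating per-file "copy then optionally remove".
class TempFileMover {
  enum Retention { KEEP, REMOVE }

  // One unit of work: copy (KEEP) or move (REMOVE) a single temporary file.
  static void moveOne(Path src, Path dst, Retention retention) throws IOException {
    if (retention == Retention.REMOVE) {
      // Replay safety: this file was already moved by an earlier attempt.
      if (!Files.exists(src) && Files.exists(dst)) {
        return;
      }
      // Files.move renames when source and target share a file store and
      // otherwise falls back to copy-and-delete.
      Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
    } else {
      // Overwriting the destination makes a replayed copy harmless.
      Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
    }
  }

  // Runs moveOne for every (src, dst) pair on a pool of `threads` workers.
  static void moveAll(List<Path> srcs, List<Path> dsts, Retention retention, int threads)
      throws IOException, InterruptedException {
    if (srcs.size() != dsts.size()) {
      throw new IllegalArgumentException("srcs and dsts must have the same size");
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    ExecutorCompletionService<Void> ecs = new ExecutorCompletionService<>(pool);
    List<Future<Void>> futures = new ArrayList<>();
    try {
      for (int i = 0; i < srcs.size(); i++) {
        Path s = srcs.get(i);
        Path d = dsts.get(i);
        futures.add(ecs.submit(() -> {
          moveOne(s, d, retention);
          return null;
        }));
      }
      // Wait for tasks in completion order so the first failure is seen early.
      for (int i = 0; i < futures.size(); i++) {
        try {
          ecs.take().get();
        } catch (ExecutionException e) {
          // First failure: cancel the rest, interrupting tasks already running.
          for (Future<Void> f : futures) {
            f.cancel(true);
          }
          Throwable cause = e.getCause();
          if (cause instanceof IOException) {
            throw (IOException) cause;
          }
          throw new IOException("move failed", cause);
        }
      }
    } finally {
      pool.shutdownNow();
    }
  }
}
```

A `CompletionService` is used so that results arrive in completion order, which lets the first failure cancel the other tasks without waiting on slower ones. Interruption is best-effort, because blocking file I/O may not react to it.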
