Github user mridulm commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35703455 Typically, the way this gets done is: write to a temporary directory, taking care of multiple attempts for the same partition (failure case) and multiple concurrent executions on the same partition (speculative execution case), and once the job is done, move the output to the desired destination (or delete the directory if the job fails), like what mapred does, for example. (Moves are atomic NN operations.) So when the output directory is "done", it is fully done: not partial, not in progress, etc. The bug mentioned in particular, of leftover files from previous jobs, is just scary!
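
To make the flow above concrete, here is a minimal sketch, not Spark or Hadoop code. It assumes a local filesystem stands in for HDFS (so `os.rename` plays the role of the atomic NN move), that the staging directory sits next to the destination on the same filesystem, and that a single coordinator serializes commits. The function names (`write_partition`, `commit_partition`, `commit_job`, `run_job`) and the directory layout are hypothetical, chosen only for illustration.

```python
import os
import shutil
import tempfile
import uuid


def write_partition(staging_dir, partition, attempt_id, records):
    """Write one attempt's output into a directory private to that attempt."""
    attempt_dir = os.path.join(
        staging_dir, "_attempts", f"part-{partition:05d}-{attempt_id}"
    )
    os.makedirs(attempt_dir)
    with open(os.path.join(attempt_dir, "data"), "w") as f:
        for record in records:
            f.write(f"{record}\n")
    return attempt_dir


def commit_partition(staging_dir, partition, attempt_dir):
    """Promote one attempt; the first committer wins, later attempts are dropped.

    Assumption: a single coordinator calls this, so the exists-check and the
    rename are not racing against another committer.
    """
    final_path = os.path.join(staging_dir, f"part-{partition:05d}")
    committed = not os.path.exists(final_path)
    if committed:
        os.rename(os.path.join(attempt_dir, "data"), final_path)
    shutil.rmtree(attempt_dir)  # discard the attempt directory either way
    return committed


def commit_job(staging_dir, dest_dir):
    """Publish the whole output with one rename of the staging directory."""
    shutil.rmtree(os.path.join(staging_dir, "_attempts"), ignore_errors=True)
    os.rename(staging_dir, dest_dir)


def run_job(dest_dir, partitions):
    """Run a toy job; the destination appears only if every partition succeeds."""
    staging_dir = f"{dest_dir}._temporary-{uuid.uuid4().hex}"
    os.makedirs(staging_dir)
    try:
        for partition, records in partitions.items():
            # Two attempts per partition simulate a retry or a speculative copy.
            attempts = [
                write_partition(staging_dir, partition, attempt_id, records)
                for attempt_id in (0, 1)
            ]
            for attempt_dir in attempts:
                commit_partition(staging_dir, partition, attempt_dir)
        commit_job(staging_dir, dest_dir)
    except Exception:
        # Job failed: delete the staging directory, leave the destination untouched.
        shutil.rmtree(staging_dir, ignore_errors=True)
        raise


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    out = os.path.join(base, "out")
    run_job(out, {0: ["a", "b"], 1: ["c"]})
    print(sorted(os.listdir(out)))
```

With this layout a reader of the destination never sees a half-written directory, and a failed or duplicate attempt leaves nothing behind for a later job to pick up. On a real HDFS deployment the rename semantics and the commit coordination would need to be checked against the filesystem and committer actually in use.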