Bimalendu Choudhary created MAPREDUCE-7331:
----------------------------------------------

             Summary: Make temporary directory used by FileOutputCommitter 
configurable
                 Key: MAPREDUCE-7331
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7331
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 3.0.0
         Environment: CDH 6.2.1 Hadoop 3.0.0
            Reporter: Bimalendu Choudhary


Spark SQL applications use FileOutputCommitter to commit and merge their files 
under a table directory. Because the temporary directory name is hardcoded 
(PENDING_DIR_NAME = "_temporary"), multiple applications writing under the same 
table directory share the same temporary directory. As a result, one 
application can interfere with another application's temporary files, and can 
even end up deleting them. There is currently no way for an application to use 
its own unique path for temporary files and so avoid interference from other, 
completely independent applications. I think the temporary directory used by 
FileOutputCommitter should be made configurable, so that each caller can 
supply its own unique value as required and keep its files from being deleted 
or overwritten by other applications.

Something like:
{quote}public static final String PENDING_DIR_NAME_DEFAULT = "_temporary";
 public static final String PENDING_DIR_NAME =
 "mapreduce.fileoutputcommitter.tempdir";
{quote}
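
To illustrate how such a setting might be consumed, here is a minimal sketch. 
It is not the actual FileOutputCommitter code: a plain Map stands in for 
Hadoop's Configuration so that it runs without Hadoop on the classpath, the 
class and method names are hypothetical, and the key 
"mapreduce.fileoutputcommitter.tempdir" is only the name proposed in this 
issue, not an existing property. An unset or blank value falls back to 
"_temporary", so current behaviour would not change by default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for Hadoop's Configuration object.
final class PendingDirSketch {
    // Default matching the directory name that is hardcoded today.
    static final String PENDING_DIR_NAME_DEFAULT = "_temporary";
    // Configuration key proposed in this issue (an assumption, not an existing property).
    static final String PENDING_DIR_NAME = "mapreduce.fileoutputcommitter.tempdir";

    // Returns the configured temporary directory name, or the default when
    // the key is missing or blank.
    static String pendingDirName(Map<String, String> conf) {
        String value = conf.get(PENDING_DIR_NAME);
        if (value == null || value.trim().isEmpty()) {
            return PENDING_DIR_NAME_DEFAULT;
        }
        return value.trim();
    }

    // Builds the pending directory path by appending the directory name to
    // the job output path.
    static String pendingJobAttemptsPath(String outputPath, Map<String, String> conf) {
        return outputPath + "/" + pendingDirName(conf);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // No override: falls back to the default directory name.
        System.out.println(pendingJobAttemptsPath("/warehouse/t1", conf));

        // Each application supplies its own unique name (the value is illustrative).
        conf.put(PENDING_DIR_NAME, "_temporary_app_0001");
        System.out.println(pendingJobAttemptsPath("/warehouse/t1", conf));
    }
}
```

With this shape, two applications writing under the same table directory 
would each set a different value and would no longer share, or clean up, each 
other's temporary directory.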
 

Spark applications could also use this to handle stage failures, where 
temporary directories left over from previous attempts cause problems.


