Use java.io.tmpdir as default output location for BulkRecordWriter
------------------------------------------------------------------

                 Key: CASSANDRA-3840
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3840
             Project: Cassandra
          Issue Type: Improvement
          Components: Hadoop
    Affects Versions: 1.1
            Reporter: Erik Forsberg


BulkRecordWriter uses the value of the property 
mapreduce.output.bulkoutputformat.localdir if it is set, and falls back to the 
value of mapred.local.dir otherwise.
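The lookup order above can be sketched roughly as follows. This is a minimal illustration, not Cassandra's actual code: the class and method names are hypothetical, and a plain Map stands in for Hadoop's Configuration so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class; a Map<String, String> stands in for Hadoop's Configuration.
public class LocalDirLookup {
    static final String OUTPUT_LOCATION = "mapreduce.output.bulkoutputformat.localdir";

    // Lookup order as described in the issue: the bulk-output property if set,
    // otherwise mapred.local.dir, returned verbatim (no splitting on commas).
    static String outputLocation(Map<String, String> conf) {
        String dir = conf.get(OUTPUT_LOCATION);
        return dir != null ? dir : conf.get("mapred.local.dir");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // A typical multi-disk production value.
        conf.put("mapred.local.dir", "/dir1,/dir2,/dir3");
        System.out.println(outputLocation(conf)); // prints /dir1,/dir2,/dir3
    }
}
```

Because the fallback value is used as-is, a multi-directory setting reaches the writer as one string.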

However, on a typical production system, mapred.local.dir is set to a 
comma-separated list of directories. The list is treated as a single path, so 
BulkOutputFormat ends up writing to nonsensical paths such as

/dir1/,dir2,/dir3,KeySpaceName/CFName
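A minimal Java sketch of how such a path can arise, assuming the writer simply appends the keyspace and column family names to the configured directory string with File.separator. The class and helper names are hypothetical and only illustrate the failure mode.

```java
import java.io.File;

// Hypothetical class illustrating how a directory list becomes one bogus path.
public class OutputPathSketch {
    // Hypothetical helper: appends keyspace and column family to the base dir,
    // treating the base as a single path even if it is a comma-separated list.
    static File outputFolder(String baseDir, String keyspace, String columnFamily) {
        return new File(baseDir + File.separator + keyspace + File.separator + columnFamily);
    }

    public static void main(String[] args) {
        String localDir = "/dir1,/dir2,/dir3"; // multi-disk mapred.local.dir value
        File out = outputFolder(localDir, "KeySpaceName", "CFName");
        // On Unix-like systems this prints /dir1,/dir2,/dir3/KeySpaceName/CFName:
        // a path nested under a root-level directory literally named "dir1,",
        // which none of the configured local directories contain.
        System.out.println(out.getPath());
    }
}
```

The resulting path has the same shape as the example above: a single directory tree outside any location Hadoop manages or cleans.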

This has two effects:

1) The directory is not removed when the job finishes, which leads to disk 
space management issues.

2) If a new job is run against the same keyspace and column family (CF), it 
tries to load the old data as well as the new data.

It would be better to use System.getProperty("java.io.tmpdir"), since within a 
task that property is set to an attempt-specific temporary directory which is 
cleaned up after the job finishes. See 
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html, under 
"Directory Structure".
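A minimal sketch of the proposed default, assuming the explicit mapreduce.output.bulkoutputformat.localdir setting still takes precedence and java.io.tmpdir replaces mapred.local.dir as the fallback. The class name is hypothetical and a Map again stands in for Hadoop's Configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class; a Map<String, String> stands in for Hadoop's Configuration.
public class TmpDirDefault {
    static final String OUTPUT_LOCATION = "mapreduce.output.bulkoutputformat.localdir";

    // Proposed lookup: an explicit setting wins; otherwise use the JVM's
    // java.io.tmpdir, which inside a task points at an attempt-specific directory.
    static String outputLocation(Map<String, String> conf) {
        return conf.getOrDefault(OUTPUT_LOCATION, System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.local.dir", "/dir1,/dir2,/dir3"); // no longer consulted
        System.out.println(outputLocation(conf)); // the JVM's temporary directory
        conf.put(OUTPUT_LOCATION, "/data/bulk");
        System.out.println(outputLocation(conf)); // prints /data/bulk
    }
}
```

Since the attempt directory is removed by the framework, both the disk-space leak and the stale-data reload described above go away without extra cleanup code in the writer.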
