Gil Vernik created MAPREDUCE-6854:
-------------------------------------

             Summary: Each map task should create a unique temporary name that 
includes object name
                 Key: MAPREDUCE-6854
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6854
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: distcp
            Reporter: Gil Vernik


Consider an example: a local file "/data/a.txt" needs to be copied into 
swift://container.service/data/a.txt

distcp first uploads "/data/a.txt" into 
swift://container.mil01/data/.distcp.tmp.attempt_local2036034928_0001_m_000000_0

Upon completion, distcp moves 
swift://container.mil01/data/.distcp.tmp.attempt_local2036034928_0001_m_000000_0
into swift://container.mil01/data/a.txt

The temporary file naming convention assumes that each map task will 
sequentially create objects as swift://container.mil01/.distcp.tmp.attempt_ID
and then rename them to their final names. Because the temporary name depends 
only on the attempt ID, a map task that copies several files reuses the same 
temporary name for each of them. This flow is problematic in object stores, 
where it is usually advised not to create, delete, and then re-create an 
object under the same name.
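
The name reuse can be illustrated with a minimal sketch. This is not distcp's 
actual code: the helper current_temp_path and the second file "b.txt" are 
hypothetical, and only the ".distcp.tmp." prefix and the attempt ID are taken 
from the paths quoted above.

```python
import posixpath

# Prefix as it appears in the temporary paths quoted in this issue.
TMP_PREFIX = ".distcp.tmp."


def current_temp_path(target: str, attempt_id: str) -> str:
    """Hypothetical model of the current scheme: the temporary name depends
    only on the target's parent directory and the task attempt ID."""
    parent = posixpath.dirname(target)
    return posixpath.join(parent, TMP_PREFIX + attempt_id)


attempt = "attempt_local2036034928_0001_m_000000_0"

# Two different targets handled by the same map task attempt
# ("b.txt" is an assumed second file, used only for illustration).
first = current_temp_path("swift://container.mil01/data/a.txt", attempt)
second = current_temp_path("swift://container.mil01/data/b.txt", attempt)

# Both files map to the same temporary object name, so the object is
# created, renamed away, and then created again under the same name.
print(first == second)
```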

This JIRA proposes adding a configuration key indicating that temporary 
objects should also include the object name as part of their temporary name.

For example, "/data/a.txt" will be uploaded into 
"swift://container.mil01/data/.distcp.tmp.attempt_local2036034928_0001_m_000000_0/a.txt"
 or 
"swift://container.mil01/data/a.txt/.distcp.tmp.attempt_local2036034928_0001_m_000000_0"
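
The two candidate layouts can be sketched as follows. The function names are 
hypothetical and chosen only for illustration; the issue names neither the 
configuration key nor an implementation, so this is a sketch of the naming 
rule alone, not of distcp's code.

```python
import posixpath

# Prefix as it appears in the temporary paths quoted in this issue.
TMP_PREFIX = ".distcp.tmp."


def temp_dir_then_name(target: str, attempt_id: str) -> str:
    """Option 1 (hypothetical helper): place the object name under a
    per-attempt temporary directory next to the target."""
    parent, name = posixpath.split(target)
    return posixpath.join(parent, TMP_PREFIX + attempt_id, name)


def name_then_temp(target: str, attempt_id: str) -> str:
    """Option 2 (hypothetical helper): place the per-attempt temporary
    name under the target's own path."""
    return posixpath.join(target, TMP_PREFIX + attempt_id)


attempt = "attempt_local2036034928_0001_m_000000_0"
target = "swift://container.mil01/data/a.txt"

print(temp_dir_then_name(target, attempt))
print(name_then_temp(target, attempt))
```

With either layout, two different targets copied by the same attempt get 
different temporary names, so no object name is created twice.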


