[ 
https://issues.apache.org/jira/browse/HADOOP-18582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18582:
------------------------------------
    Description: 
It is not necessary to call `cleanupTempFiles` when a DistCp job commits in
direct mode, because direct mode creates no temp files.

This cleanup increases the task execution time, because it lists the files in
the target path. When the target path contains a very large number of files,
that listing is very slow.
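
The idea can be sketched as a guard around the cleanup step. This is a minimal, self-contained model and not the Hadoop source: the class `DirectModeCommitSketch`, the `directWrite` field and the `steps` list are hypothetical names used only for illustration, and the real `cleanupTempFiles` lists the target path instead of recording a string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a DistCp-style committer (not Hadoop code).
class DirectModeCommitSketch {
    // Records which steps ran, so the effect of the guard is observable.
    final List<String> steps = new ArrayList<>();

    // Assumed flag: true when files are written straight to the target path.
    private final boolean directWrite;

    DirectModeCommitSketch(boolean directWrite) {
        this.directWrite = directWrite;
    }

    // Stand-in for the expensive step; the real one lists the target path.
    private void cleanupTempFiles() {
        steps.add("cleanupTempFiles");
    }

    void commitJob() {
        steps.add("commit");
        // Direct mode creates no temp files, so there is nothing to clean up.
        if (!directWrite) {
            cleanupTempFiles();
        }
    }

    public static void main(String[] args) {
        DirectModeCommitSketch direct = new DirectModeCommitSketch(true);
        direct.commitJob();
        System.out.println("direct: " + direct.steps);

        DirectModeCommitSketch staged = new DirectModeCommitSketch(false);
        staged.commitJob();
        System.out.println("staged: " + staged.steps);
    }
}
```

In this sketch the cost of the listing is avoided entirely in direct mode, while the non-direct path keeps its existing cleanup behaviour.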

*note* two patches need to be cherry-picked when picking this up: the
original patch and a follow-up, both with HADOOP-18582 in the title


{code}
3b7b79b37ae HADOOP-18582. skip unnecessary cleanup logic in distcp (#5251)
e8a6b2c2c4e HADOOP-18582. Addendum: Skip unnecessary cleanup logic in DistCp.
{code}


  was:
It is not necessary to call `cleanupTempFiles` when a DistCp job commits in
direct mode, because direct mode creates no temp files.

This cleanup increases the task execution time, because it lists the files in
the target path. When the target path contains a very large number of files,
that listing is very slow.


> No need to clean tmp files in distcp direct mode
> ------------------------------------------------
>
>                 Key: HADOOP-18582
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18582
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 3.3.4
>            Reporter: 10000kang
>            Assignee: 10000kang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.9
>
>
> It is not necessary to call `cleanupTempFiles` when a DistCp job commits in
> direct mode, because direct mode creates no temp files.
> This cleanup increases the task execution time, because it lists the files
> in the target path. When the target path contains a very large number of
> files, that listing is very slow.
> *note* two patches need to be cherry-picked when picking this up: the
> original patch and a follow-up, both with HADOOP-18582 in the title
> {code}
> 3b7b79b37ae HADOOP-18582. skip unnecessary cleanup logic in distcp (#5251)
> e8a6b2c2c4e HADOOP-18582. Addendum: Skip unnecessary cleanup logic in DistCp.
> {code}


