[ 
https://issues.apache.org/jira/browse/HADOOP-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391851#comment-16391851
 ] 

Steve Loughran commented on HADOOP-15300:
-----------------------------------------

distcp to s3a, first run:

{code}
2018-03-08 15:09:17,385 [main] INFO  mapreduce.Job 
(Job.java:monitorAndPrintJob(1658)) - Job job_local1068976850_0001 completed 
successfully
2018-03-08 15:09:17,394 [main] INFO  mapreduce.Job 
(Job.java:monitorAndPrintJob(1665)) - Counters: 25
        File System Counters
                FILE: Number of bytes read=1622306
                FILE: Number of bytes written=1634552
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                S3A: Number of bytes read=0
                S3A: Number of bytes written=897647
                S3A: Number of read operations=1688
                S3A: Number of large read operations=0
                S3A: Number of write operations=902
        Map-Reduce Framework
                Map input records=96
                Map output records=0
                Input split bytes=306
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=63
                Total committed heap usage (bytes)=752877568
        File Input Format Counters 
                Bytes Read=34752
        File Output Format Counters 
                Bytes Written=16
        DistCp Counters
                Bandwidth in Btyes=32392
                Bytes Copied=461862
                Bytes Expected=461862
                Files Copied=69
                DIR_COPY=27
{code}

second run, same source and destination:
{code}
2018-03-08 15:10:07,937 [main] INFO  mapreduce.Job 
(Job.java:monitorAndPrintJob(1658)) - Job job_local864019435_0001 completed 
successfully
2018-03-08 15:10:07,944 [main] INFO  mapreduce.Job 
(Job.java:monitorAndPrintJob(1665)) - Counters: 24
        File System Counters
                FILE: Number of bytes read=724653
                FILE: Number of bytes written=1651348
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                S3A: Number of bytes read=0
                S3A: Number of bytes written=0
                S3A: Number of read operations=389
                S3A: Number of large read operations=0
                S3A: Number of write operations=0
        Map-Reduce Framework
                Map input records=96
                Map output records=69
                Input split bytes=304
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=6
                Total committed heap usage (bytes)=529530880
        File Input Format Counters 
                Bytes Read=34752
        File Output Format Counters 
                Bytes Written=11169
        DistCp Counters
                Bandwidth in Btyes=0
                Bytes Skipped=461862
                DIR_COPY=27
                Files Skipped=69
{code}

> distcp -update to WASB and ADL copies up all the files, always
> --------------------------------------------------------------
>
>                 Key: HADOOP-15300
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15300
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/adl, fs/azure
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> If you use {{distcp -update}} to an adl or wasb store repeatedly, all the 
> source files are copied up every time. In contrast, if you use hdfs:// or 
> s3a:// as a destination, only the new ones are uploaded. hdfs uses checksums 
> for a diff, but s3a just returns the file length and relies on the distcp 
> logic "if either src or dest doesn't do checksums, only compare file len". 
> Somehow that's not kicking in for wasb and adl. Tested with file: and hdfs 
> sources, and wasb and adl dests.
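
The rule quoted in the description ("if either src or dest doesn't do checksums, only compare file len") can be sketched roughly as below. This is an illustration only, not Hadoop's code: the class {{UpdateSkipSketch}}, the {{FileInfo}} holder and the {{canSkip}} method are hypothetical names, and the real distcp logic may check more than length and checksum.

```java
import java.util.Optional;

public class UpdateSkipSketch {

    /** Hypothetical stand-in for file metadata: a length plus an optional checksum. */
    public static final class FileInfo {
        final long length;
        final Optional<String> checksum;

        public FileInfo(long length, String checksumOrNull) {
            this.length = length;
            // A store that does not expose checksums is modelled as "no checksum".
            this.checksum = Optional.ofNullable(checksumOrNull);
        }
    }

    /** Returns true when the destination looks up to date, so the copy can be skipped. */
    public static boolean canSkip(FileInfo src, FileInfo dest) {
        if (dest == null) {
            return false; // nothing at the destination yet: must copy
        }
        if (src.length != dest.length) {
            return false; // lengths differ: must copy
        }
        if (!src.checksum.isPresent() || !dest.checksum.isPresent()) {
            return true;  // one side has no checksum: fall back to length only
        }
        return src.checksum.get().equals(dest.checksum.get());
    }

    public static void main(String[] args) {
        // Same length, neither side offers a checksum: expected to be skipped.
        FileInfo src = new FileInfo(461862L, null);
        FileInfo dest = new FileInfo(461862L, null);
        System.out.println("skip = " + canSkip(src, dest));
    }
}
```

Under this rule a second {{-update}} run against an unchanged destination should skip every file, which is what the s3a counters above show (Files Skipped=69). The wasb and adl runs behave as if the length-only fallback were never reached.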



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

