[ 
https://issues.apache.org/jira/browse/HADOOP-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391853#comment-16391853
 ] 

Steve Loughran commented on HADOOP-15300:
-----------------------------------------

wasb copies everything up on every run. As does adl
{code}
        File System Counters
                FILE: Number of bytes read=1640418
                FILE: Number of bytes written=1636188
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                WASB: Number of bytes read=0
                WASB: Number of bytes written=915753
                WASB: Number of read operations=0
                WASB: Number of large read operations=0
                WASB: Number of write operations=0
        Map-Reduce Framework
                Map input records=96
                Map output records=0
                Input split bytes=308
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=16
                Total committed heap usage (bytes)=408944640
        File Input Format Counters 
                Bytes Read=34752
        File Output Format Counters 
                Bytes Written=16
        DistCp Counters
                Bandwidth in Btyes=12212
                Bytes Copied=461862
                Bytes Expected=461862
                Files Copied=69
                DIR_COPY=27
      118.98 real        13.33 user         1.83 sys
{code}
Updated
{code}
2018-03-08 15:21:44,045 [main] INFO  mapreduce.Job 
(Job.java:monitorAndPrintJob(1665)) - Counters: 25
        File System Counters
                FILE: Number of bytes read=1635633
                FILE: Number of bytes written=1630856
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                WASB: Number of bytes read=0
                WASB: Number of bytes written=910462
                WASB: Number of read operations=0
                WASB: Number of large read operations=0
                WASB: Number of write operations=0
        Map-Reduce Framework
                Map input records=96
                Map output records=0
                Input split bytes=306
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=18
                Total committed heap usage (bytes)=457179136
        File Input Format Counters 
                Bytes Read=35264
        File Output Format Counters 
                Bytes Written=16
        DistCp Counters
                Bandwidth in Btyes=10566
                Bytes Copied=461862
                Bytes Expected=461862
                Files Copied=69
                DIR_COPY=27
      129.40 real        14.55 user         2.08 sys
{code}
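For reference, the skip decision the issue description talks about can be sketched as follows. This is a hedged illustration, not the actual DistCp code: the class, {{FileMeta}} record, and {{canSkip}} method are hypothetical stand-ins for DistCp's copy-decision logic, assuming the documented fallback that when either filesystem cannot produce a checksum (as with wasb and adl), only file lengths are compared.

```java
import java.util.Objects;

public class UpdateSkipSketch {

    /** Hypothetical stand-in for file metadata; a null checksum means "no checksum support". */
    record FileMeta(long length, String checksum) {}

    /**
     * Returns true when -update should skip copying the file.
     * If either side has no checksum (e.g. wasb/adl), only lengths are compared.
     */
    static boolean canSkip(FileMeta source, FileMeta target) {
        if (source.length() != target.length()) {
            return false;                       // sizes differ: must copy
        }
        if (source.checksum() == null || target.checksum() == null) {
            return true;                        // length-only fallback: skip
        }
        return Objects.equals(source.checksum(), target.checksum());
    }

    public static void main(String[] args) {
        // wasb/adl-like case: no checksums, equal lengths => copy should be skipped,
        // which is exactly what the counters above show is not happening.
        System.out.println(canSkip(new FileMeta(461862L, null),
                                   new FileMeta(461862L, null)));
    }
}
```

Under that rule, the second run above should have reported 0 files copied rather than 69.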

> distcp -update to WASB and ADL copies up all the files, always
> --------------------------------------------------------------
>
>                 Key: HADOOP-15300
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15300
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/adl, fs/azure
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> If you use {{distcp -update}} to an adl or wasb store repeatedly, all the 
> source files are copied up every time. In contrast, if you use hdfs:// or 
> s3a:// as a destination, only the new files are uploaded. HDFS uses checksums 
> for the diff, while s3a just returns the file length, relying on the distcp 
> rule "if either src or dest doesn't do checksums, only compare file length". 
> Somehow that rule isn't kicking in here. Tested with file:  and hdfs sources, 
> and wasb and adl destinations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
