[
https://issues.apache.org/jira/browse/HADOOP-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391853#comment-16391853
]
Steve Loughran commented on HADOOP-15300:
-----------------------------------------
wasb copies everything up every time, as does adl.
{code:java}
File System Counters
FILE: Number of bytes read=1640418
FILE: Number of bytes written=1636188
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
WASB: Number of bytes read=0
WASB: Number of bytes written=915753
WASB: Number of read operations=0
WASB: Number of large read operations=0
WASB: Number of write operations=0
Map-Reduce Framework
Map input records=96
Map output records=0
Input split bytes=308
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=16
Total committed heap usage (bytes)=408944640
File Input Format Counters
Bytes Read=34752
File Output Format Counters
Bytes Written=16
DistCp Counters
Bandwidth in Btyes=12212
Bytes Copied=461862
Bytes Expected=461862
Files Copied=69
DIR_COPY=27
118.98 real 13.33 user 1.83 sys
{code}
Updated
{code:java}
2018-03-08 15:21:44,045 [main] INFO mapreduce.Job (Job.java:monitorAndPrintJob(1665)) - Counters: 25
File System Counters
FILE: Number of bytes read=1635633
FILE: Number of bytes written=1630856
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
WASB: Number of bytes read=0
WASB: Number of bytes written=910462
WASB: Number of read operations=0
WASB: Number of large read operations=0
WASB: Number of write operations=0
Map-Reduce Framework
Map input records=96
Map output records=0
Input split bytes=306
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=18
Total committed heap usage (bytes)=457179136
File Input Format Counters
Bytes Read=35264
File Output Format Counters
Bytes Written=16
DistCp Counters
Bandwidth in Btyes=10566
Bytes Copied=461862
Bytes Expected=461862
Files Copied=69
DIR_COPY=27
129.40 real 14.55 user 2.08 sys
{code}
> distcp -update to WASB and ADL copies up all the files, always
> --------------------------------------------------------------
>
> Key: HADOOP-15300
> URL: https://issues.apache.org/jira/browse/HADOOP-15300
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/adl, fs/azure
> Affects Versions: 3.1.0
> Reporter: Steve Loughran
> Priority: Major
>
> If you use {{distcp -update}} to an adl or wasb store repeatedly, all the
> source files are copied up every time. In contrast, if you use hdfs:// or
> s3a:// as a destination, only the new ones are uploaded. hdfs uses checksums
> for a diff, but s3a just returns the file length and relies on the distcp
> logic of "if either src or dest doesn't do checksums, only compare file len".
> Somehow that's not kicking in for wasb and adl. Tested with file: and hdfs
> sources, and wasb and adl dests.
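
For illustration only, here is a minimal sketch of the skip rule quoted in the description: if either side has no checksum, compare file length only. This is not the DistCp source. The names {{UpdateSkipSketch}}, {{FileInfo}} and {{canSkip}} are hypothetical, a null checksum stands in for "this store does not expose checksums", and treating differing checksums as "must copy" is an assumption about the other half of the rule.

```java
/** Hypothetical sketch of the "-update" skip decision; not DistCp code. */
public class UpdateSkipSketch {

    /** Stand-in for file metadata. checksum is null when the store exposes none. */
    public static final class FileInfo {
        public final long length;
        public final String checksum;

        public FileInfo(long length, String checksum) {
            this.length = length;
            this.checksum = checksum;
        }
    }

    /** Returns true when the destination copy looks current and the file can be skipped. */
    public static boolean canSkip(FileInfo src, FileInfo dest) {
        if (src.length != dest.length) {
            return false; // lengths differ: the file must be copied
        }
        if (src.checksum == null || dest.checksum == null) {
            return true; // one side has no checksum: length match is all we can check
        }
        // Assumption: when both sides have checksums, they must match to skip.
        return src.checksum.equals(dest.checksum);
    }

    public static void main(String[] args) {
        FileInfo src = new FileInfo(1024L, "abc");
        System.out.println(canSkip(src, new FileInfo(1024L, null)));  // true
        System.out.println(canSkip(src, new FileInfo(512L, null)));   // false
        System.out.println(canSkip(src, new FileInfo(1024L, "xyz"))); // false
    }
}
```

Under this rule a destination that reports the right length but no checksum would be skipped on a repeat run, which is the behaviour seen with s3a and the one that is not happening for wasb and adl.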