[
https://issues.apache.org/jira/browse/HADOOP-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391851#comment-16391851
]
Steve Loughran commented on HADOOP-15300:
-----------------------------------------
distcp to s3a, first run
{code}
2018-03-08 15:09:17,385 [main] INFO mapreduce.Job
(Job.java:monitorAndPrintJob(1658)) - Job job_local1068976850_0001 completed
successfully
2018-03-08 15:09:17,394 [main] INFO mapreduce.Job
(Job.java:monitorAndPrintJob(1665)) - Counters: 25
File System Counters
FILE: Number of bytes read=1622306
FILE: Number of bytes written=1634552
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
S3A: Number of bytes read=0
S3A: Number of bytes written=897647
S3A: Number of read operations=1688
S3A: Number of large read operations=0
S3A: Number of write operations=902
Map-Reduce Framework
Map input records=96
Map output records=0
Input split bytes=306
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=63
Total committed heap usage (bytes)=752877568
File Input Format Counters
Bytes Read=34752
File Output Format Counters
Bytes Written=16
DistCp Counters
Bandwidth in Btyes=32392
Bytes Copied=461862
Bytes Expected=461862
Files Copied=69
DIR_COPY=27
{code}
second run
{code}
2018-03-08 15:10:07,937 [main] INFO mapreduce.Job
(Job.java:monitorAndPrintJob(1658)) - Job job_local864019435_0001 completed
successfully
2018-03-08 15:10:07,944 [main] INFO mapreduce.Job
(Job.java:monitorAndPrintJob(1665)) - Counters: 24
File System Counters
FILE: Number of bytes read=724653
FILE: Number of bytes written=1651348
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
S3A: Number of bytes read=0
S3A: Number of bytes written=0
S3A: Number of read operations=389
S3A: Number of large read operations=0
S3A: Number of write operations=0
Map-Reduce Framework
Map input records=96
Map output records=69
Input split bytes=304
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=6
Total committed heap usage (bytes)=529530880
File Input Format Counters
Bytes Read=34752
File Output Format Counters
Bytes Written=11169
DistCp Counters
Bandwidth in Btyes=0
Bytes Skipped=461862
DIR_COPY=27
Files Skipped=69
{code}
> distcp -update to WASB and ADL copies up all the files, always
> --------------------------------------------------------------
>
> Key: HADOOP-15300
> URL: https://issues.apache.org/jira/browse/HADOOP-15300
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/adl, fs/azure
> Affects Versions: 3.1.0
> Reporter: Steve Loughran
> Priority: Major
>
> If you use {{distcp -update}} to an adl or wasb store, repeatedly, all the
> source files are copied up every time. In contrast, if you use hdfs:// or
> s3a:// as a destination, only the new ones are uploaded. hdfs uses checksums
> for a diff, but s3a just returns the file length and relies on the distcp
> logic being "if either src or dest doesn't do checksums, only compare file
> len". Somehow that's not kicking in for wasb and adl. Tested with file: and
> hdfs sources, wasb and adl dests.
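
The skip rule quoted in the description can be sketched roughly as below. This is a simplified illustration, not the actual distcp code: the class {{UpdateSkipSketch}}, the method {{canSkip}}, and the use of a null string to mean "this store does not return a checksum" are all assumptions made for the example.

```java
public class UpdateSkipSketch {

    /**
     * Hypothetical stand-in for the decision "can -update skip this file?".
     * A null checksum models a filesystem that does not supply one.
     */
    static boolean canSkip(long srcLen, long dstLen,
                           String srcChecksum, String dstChecksum) {
        // Different lengths always mean the file has to be copied.
        if (srcLen != dstLen) {
            return false;
        }
        // If either side has no checksum, fall back to the length check alone.
        if (srcChecksum == null || dstChecksum == null) {
            return true;
        }
        // Both sides have checksums: they must match for the file to be skipped.
        return srcChecksum.equals(dstChecksum);
    }

    public static void main(String[] args) {
        // Same length, no checksums on either side: skipped.
        System.out.println(canSkip(100L, 100L, null, null));   // true
        // Lengths differ: copied.
        System.out.println(canSkip(100L, 200L, null, null));   // false
        // Same length, both checksums present but different: copied.
        System.out.println(canSkip(100L, 100L, "abc", "abd")); // false
    }
}
```

Under this model, a destination that returns a checksum the source cannot match never reaches the length-only fallback, so every file is copied again. Whether that is what happens with wasb and adl is not established here.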