[ https://issues.apache.org/jira/browse/HADOOP-18564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siyao Meng updated HADOOP-18564:
--------------------------------
    Description:

h2. Goal

Reduce user friction

h2. Background

When distcp'ing between two different file systems, distcp still uses a block-level checksum by default, even though the two file systems can be very different in how they manage blocks, so a block-level checksum comparison between them no longer makes sense.

e.g. distcp between HDFS and Ozone without overriding {{dfs.checksum.combine.mode}} throws an IOException because the blocks of the same file on the two FSes are different (as expected):

{code}
$ hadoop distcp -i -pp /test o3fs://buck-test1.vol1.ozone1/
java.lang.Exception: java.io.IOException: File copy failed: hdfs://duong-1.duong.root.hwx.site:8020/test/test.bin --> o3fs://buck-test1.vol1.ozone1/test/test.bin
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:552)
Caused by: java.io.IOException: File copy failed: hdfs://duong-1.duong.root.hwx.site:8020/test/test.bin --> o3fs://buck-test1.vol1.ozone1/test/test.bin
	at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:262)
	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:219)
	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://duong-1.duong.root.hwx.site:8020/test/test.bin to o3fs://buck-test1.vol1.ozone1/test/test.bin
	at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
	at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:258)
	... 11 more
Caused by: java.io.IOException: Checksum mismatch between hdfs://duong-1.duong.root.hwx.site:8020/test/test.bin and o3fs://buck-test1.vol1.ozone1/.distcp.tmp.attempt_local1346550241_0001_m_000000_0.
Source and destination filesystems are of different types
Their checksum algorithms may be incompatible
You can choose file-level checksum validation via -Ddfs.checksum.combine.mode=COMPOSITE_CRC when block-sizes or filesystems are different.
Or you can skip checksum-checks altogether with -skipcrccheck.
{code}
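For context, the failure above comes down to comparing the {{FileChecksum}} each filesystem reports for the same file. Below is a minimal, illustrative Java sketch of that comparison (not DistCp's actual code path); the namenode host, Ozone bucket URI, and file path are placeholders. With the default combine mode, HDFS reports an MD5-of-block-checksums value that depends on block layout, so it will not equal what a different filesystem type reports; {{COMPOSITE_CRC}} asks for a block-layout-independent, file-level CRC instead.

{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumCompareSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Same switch as -Ddfs.checksum.combine.mode=COMPOSITE_CRC on the CLI:
    // request a file-level CRC instead of the default MD5-of-block-checksums.
    conf.set("dfs.checksum.combine.mode", "COMPOSITE_CRC");

    // Placeholder URIs and path, for illustration only.
    FileSystem srcFs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
    FileSystem dstFs = FileSystem.get(URI.create("o3fs://buck-test1.vol1.ozone1"), conf);
    Path file = new Path("/test/test.bin");

    FileChecksum srcSum = srcFs.getFileChecksum(file);
    FileChecksum dstSum = dstFs.getFileChecksum(file);

    // Some filesystems return null when they expose no checksum at all.
    if (srcSum == null || dstSum == null) {
      System.out.println("Checksum unavailable on one side; cannot compare");
      return;
    }

    // DistCp's post-copy validation is essentially this equality check
    // (algorithm name plus checksum bytes).
    System.out.println(srcSum.getAlgorithmName() + " vs " + dstSum.getAlgorithmName());
    System.out.println("match: " + srcSum.equals(dstSum));
  }
}
{code}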
And it works when we use a file-level checksum like {{COMPOSITE_CRC}}:

{code:title=With -Ddfs.checksum.combine.mode=COMPOSITE_CRC}
$ hadoop distcp -i -pp /test o3fs://buck-test2.vol1.ozone1/ -Ddfs.checksum.combine.mode=COMPOSITE_CRC
22/10/18 19:07:42 INFO mapreduce.Job: Job job_local386071499_0001 completed successfully
22/10/18 19:07:42 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=219900
		FILE: Number of bytes written=794129
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=0
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=13
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
		HDFS: Number of bytes read erasure-coded=0
		O3FS: Number of bytes read=0
		O3FS: Number of bytes written=0
		O3FS: Number of read operations=5
		O3FS: Number of large read operations=0
		O3FS: Number of write operations=0
..
{code}

h2. Alternative

(if changing the global default could potentially break distcp'ing between HDFS/S3/etc. Also [~weichiu] mentioned COMPOSITE_CRC was only added in Hadoop 3.1.1. So this might be the only way.)

Don't touch the global default, and make it a client-side config instead.

e.g. add a config that automatically uses COMPOSITE_CRC ({{dfs.checksum.combine.mode}}) when distcp'ing between HDFS and Ozone, which would be the equivalent of specifying {{-Ddfs.checksum.combine.mode=COMPOSITE_CRC}} on the distcp command line, but the end user wouldn't have to specify it every single time.

cc [~duongnguyen] [~weichiu]
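To make the proposal concrete, here is a hypothetical sketch (not existing DistCp behaviour) of what the automatic fallback could look like on the client side. The config key {{distcp.checksum.combine.mode.auto}} and the helper name are made up for illustration.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public final class AutoCompositeCrcSketch {

  private AutoCompositeCrcSketch() {
  }

  /**
   * If source and target are different filesystem types, fall back to the
   * file-level COMPOSITE_CRC combine mode unless the user already chose one.
   * Hypothetical: neither this helper nor the auto-enable key exists today.
   */
  static void maybeUseFileLevelChecksum(Configuration conf, FileSystem srcFs, FileSystem dstFs) {
    boolean differentFsTypes =
        !srcFs.getUri().getScheme().equals(dstFs.getUri().getScheme());
    boolean autoEnabled = conf.getBoolean("distcp.checksum.combine.mode.auto", true);
    boolean userAlreadySetIt = conf.get("dfs.checksum.combine.mode") != null;

    if (differentFsTypes && autoEnabled && !userAlreadySetIt) {
      // Equivalent to passing -Ddfs.checksum.combine.mode=COMPOSITE_CRC,
      // without the end user having to remember it every time.
      conf.set("dfs.checksum.combine.mode", "COMPOSITE_CRC");
    }
  }
}
{code}

Gating the fallback on its own switch, rather than changing the global default, would leave existing same-filesystem jobs and pre-3.1.1 clusters (which lack COMPOSITE_CRC) on their current behaviour.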
> Use file-level checksum by default when copying between two different file systems
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-18564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18564
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Siyao Meng
>            Priority: Major
>              Labels: DistCp