[ 
https://issues.apache.org/jira/browse/HADOOP-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073938#comment-17073938
 ] 

Mithun Radhakrishnan commented on HADOOP-8143:
----------------------------------------------

Thank you for pointing to HDFS-13056. This should address the crux of the 
problem, i.e. decoupling checksums from block-size. I have only skimmed it, 
but the following section from HDFS-13056's description is promising:
{quote}This option can be enabled or disabled at the granularity of individual 
client calls by setting the new configuration option 
`dfs.checksum.combine.mode` to `COMPOSITE_CRC`
{quote}
It appears that this doesn't require opt-in on the HDFS/NameNode side, and 
that querying for a file's checksum with 
{{`dfs.checksum.combine.mode=COMPOSITE_CRC`}} should return a CRC independent 
of block-size.
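
To illustrate why a composite CRC can be independent of block-size while a 
hash-of-block-hashes cannot, here is a minimal Python sketch. It is not HDFS 
code: `zlib.crc32` stands in for whatever CRC the cluster uses, 
`md5_of_block_md5s` is a simplified stand-in for the existing block-based file 
checksum, and all function names are hypothetical.

```python
import hashlib
import zlib


def combine_crc32(crc_a: int, crc_b: int, len_b: int) -> int:
    """Return crc32(A + B) given only crc32(A), crc32(B) and len(B).

    CRC-32 is affine over GF(2), so extending crc_a across len_b zero
    bytes and cancelling the all-zero baseline yields the combined value.
    (Simple O(len_b) formulation; real implementations are much faster.)
    """
    zeros = bytes(len_b)
    return zlib.crc32(zeros, crc_a) ^ crc_b ^ zlib.crc32(zeros)


def composite_crc(data: bytes, block_size: int) -> int:
    """Compose independently computed per-block CRCs into a file-level CRC."""
    crc = 0  # CRC of the empty prefix
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        crc = combine_crc32(crc, zlib.crc32(block), len(block))
    return crc


def md5_of_block_md5s(data: bytes, block_size: int) -> str:
    """Simplified stand-in for a checksum built from per-block digests."""
    outer = hashlib.md5()
    for i in range(0, len(data), block_size):
        outer.update(hashlib.md5(data[i:i + block_size]).digest())
    return outer.hexdigest()


data = bytes(range(256)) * 64  # 16 KiB of sample "file" content

# Same composite CRC regardless of how the data is split into blocks.
print(composite_crc(data, 1024) == composite_crc(data, 4096))
# The hash-of-hashes changes as soon as the block boundaries change.
print(md5_of_block_md5s(data, 1024) == md5_of_block_md5s(data, 4096))
```

The first comparison holds because the composed value always equals the CRC of 
the whole byte stream; the second fails because the block layout is baked into 
the digest. That is the property that would let a copy with a different 
block-size still pass checksum comparison.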

If this holds, perhaps DistCp should be changed to fetch CRCs this way, 
freeing us from having to preserve block-size for the sake of correctness. 
(This would only work on Hadoop 3.1.1+.)
{quote}At the very least, we need a way to turn this new default off. 
Especially when -skipCrcCheck is true.
{quote}
I'm a little rusty, but it surprises me that block-size preservation isn't 
turned off when {{`-skipCrcCheck && (!-pb)`}}. If it really does stay on in 
that case, that's an oversight and needs fixing. As a workaround, specifying 
`-pu`, for instance, should disable block-size preservation.

> Change distcp to have -pb on by default
> ---------------------------------------
>
>                 Key: HADOOP-8143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8143
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Dave Thompson
>            Assignee: Mithun Radhakrishnan
>            Priority: Minor
>             Fix For: 3.0.0-alpha4
>
>         Attachments: HADOOP-8143.1.patch, HADOOP-8143.2.patch, 
> HADOOP-8143.3.patch
>
>
> We should have the preserve blocksize (-pb) on in distcp by default.        
> checksum which is on by default will always fail if blocksize is not the same.



