[
https://issues.apache.org/jira/browse/HADOOP-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073938#comment-17073938
]
Mithun Radhakrishnan commented on HADOOP-8143:
----------------------------------------------
Thank you for pointing to HDFS-13056. It should address the crux of the
problem, i.e. decoupling checksums from block-size. I have only skimmed it,
but the following passage from HDFS-13056's description is promising:
{quote}This option can be enabled or disabled at the granularity of individual
client calls by setting the new configuration option
`dfs.checksum.combine.mode` to `COMPOSITE_CRC`
{quote}
It appears that this doesn't require an opt-in on the HDFS/NameNode side, and
that querying a file's checksum with {{dfs.checksum.combine.mode=COMPOSITE_CRC}}
should return a CRC that is independent of block-size.
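To illustrate why block-size matters, here is a simplified Python model of the two combine modes. It is not HDFS code. It assumes plain CRC32 ({{zlib.crc32}}) in place of HDFS's CRC32C and 512-byte chunks (the {{dfs.bytes-per-checksum}} default), and the function names are hypothetical. The MD5-of-MD5-of-CRC value hashes one digest per block, so it shifts with block layout. The composite value carries a single CRC across block boundaries, so it equals the CRC of the whole file.

```python
import hashlib
import struct
import zlib

BYTES_PER_CHECKSUM = 512  # assumed chunk size, mirroring the dfs.bytes-per-checksum default


def chunk_crcs(block: bytes) -> bytes:
    """Concatenate big-endian CRC32s of each BYTES_PER_CHECKSUM-sized chunk of a block."""
    return b"".join(
        struct.pack(">I", zlib.crc32(block[i:i + BYTES_PER_CHECKSUM]))
        for i in range(0, len(block), BYTES_PER_CHECKSUM)
    )


def md5_of_md5_of_crc(data: bytes, block_size: int) -> str:
    """Simplified MD5-of-MD5-of-CRC: one MD5 per block, then an MD5 over those digests.

    The number of per-block digests depends on block_size, so the result does too.
    """
    block_md5s = b"".join(
        hashlib.md5(chunk_crcs(data[i:i + block_size])).digest()
        for i in range(0, len(data), block_size)
    )
    return hashlib.md5(block_md5s).hexdigest()


def composite_crc(data: bytes, block_size: int) -> int:
    """Simplified composite CRC: the running CRC is carried from block to block.

    This gives the same value as mathematically combining per-block CRCs,
    i.e. the CRC of the whole file, whatever block_size is.
    """
    crc = 0
    for i in range(0, len(data), block_size):
        crc = zlib.crc32(data[i:i + block_size], crc)
    return crc
```

With 20,000 bytes of sample data (e.g. {{bytes(i % 251 for i in range(20000))}}), splitting into 4 KiB versus 8 KiB blocks changes the first result but not the second.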
If this holds, perhaps DistCp should be changed to fetch CRCs this way, which
would free us from having to preserve block-size for the sake of correctness.
(This would only work on Hadoop 3.1.1+.)
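A rough sketch of the proposed DistCp change, under stated assumptions: the helper name and version tuples are hypothetical, and it assumes both source and target clusters must be on 3.1.1+ to serve composite CRCs. {{MD5MD5CRC}} and {{COMPOSITE_CRC}} are the two values of {{dfs.checksum.combine.mode}}. The helper picks which one to request and reports whether block-size must still be preserved for the comparison to be valid.

```python
from typing import Tuple

Version = Tuple[int, int, int]

# Assumed minimum release that can serve COMPOSITE_CRC checksums (see comment above).
COMPOSITE_CRC_MIN_VERSION: Version = (3, 1, 1)


def plan_checksum_comparison(src: Version, dst: Version) -> Tuple[str, bool]:
    """Hypothetical helper: return (combine mode to request, block-size must be preserved?)."""
    if min(src, dst) >= COMPOSITE_CRC_MIN_VERSION:
        # Both sides can return block-size-independent CRCs.
        return "COMPOSITE_CRC", False
    # Fall back to the block-size-dependent MD5-of-MD5-of-CRC comparison.
    return "MD5MD5CRC", True
```

For example, copying from a 2.7.3 cluster to a 3.3.6 cluster would still need block-size preserved, while a copy between two 3.1.1+ clusters would not.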
{quote}At the very least, we need a way to turn this new default off.
Especially when -skipCrcCheck is true.
{quote}
I'm a little rusty, but I'm surprised that block-size preservation isn't
turned off when {{-skipCrcCheck && !-pb}}. If that's the case, it's an
oversight and needs fixing. As a workaround, specifying {{-p}} with attributes
that exclude {{b}} (e.g. {{-pu}}) should disable block-size preservation.
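The expected option handling can be modeled roughly as below. This is a hypothetical sketch of the behavior argued for here, not DistCp's actual option-parsing code. {{preserve_flags}} stands for the non-empty letter list given after {{-p}} (e.g. "u" for {{-pu}}); a bare {{-p}} is not modeled.

```python
from typing import Optional


def preserves_block_size(preserve_flags: Optional[str], skip_crc_check: bool) -> bool:
    """Hypothetical model of whether DistCp should preserve block-size.

    preserve_flags: attribute letters passed with -p (e.g. "u", "bu"),
    or None when no -p option is given at all.
    """
    if preserve_flags is not None:
        # An explicit -p list wins: preserve block-size only if 'b' is listed.
        return "b" in preserve_flags
    # No -p given: the default exists only so the CRC check can pass,
    # so it is pointless when -skipCrcCheck is set.
    return not skip_crc_check
```

Under this model, {{-pu}} disables block-size preservation regardless of {{-skipCrcCheck}}, which matches the workaround above.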
> Change distcp to have -pb on by default
> ---------------------------------------
>
> Key: HADOOP-8143
> URL: https://issues.apache.org/jira/browse/HADOOP-8143
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Dave Thompson
> Assignee: Mithun Radhakrishnan
> Priority: Minor
> Fix For: 3.0.0-alpha4
>
> Attachments: HADOOP-8143.1.patch, HADOOP-8143.2.patch,
> HADOOP-8143.3.patch
>
>
> We should have the preserve blocksize (-pb) on in distcp by default.
> checksum which is on by default will always fail if blocksize is not the same.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)