[
https://issues.apache.org/jira/browse/HADOOP-18723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719024#comment-17719024
]
ASF GitHub Bot commented on HADOOP-18723:
-----------------------------------------
steveloughran commented on PR #5603:
URL: https://github.com/apache/hadoop/pull/5603#issuecomment-1533526312
Well, I'm afraid your specific problem does not match the wider use cases
of uploading to stores without checksums. Now, I would have been happier if
distcp's -skipCrc option was required to copy data from an FS with checksums to
one without, but it is not, and to add it now would break so many people's
workflows.
So what do we do here?
Maybe: create counters of why files were copied, specifically:
* not found at destination
* file length different
* modtime
* checksum
Then after a job you can see, from the host where the job was launched, why
files were copied. If you then want to know the details behind issues such as
checksum and modtime differences, those can be logged at debug level.
Obviously, this will be something to add to the distcp documentation.
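As a rough illustration of the counter idea above, here is a minimal,
self-contained Java sketch. Everything in it is hypothetical: the class
CopyReasonSketch, the FileInfo holder, the enum names and the order of the
checks are assumptions made for illustration, not DistCp's actual comparison
logic. A real patch would presumably publish these through the job's
MapReduce counters rather than an in-memory map.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: count why each file was selected for copying. */
class CopyReasonSketch {

    /** One counter per reason listed in the comment above (names are assumed). */
    enum CopyReason { NOT_FOUND_AT_DESTINATION, LENGTH_DIFFERENT, MODTIME, CHECKSUM }

    /** Hypothetical stand-in for file metadata; checksum is null if the store has none. */
    static final class FileInfo {
        final long length;
        final long modTime;
        final String checksum;

        FileInfo(long length, long modTime, String checksum) {
            this.length = length;
            this.modTime = modTime;
            this.checksum = checksum;
        }
    }

    private final Map<CopyReason, Long> counters = new EnumMap<>(CopyReason.class);

    /**
     * Returns the first reason the file needs copying, or empty if it can be skipped.
     * The check order (existence, length, modtime, checksum) is an assumption.
     */
    static Optional<CopyReason> reasonToCopy(FileInfo source, FileInfo dest) {
        if (dest == null) {
            return Optional.of(CopyReason.NOT_FOUND_AT_DESTINATION);
        }
        if (source.length != dest.length) {
            return Optional.of(CopyReason.LENGTH_DIFFERENT);
        }
        if (source.modTime > dest.modTime) {
            return Optional.of(CopyReason.MODTIME);
        }
        // Only compare checksums when both sides actually provide one.
        if (source.checksum != null && dest.checksum != null
                && !source.checksum.equals(dest.checksum)) {
            return Optional.of(CopyReason.CHECKSUM);
        }
        return Optional.empty();
    }

    /** Evaluates one source/destination pair and bumps the matching counter. */
    void record(FileInfo source, FileInfo dest) {
        reasonToCopy(source, dest)
            .ifPresent(reason -> counters.merge(reason, 1L, Long::sum));
    }

    /** Current value of a counter; zero if that reason was never seen. */
    long count(CopyReason reason) {
        return counters.getOrDefault(reason, 0L);
    }

    public static void main(String[] args) {
        CopyReasonSketch sketch = new CopyReasonSketch();
        FileInfo source = new FileInfo(10L, 100L, "abc");
        sketch.record(source, null);                          // missing at destination
        sketch.record(source, new FileInfo(10L, 100L, "xyz")); // checksum differs
        for (CopyReason reason : CopyReason.values()) {
            System.out.println(reason + " = " + sketch.count(reason));
        }
    }
}
```

The per-reason totals are what would show up after the job; the per-file
detail (which checksum types or modtimes differed) would stay in debug logs.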
Now: big warning. I am personally scared of distCp. It is a critical
workflow tool and is even used programmatically, yet it is surprisingly
brittle. It is a running joke that the last person to add any code to the
module gets to field all support calls until someone else comes along. Thank
you for volunteering! This also explains why we will be very reluctant/strict
about taking on changes. Don't take it personally; everyone gets that same
grilling here.
> Add detail logs if distcp checksum mismatch
> -------------------------------------------
>
> Key: HADOOP-18723
> URL: https://issues.apache.org/jira/browse/HADOOP-18723
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Janus Chow
> Assignee: Janus Chow
> Priority: Major
> Labels: pull-request-available
>
> We encountered checksum mismatch errors during DistCp jobs. It took
> us some time to figure out that the checksum types were different.
> Adding error logs should help us figure out such problems faster.