[
https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16747618#comment-16747618
]
Kai Xie commented on HADOOP-16049:
----------------------------------
Submitted patch branch-2-004, which contains both the pread fix and the fix
for the hanging distcp unit tests. Let's see if Jenkins is happy.
The idea behind the unit test hanging fix:
* I noticed that only TestDistCpSync, TestDistCpSyncReverseFromSource, and
TestDistCpSyncReverseFromTarget hang for distcp on branch-2.
* Their logs show that a MiniDFSCluster is started and shut down for every
test case, and that the JVM consistently crashes (OOM? I didn't have a chance
to look at the dump) at the end of the 9th test case. My guess is that a
memory leak in branch-2's MiniDFSCluster crashes the JVM, which in turn causes
Maven to hang.
* So instead of starting and stopping MiniDFSCluster per test case, the patch
does it once per test class.
> DistCp result has data and checksum mismatch when blocks per chunk > 0
> ----------------------------------------------------------------------
>
> Key: HADOOP-16049
> URL: https://issues.apache.org/jira/browse/HADOOP-16049
> Project: Hadoop Common
> Issue Type: Bug
> Components: tools/distcp
> Affects Versions: 2.9.2
> Reporter: Kai Xie
> Assignee: Kai Xie
> Priority: Major
> Attachments: HADOOP-16049-branch-2-003.patch,
> HADOOP-16049-branch-2-004.patch
>
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
>     sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read, but the position (`sourceOffset` here) is never
> updated when blocks per chunk is set to > 0 (which always disables the append
> action). So for a chunk with offset != 0, it keeps copying the first few
> bytes again and again, causing the result to have a data & checksum mismatch.
> To reproduce this issue on branch-2, update BLOCK_SIZE to 10240 (> default
> copy buffer size) in class TestDistCpSystem and run it.
> HADOOP-15292 has resolved the issue reported in this ticket in
> trunk/branch-3.1/branch-3.2 by not using the positioned read, but it has not
> been backported to branch-2 yet.
>
>
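The positioned-read problem described in the quoted issue can be illustrated with a small standalone sketch. This is not the Hadoop code: `PositionedReadDemo`, `pread`, and `copyChunk` are hypothetical names, the source file is modelled as a byte array, and the `advanceOffset` flag stands in for whether `sourceOffset` is updated after each read. It only aims to show why a fixed read position makes a chunk repeat its first buffer of bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class PositionedReadDemo {

    // Hypothetical stand-in for a positioned read: copies up to buf.length
    // bytes of src starting at 'position' into buf; returns -1 at end of data.
    static int pread(byte[] src, long position, byte[] buf) {
        if (position >= src.length) {
            return -1;
        }
        int n = (int) Math.min(buf.length, src.length - position);
        System.arraycopy(src, (int) position, buf, 0, n);
        return n;
    }

    // Copies chunkLength bytes starting at chunkOffset using positioned reads.
    // With advanceOffset == false the read position never moves, which models
    // the behaviour described in the issue.
    static byte[] copyChunk(byte[] src, long chunkOffset, long chunkLength,
                            int bufSize, boolean advanceOffset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufSize];
        long sourceOffset = chunkOffset;
        long remaining = chunkLength;
        while (remaining > 0) {
            int bytesRead = pread(src, sourceOffset, buf);
            if (bytesRead < 0) {
                break;
            }
            int toWrite = (int) Math.min(bytesRead, remaining);
            out.write(buf, 0, toWrite);
            remaining -= toWrite;
            if (advanceOffset) {
                sourceOffset += toWrite; // move the read position forward
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] src = new byte[32];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        // A chunk that starts at a non-zero offset and spans several buffers.
        byte[] expected = Arrays.copyOfRange(src, 8, 24);
        byte[] stuck = copyChunk(src, 8, 16, 4, false);
        byte[] advanced = copyChunk(src, 8, 16, 4, true);
        System.out.println("fixed position matches source:    "
            + Arrays.equals(expected, stuck));
        System.out.println("advancing position matches source: "
            + Arrays.equals(expected, advanced));
    }
}
```

In this model the copy with a fixed position has the right length but the wrong content, which is the kind of result a checksum comparison would flag.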