[
https://issues.apache.org/jira/browse/HDFS-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396168#comment-15396168
]
Yongjun Zhang commented on HDFS-10625:
--------------------------------------
Hi [~linyiqun] and [~shahrs87],
Sorry for the delay. I took a further look, and I think it's good to include the
HDFS-10626 fix here and mark HDFS-10626 as a duplicate. I'd like to include
both of you as contributors for this jira.
I looked at the latest patch here; it seems to me the best place for the fix is
BlockSender's {{sendBlock}} method:
{code}
  long sendBlock(DataOutputStream out, OutputStream baseStream,
      DataTransferThrottler throttler) throws IOException {
    final TraceScope scope = datanode.getTracer().
        newScope("sendBlock_" + block.getBlockId());
    try {
      return doSendBlock(out, baseStream, throttler);
    } finally {
      scope.close();
    }
  }
{code}
We can add a catch block here to catch the IOException thrown, include the
replica information, and throw a new IOException, e.g.:
{code}
    try {
      return doSendBlock(out, baseStream, throttler);
    } catch (IOException ie) {
      // throw new IOE here with replica info
      throw new IOException(replicaInfoStr, ie);
    } finally {
      scope.close();
    }
{code}
There is a snippet in the constructor to get the replica info:
{code}
    final Replica replica;
    final long replicaVisibleLength;
    synchronized(datanode.data) {
      replica = getReplica(block, datanode);
      replicaVisibleLength = replica.getVisibleLength();
    }
{code}
Looks like we can make {{replica}} a member of BlockSender instead of a local
variable here, so that we can refer to it when needed, such as for this jira.
We probably should make {{replicaVisibleLength}} a member too and report it as
part of the replica info, since this value may change concurrently while the
block is still being written. A rough sketch of the idea follows. Hi
[~vinayrpet], what do you think about this suggestion?
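Just for illustration (the {{replicaInfoStr}} variable and the exact message
format below are placeholders for discussion, not the actual patch):
{code}
  // Sketch only: promote the constructor locals to fields so the catch
  // block below can reference them.
  private final Replica replica;
  private final long replicaVisibleLength;

  long sendBlock(DataOutputStream out, OutputStream baseStream,
      DataTransferThrottler throttler) throws IOException {
    final TraceScope scope = datanode.getTracer().
        newScope("sendBlock_" + block.getBlockId());
    try {
      return doSendBlock(out, baseStream, throttler);
    } catch (IOException ioe) {
      // Attach the replica details captured at construction time, so the
      // reported error shows which replica failed and its visible length.
      String replicaInfoStr = "replica=" + replica
          + ", visibleLength at BlockSender construction="
          + replicaVisibleLength;
      throw new IOException(replicaInfoStr, ioe);
    } finally {
      scope.close();
    }
  }
{code}
Wrapping the original IOException as the cause preserves the existing message
and stack trace while adding the replica details on top.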
Thanks.
> VolumeScanner to report why a block is found bad
> -------------------------------------------------
>
> Key: HDFS-10625
> URL: https://issues.apache.org/jira/browse/HDFS-10625
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, hdfs
> Reporter: Yongjun Zhang
> Assignee: Rushabh S Shah
> Labels: supportability
> Attachments: HDFS-10625-1.patch, HDFS-10625.patch
>
>
> VolumeScanner may report:
> {code}
> WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad
> blk_1170125248_96458336 on /d/dfs/dn
> {code}
> It would be helpful to report the reason why the block is bad, especially
> when the block is corrupt, where is the first corrupted chunk in the block.