[jira] [Commented] (HDFS-10625) VolumeScanner to report why a block is found bad

Vinayakumar B (JIRA) Wed, 27 Jul 2016 22:47:27 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396985#comment-15396985
 ]


Vinayakumar B commented on HDFS-10625:
--------------------------------------

bq. We can add a catch block here to catch the IOException thrown, then include 
the replica information and throw a new IO exception, e.g:
One problem here is that callers which expect a specific exception, such as 
{{ChecksumException}} or {{FileNotFoundException}}, would instead get a plain 
IOException with the ChecksumException or FNFE set only as its cause.
So it is better not to change this. Let the original IOException be thrown back. 
In any case, the DN logs will be there to capture the replica details.
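
To illustrate the concern, here is a minimal, self-contained sketch. It does not use the actual Hadoop classes: {{DemoChecksumException}}, {{read}} and {{classify}} are hypothetical names made up for this example. It only shows that a caller catching the specific type stops matching once the exception is wrapped in a new IOException.

```java
import java.io.IOException;

public class WrapDemo {
    // Hypothetical stand-in for a specific IOException subclass
    // (for example a checksum failure); not the real Hadoop class.
    public static class DemoChecksumException extends IOException {
        public DemoChecksumException(String msg) {
            super(msg);
        }
    }

    // Simulates a read that fails with the specific exception type.
    // When wrap is true, the failure is rewrapped with extra (made-up)
    // replica details, which changes the type seen by the caller.
    public static void read(boolean wrap) throws IOException {
        try {
            throw new DemoChecksumException("checksum error at offset 512");
        } catch (IOException e) {
            if (wrap) {
                throw new IOException("replica=<hypothetical>: " + e.getMessage(), e);
            }
            throw e; // rethrown unchanged, original type preserved
        }
    }

    // A caller that relies on catching the specific type.
    public static String classify(boolean wrap) {
        try {
            read(wrap);
            return "ok";
        } catch (DemoChecksumException e) {
            return "checksum";
        } catch (IOException e) {
            // Only reached for the wrapped case, where a cause is set.
            return "generic, cause=" + e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(false));
        System.out.println(classify(true));
    }
}
```

In the wrapped case the specific type is still reachable through {{getCause()}}, but every existing catch site would have to be changed to look there.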

bq. Looks like we can make this replica a member of BlockSender instead of a 
local variable here, so that we can refer to it when needed, such as for this 
jira. We probably should make replicaVisibleLength a member and report it as 
part of the replica info too, since when the writing is going on, this value 
may be changing concurrently.
Making ReplicaInfo a member is good, but making {{replicaVisibleLength}} a 
member may not be required, because {{endOffset}} is already present and 
indicates how much BlockSender intended to read. So {{endOffset}} can be used 
whenever required.
Coming to checksum verification: BlockSender does checksum verification only 
for finalized blocks, via VolumeScanner, and not while reading (in the read 
case, verification happens at the client). So we can expect the replica to be 
finalized in this case, with no change in the visibleLength.

So I feel the change required on the latest patch is: combine HDFS-10626, 
make replicaInfo a member, and use it to construct the ChecksumException message.
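
A rough sketch of what this could look like, with simplified stand-in types rather than the real BlockSender code. {{Replica}}, {{Sender}}, {{DemoChecksumException}} and {{checksumFailure}} are hypothetical names for illustration only, and the message format is just an assumption.

```java
import java.io.IOException;

public class SenderSketch {
    // Hypothetical, simplified stand-in for replica metadata.
    public static final class Replica {
        public final long blockId;
        public final String state;
        public final String volume;

        public Replica(long blockId, String state, String volume) {
            this.blockId = blockId;
            this.state = state;
            this.volume = volume;
        }

        @Override
        public String toString() {
            return "blk_" + blockId + " state=" + state + " on " + volume;
        }
    }

    // Hypothetical checksum exception that records the failing position.
    public static class DemoChecksumException extends IOException {
        public final long pos;

        public DemoChecksumException(String msg, long pos) {
            super(msg);
            this.pos = pos;
        }
    }

    // Stand-in for the sender: the replica is kept as a member so it is
    // available when a checksum failure message is built later.
    public static final class Sender {
        private final Replica replica;
        private final long endOffset; // how far the sender intended to read

        public Sender(Replica replica, long endOffset) {
            this.replica = replica;
            this.endOffset = endOffset;
        }

        // Builds the exception with replica details; the type stays specific.
        public DemoChecksumException checksumFailure(long offset) {
            String msg = "Checksum failed at offset " + offset
                + " (endOffset=" + endOffset + ") for " + replica;
            return new DemoChecksumException(msg, offset);
        }
    }

    public static void main(String[] args) {
        Replica r = new Replica(1170125248L, "FINALIZED", "/d/dfs/dn");
        Sender s = new Sender(r, 4096L);
        System.out.println(s.checksumFailure(512L).getMessage());
    }
}
```

Because the details go into the message of the specific exception itself, callers that catch the specific type keep working, and {{endOffset}} is reported without adding a separate visible-length member.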

>  VolumeScanner to report why a block is found bad
> -------------------------------------------------
>
>                 Key: HDFS-10625
>                 URL: https://issues.apache.org/jira/browse/HDFS-10625
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, hdfs
>            Reporter: Yongjun Zhang
>            Assignee: Rushabh S Shah
>              Labels: supportability
>         Attachments: HDFS-10625-1.patch, HDFS-10625.patch
>
>
> VolumeScanner may report:
> {code}
> WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> blk_1170125248_96458336 on /d/dfs/dn
> {code}
> It would be helpful to report the reason why the block is bad, especially 
> when the block is corrupt, where is the first corrupted chunk in the block.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
