[
https://issues.apache.org/jira/browse/HDFS-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372404#comment-14372404
]
Colin Patrick McCabe commented on HDFS-7960:
--------------------------------------------
bq. there's a TODO: FIXME, we aren't passing in the BlockReportContext.
Yeah, mea culpa.
bq. processReport doesn't need that last parameter anymore either I think,
since the information is in the BR context.
The last parameter is needed because we want to eliminate zombie storages only
after all storages have been processed, and a single call to
{{NameNodeRpcServer#blockReport}} can handle multiple storages.
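To illustrate why the flag matters, here is a minimal sketch of the ordering described above. It is a simplified model, not the code in the patch: the class {{ZombiePruneSketch}}, its fields, and this {{processReport}} signature are all hypothetical, and it assumes the whole report arrives in a single RPC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of the idea; not the HDFS-7960 patch itself.
public class ZombiePruneSketch {
    // Storage ID -> ID of the last block report that mentioned that storage.
    private final Map<String, Long> lastReportId = new TreeMap<>();

    public ZombiePruneSketch(String... knownStorages) {
        for (String s : knownStorages) {
            lastReportId.put(s, 0L);
        }
    }

    // Called once per storage contained in a block report RPC.
    // Returns the storages pruned by this call (empty unless this is the last storage).
    public List<String> processReport(String storageId, long reportId,
                                      boolean lastStorageInRpc) {
        lastReportId.put(storageId, reportId);
        List<String> pruned = new ArrayList<>();
        if (lastStorageInRpc) {
            // Only now has every storage in the RPC been seen, so anything
            // not stamped with this report ID was absent from the report.
            lastReportId.entrySet().removeIf(e -> {
                if (e.getValue() != reportId) {
                    pruned.add(e.getKey());
                    return true;
                }
                return false;
            });
        }
        return pruned;
    }
}
```

In this model, pruning on an earlier call would wrongly remove storages that simply have not been processed yet in the same RPC.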
bq. Is there a need for BR ids to be monotonic increasing? Else using a random
number seems better. I see you do a fixup by checking with the previous ID, but
with random this shouldn't be necessary
I like the idea of monotonically increasing BR ids for two reasons: it makes it
easier to see in the logs which block report came after which, and
it effectively removes the (admittedly very, very small) chance of a collision
between two subsequent BR IDs. The monotonic timer in Linux (or another OS) only
gets reset when a node reboots, so even restarting the DN process will not
normally reset the ID.
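A minimal sketch of such an ID generator, including the fixup against the previous ID mentioned in the question. The class and method names are hypothetical, and reading the timer with {{System.nanoTime()}} is an assumption of this sketch rather than something stated in this comment.

```java
// Hypothetical sketch; names are illustrative and not taken from the patch.
public class BlockReportIdSketch {
    private long prevId = 0;

    // Takes the clock reading as a parameter so the behavior is deterministic.
    public synchronized long nextId(long monotonicNow) {
        long id = monotonicNow;
        // Fixup: if the clock did not advance (or went backwards),
        // step past the previous ID so IDs remain strictly increasing.
        if (id <= prevId) {
            id = prevId + 1;
        }
        prevId = id;
        return id;
    }

    // Convenience overload that reads the JVM's monotonic clock (assumed source).
    public long nextId() {
        return nextId(System.nanoTime());
    }
}
```

With this shape, two reports generated within the same clock tick still get distinct, ordered IDs, which is what makes the log ordering readable.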
bq. If you wanted to add comments about all this, BlockReportContext's class
javadoc would be a good choice.
Good idea, I added some comments there.
bq. space after assert
Fixed.
> The full block report should prune zombie storages even if they're not empty
> ----------------------------------------------------------------------------
>
> Key: HDFS-7960
> URL: https://issues.apache.org/jira/browse/HDFS-7960
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Lei (Eddy) Xu
> Assignee: Colin Patrick McCabe
> Priority: Critical
> Attachments: HDFS-7960.002.patch, HDFS-7960.003.patch,
> HDFS-7960.004.patch
>
>
> The full block report should prune zombie storages even if they're not empty.
> We have seen cases in production where zombie storages have not been pruned
> subsequent to HDFS-7575. This could arise any time the NameNode thinks there
> is a block in some old storage which is actually not there. In this case,
> the block will not show up in the "new" storage (once old is renamed to new)
> and the old storage will linger forever as a zombie, even with the HDFS-7596
> fix applied. This also happens with datanode hotplug, when a drive is
> removed. In this case, an entire storage (volume) goes away but the blocks
> do not show up in another storage on the same datanode.