[
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266715#comment-15266715
]
Daryn Sharp commented on HDFS-10301:
------------------------------------
Still catching up and need to review the patch. First question: how is this
interleaving happening on a frequent basis?
An interesting observation (if I interpreted the logs correctly) is that
processing all 4 storages, with ~14k blocks/storage, appears to take minutes.
Tens of seconds appear to elapse between processing each storage. There's some
serious contention that seems indicative of a nasty bug or a suboptimal
configuration exacerbating this bug.
Is the DN RPC timeout set to something very low? Has the number of RPC
handlers been greatly increased? Are there frequent deletes of massive trees?
Is there a lot of decomm'ing with a low check interval?
> BlockReport retransmissions may lead to storages falsely being declared
> zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.6.1
> Reporter: Konstantin Shvachko
> Assignee: Colin Patrick McCabe
> Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch,
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report.
> It then sends the block report again. While processing these two reports at
> the same time, the NameNode can interleave the processing of storages from
> the different reports. This corrupts the blockReportId field, which makes the
> NameNode think that some storages are zombies. Replicas from zombie storages
> are immediately removed, causing missing blocks.
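
The failure mode in the description above can be illustrated with a small toy
model. This is only a sketch under stated assumptions, not NameNode code: the
class ZombieStorageSketch, the Piece type, and the rule "when the last piece of
a report is processed, any storage whose last-seen report ID differs is a
zombie" are hypothetical simplifications chosen to mirror the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: all names here are hypothetical and are not HDFS code.
final class ZombieStorageSketch {

    /** One per-storage piece of a block report. */
    static final class Piece {
        final long reportId;     // ID shared by every piece of one block report
        final String storageId;  // storage that this piece describes
        final boolean lastPiece; // true if this is the final piece of its report

        Piece(long reportId, String storageId, boolean lastPiece) {
            this.reportId = reportId;
            this.storageId = storageId;
            this.lastPiece = lastPiece;
        }
    }

    /** Processes pieces in arrival order and returns the storages flagged as zombies. */
    static List<String> process(List<String> storages, List<Piece> arrivalOrder) {
        Map<String, Long> lastSeenReportId = new LinkedHashMap<>();
        for (String s : storages) {
            lastSeenReportId.put(s, 0L);
        }
        List<String> zombies = new ArrayList<>();
        for (Piece p : arrivalOrder) {
            // Record which report most recently touched this storage.
            lastSeenReportId.put(p.storageId, p.reportId);
            if (p.lastPiece) {
                // Assumed rule: after the final piece, any storage not stamped
                // with this report's ID is treated as a zombie.
                for (Map.Entry<String, Long> e : lastSeenReportId.entrySet()) {
                    boolean stale = e.getValue().longValue() != p.reportId;
                    if (stale && !zombies.contains(e.getKey())) {
                        zombies.add(e.getKey());
                    }
                }
            }
        }
        return zombies;
    }

    /** Report 1 is fully processed before its retransmission, report 2. */
    static List<String> inOrder() {
        return process(Arrays.asList("s1", "s2"), Arrays.asList(
                new Piece(1, "s1", false), new Piece(1, "s2", true),
                new Piece(2, "s1", false), new Piece(2, "s2", true)));
    }

    /** Report 2 (the retransmission) overtakes report 1 between storages. */
    static List<String> interleaved() {
        return process(Arrays.asList("s1", "s2"), Arrays.asList(
                new Piece(1, "s1", false), new Piece(2, "s1", false),
                new Piece(2, "s2", true), new Piece(1, "s2", true)));
    }

    public static void main(String[] args) {
        System.out.println("in order:    " + inOrder());     // prints []
        System.out.println("interleaved: " + interleaved()); // prints [s1]
    }
}
```

In the interleaved run, storage s1 is healthy and was reported twice, yet it is
flagged because the late final piece of report 1 finds s1 stamped with report
2's ID.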