[ https://issues.apache.org/jira/browse/HDFS-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Badger updated HDFS-11094:
-------------------------------
Attachment: HDFS-11094.002.patch
The tests that failed are related to this patch. They depend on the datanode
*not* being able to heartbeat correctly, so they will time out once this fix
is in. Additionally, upon further review it looks like the cluster is
actually waiting for the heartbeat to be received. The race is between the test
checking for the active NN and the HAState getting set after the datanode
receives a heartbeat response. After talking with [~daryn], I think
that allowing the NN to send back its HA state during registration is the best
way to fix this. This way the datanodes will always know the HAState of the NN
that they are connected to.
I'm attaching a patch that adds the HAServiceState of the NN to the
NamespaceInfo that gets sent from the NN to the DN during a versionRequest.
The versionRequest is one of the first things that happens during DN-NN
connection setup, and it happens even before registration of the DN.
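
For reviewers who want the idea without reading the diff, here is a minimal,
self-contained sketch of the handshake. It is not the patch itself: every class
and method name below (HaStateHandshakeSketch, NamespaceInfoSketch,
NameNodeSketch, DataNodeSketch, knownNnState) is a hypothetical stand-in, and
the enum is a simplified placeholder for the real HAServiceState. It only
illustrates that the DN learns the NN's HA state from the versionRequest reply,
before registration or any heartbeat.

```java
import java.util.Optional;

public class HaStateHandshakeSketch {
    /** Simplified placeholder for the NN's HA service state (assumption). */
    enum HaState { ACTIVE, STANDBY }

    /** Hypothetical, trimmed-down NamespaceInfo that also carries the HA state. */
    static final class NamespaceInfoSketch {
        final String blockPoolId;
        final HaState haState;

        NamespaceInfoSketch(String blockPoolId, HaState haState) {
            this.blockPoolId = blockPoolId;
            this.haState = haState;
        }
    }

    /** Hypothetical NN side: the versionRequest reply includes the current HA state. */
    static final class NameNodeSketch {
        private final HaState state;

        NameNodeSketch(HaState state) {
            this.state = state;
        }

        NamespaceInfoSketch versionRequest() {
            // "BP-sketch" is a made-up block pool id for illustration only.
            return new NamespaceInfoSketch("BP-sketch", state);
        }
    }

    /** Hypothetical DN side: records the NN's state as soon as the handshake returns. */
    static final class DataNodeSketch {
        private HaState knownNnState; // null until the handshake has completed

        void connect(NameNodeSketch nn) {
            NamespaceInfoSketch info = nn.versionRequest();
            knownNnState = info.haState; // known here, before any heartbeat
            // Registration and the first heartbeat would follow in a real DN.
        }

        Optional<HaState> knownNnState() {
            return Optional.ofNullable(knownNnState);
        }
    }

    public static void main(String[] args) {
        DataNodeSketch dn = new DataNodeSketch();
        System.out.println(dn.knownNnState().isPresent()); // prints "false"
        dn.connect(new NameNodeSketch(HaState.ACTIVE));
        System.out.println(dn.knownNnState().get());       // prints "ACTIVE"
    }
}
```

With this shape, a test that asks the DN for the active NN right after
connection setup no longer depends on a heartbeat response having arrived.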
[~daryn], [~liuml07], can you please review? Thanks
> TestLargeBlockReport fails intermittently
> -----------------------------------------
>
> Key: HDFS-11094
> URL: https://issues.apache.org/jira/browse/HDFS-11094
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Eric Badger
> Assignee: Eric Badger
> Attachments: HDFS-11094.001.patch, HDFS-11094.002.patch
>
>
> {noformat}
> java.lang.NullPointerException: null
> 	at org.apache.hadoop.hdfs.server.datanode.TestLargeBlockReport.testBlockReportSucceedsWithLargerLengthLimit(TestLargeBlockReport.java:96)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)