[ https://issues.apache.org/jira/browse/HDFS-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191496#comment-13191496 ]
Aaron T. Myers commented on HDFS-2691:
--------------------------------------
bq. My only worry with the above designs is that this might trigger the case of HDFS-2791...
Talked this over with Todd this morning, and we both agree that the fix he's
working on for HDFS-2791 should address this concern. Given that, "Solution 1"
above seems clearly the best way forward on this issue.
I've reviewed the latest patch and it largely looks good. A few comments,
mostly nits:
# Perhaps we should rename PIPELINE_STARTED to RECEIVING_BLOCK? Seems more in
line with the other members of the BlockStatus enum, RECEIVED_BLOCK and
DELETED_BLOCK. It's also more in line with the new BPOfferService method,
notifyNamenodeReceivingBlock.
# Given that we're now also handling the PIPELINE_STARTED case, perhaps we
should rename the BlockManager#blockReceivedAndDeleted method to reflect this
additional function?
# In BlockManager#blockReceivedAndDeleted, do you really think it's reasonable
to only warn here? I'd be in favor of at least bumping this to ERROR, maybe even
throwing an exception.
{code}
+ NameNode.stateChangeLog.warn(
+ "Unknown block status code reported by " + nodeID.getName() +
+ ": " + rdbi);
{code}
# Typo: preceeds -> precedes
# In DataNode#notifyNamenodeReceivingBlock, do you really think failing to find
a BPOS for the given BP ID is only worthy of a WARN and not an ERROR? I realize
it's consistent with DataNode#notifyNamenodeReceivedBlock and
DataNode#notifyNamenodeDeletedBlock, but it seems like they should all be ERROR.
# Does it make sense to rename the ReceivedDeletedBlockInfo class to something
more general, now that it's also being used for the PIPELINE_STARTED case?
# The comment at the top of ReceivedDeletedBlockInfo should probably also
mention the fact that it now stores BlockStatus as well.
# Any reason BlockStatus#code isn't private?
# Similar to above, seems like we should change the protobuf enum
BlockStatus#PIPELINE_STARTED -> BlockStatus#RECEIVING
# There are a few TODOs in the tests that reference HDFS-2693, which I think can
be removed.
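For what it's worth, here's a rough, self-contained sketch of how the renamed enum could look with a private code field and a lookup that throws on an unknown status code instead of only warning. It's a sketch under assumptions: the numeric codes, the getCode/fromCode helpers, the BlockStatusSketch wrapper and the process method are hypothetical stand-ins, not code from the patch or from BlockManager.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the HDFS-2691 patch: names and codes are illustrative.
public class BlockStatusSketch {

  /** Status a DataNode reports for a block. */
  enum BlockStatus {
    RECEIVING_BLOCK(1), // suggested replacement for PIPELINE_STARTED
    RECEIVED_BLOCK(2),
    DELETED_BLOCK(3);

    // Private, so callers go through getCode()/fromCode() rather than the field.
    private final int code;

    // Reverse lookup table, filled once after the constants are initialized.
    private static final Map<Integer, BlockStatus> BY_CODE = new HashMap<>();
    static {
      for (BlockStatus s : values()) {
        BY_CODE.put(s.code, s);
      }
    }

    BlockStatus(int code) {
      this.code = code;
    }

    int getCode() {
      return code;
    }

    /** Fails loudly on an unknown code instead of returning null. */
    static BlockStatus fromCode(int code) {
      BlockStatus s = BY_CODE.get(code);
      if (s == null) {
        throw new IllegalArgumentException("Unknown block status code: " + code);
      }
      return s;
    }
  }

  /** Hypothetical stand-in for the status switch in the NN's report handling. */
  static String process(int statusCode, String nodeName) {
    BlockStatus status = BlockStatus.fromCode(statusCode); // throws on unknown codes
    switch (status) {
      case RECEIVING_BLOCK:
        return nodeName + " is receiving a block";
      case RECEIVED_BLOCK:
        return nodeName + " received a block";
      case DELETED_BLOCK:
        return nodeName + " deleted a block";
      default:
        // Unreachable while every constant has a case; satisfies the compiler.
        throw new IllegalStateException("Unhandled status " + status);
    }
  }

  public static void main(String[] args) {
    System.out.println(process(1, "dn1:50010"));
    try {
      process(42, "dn1:50010");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing fast in fromCode gives every caller the same behavior for a bad code; if we'd rather not throw on the NN, that same spot is where an ERROR-level log would go.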
> HA: Tests and fixes for pipeline targets and replica recovery
> -------------------------------------------------------------
>
> Key: HDFS-2691
> URL: https://issues.apache.org/jira/browse/HDFS-2691
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Attachments: hdfs-2691.txt, hdfs-2691.txt
>
>
> Currently there are some TODOs around pipeline/recovery code in the HA
> branch. For example, commitBlockSynchronization is only sent to the active
> NN, which may have failed over by that point. So we need to write some tests
> here and figure out what the correct behavior is.
> Another related area is the treatment of targets in the pipeline. When a
> pipeline is created, the active NN adds the "expected locations" to the
> BlockInfoUnderConstruction, but the DN identifiers aren't logged with the
> OP_ADD. So after a failover, the BlockInfoUnderConstruction will have no
> targets, and I imagine replica recovery would run into problems.
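A toy model of the failover gap described above. It assumes OP_ADD carries only the block ID and that DataNodes may also send a "receiving" report to the standby (roughly what a method like notifyNamenodeReceivingBlock could enable, though that routing is an assumption here). NameNodeView, addExpectedLocation and standbyTargetsAfterFailover are hypothetical names, not HDFS APIs, and the ordering of edits versus reports is deliberately simplified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the failover problem; none of these names are HDFS APIs.
public class PipelineTargetsSketch {

  /** Simplified NameNode view: block ID -> expected pipeline locations. */
  static final class NameNodeView {
    private final Map<Long, Set<String>> expected = new HashMap<>();

    /** Replays an OP_ADD; in this model the edit carries only the block ID. */
    void applyOpAdd(long blockId) {
      expected.putIfAbsent(blockId, new LinkedHashSet<>());
    }

    /** Records an expected location (from allocation or from a DN report). */
    void addExpectedLocation(long blockId, String datanode) {
      expected.computeIfAbsent(blockId, id -> new LinkedHashSet<>()).add(datanode);
    }

    Set<String> targets(long blockId) {
      return expected.getOrDefault(blockId, Collections.emptySet());
    }
  }

  /** Returns the targets the standby knows about when it takes over. */
  static Set<String> standbyTargetsAfterFailover(boolean datanodesReportToStandby) {
    NameNodeView active = new NameNodeView();
    NameNodeView standby = new NameNodeView();
    List<Long> editLog = new ArrayList<>();
    long blockId = 1001L;
    List<String> pipeline = Arrays.asList("dn1", "dn2", "dn3");

    // Active NN allocates the block and keeps the targets in memory only.
    active.applyOpAdd(blockId);
    for (String dn : pipeline) {
      active.addExpectedLocation(blockId, dn);
    }
    editLog.add(blockId); // the logged edit has no DataNode identifiers

    // Standby tails the edit log.
    for (long id : editLog) {
      standby.applyOpAdd(id);
    }

    // Optionally, each DataNode also tells the standby it is receiving the block.
    if (datanodesReportToStandby) {
      for (String dn : pipeline) {
        standby.addExpectedLocation(blockId, dn);
      }
    }
    return standby.targets(blockId);
  }

  public static void main(String[] args) {
    System.out.println("without DN reports: " + standbyTargetsAfterFailover(false));
    System.out.println("with DN reports:    " + standbyTargetsAfterFailover(true));
  }
}
```

With reports disabled, the standby ends up with an empty target set after replaying the edit log, which is the state the description expects replica recovery to trip over.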