[
https://issues.apache.org/jira/browse/HDFS-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717797#comment-13717797
]
Chris Nauroth commented on HDFS-5020:
-------------------------------------
Jing, the patch is looking really good. Here is feedback on a few minor things:
{code}
// for a retry request (of DataProtocol#blockReceivedAndDeleted with
// RECEIVED_BLOCK), we currently also decrease the approximate number.
{code}
Minor nitpick: I think this was supposed say
"DatanodeProtocol#blockReceivedAndDeleted".
{code}
/**
* An object that contains information about a block that
* is being replicated. It records the timestamp when the
* system started replicating the most recent copy of this
* block. It also records the number of replication
* requests that are in progress.
*/
static class PendingBlockInfo {
{code}
Let's update the comment above to state that it tracks the individual data
nodes to enforce idempotence (not just "number of replication requests").
{code}
PendingBlockInfo(DatanodeDescriptor[] targets) {
this.timeStamp = now();
this.targets = targets == null ? new ArrayList<DatanodeDescriptor>(0)
: new ArrayList<DatanodeDescriptor>(Arrays.asList(targets));
}
{code}
Perhaps consider switching to the default constructor of {{ArrayList}}?
Initializing with capacity 0 probably guarantees a reallocation will happen on
a subsequent call to {{incrementReplicas}}. The default constructor uses
capacity 10, which is higher than typical settings for replication, so it would
probably prevent reallocations.
{code}
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(
DATANODE_COUNT).build();
cluster.waitActive();
DistributedFileSystem hdfs = cluster.getFileSystem();
FSNamesystem fsn = cluster.getNamesystem();
BlockManager blkManager = fsn.getBlockManager();
try {
{code}
{{MiniDFSCluster#waitAcitve}} and {{MiniDFSCluster#getFileSystem}} can throw
{{IOException}}. Right now, these calls are outside the try block, so if they
do throw, then we wouldn't run the finally block to do a proper cleanup. I
recommend moving these 2 calls into the try block.
> Make DatanodeProtocol#blockReceivedAndDeleted idempotent
> --------------------------------------------------------
>
> Key: HDFS-5020
> URL: https://issues.apache.org/jira/browse/HDFS-5020
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.0.0
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Attachments: HDFS-5020.001.patch, HDFS-5020.002.patch
>
>
> Currently DatanodeProtocol#blockReceivedAndDeleted is not idempotent because
> pending replication maps in NN only tracks the number of the pending
> replicas, and a retry request about a received block can cause the pending
> number decreased more than once. We can make blockReceivedAndDeleted
> idempotent by tracking corresponding datanodes in the pending replication
> map.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira