[ 
https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489318#comment-13489318
 ] 

Konstantin Shvachko commented on HDFS-4114:
-------------------------------------------

> I'll re-purpose this jira to just remove the CheckpointNode.

I wonder what that means. CheckpointNode is just a role of the BackupNode in 
which it performs checkpoints like SNN and does not keep the in-memory state in 
sync with the primary NN.
So changing the subject doesn't change the purpose.

Eli:
From a formalistic perspective you cannot just remove something from core 
Hadoop. You first need to deprecate it and then may remove it in the next major 
version. That is the rule I have been following for the last 7 years. Let me know 
if it has changed recently. And that is particularly why SNN was not removed but 
deprecated; otherwise we would have had a more efficient checkpointing engine, 
see below.

Todd: 
I see BackupNode as a better way of creating checkpoints. SNN uploads the image 
and the edits from NN, merges them in memory and then sends back the new 
checkpoint.
BN only needs to saveNamespace() from memory and then send back the new image. 
This saves the network traffic and local disk I/O spent on uploading two large 
files. On multiple large clusters I have seen the NameNode run much slower 
while a checkpoint is in progress.
It is beneficial for HDFS performance to switch from SNN to BN for 
checkpointing. Therefore I would advocate re-re-deprecating SNN instead of 
removing BN.
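
To make the traffic argument above concrete, here is a toy model of the bytes moved at checkpoint time under each scheme. It is only a sketch and not Hadoop code: the class and function names are hypothetical, the sizes are made-up units, and it deliberately ignores the edits stream that a BackupNode receives continuously outside of checkpoint time.

```python
from dataclasses import dataclass


@dataclass
class CheckpointCost:
    """Bytes moved over the network during one checkpoint (toy model)."""
    bytes_fetched_from_nn: int
    bytes_sent_to_nn: int

    @property
    def total_network_bytes(self) -> int:
        return self.bytes_fetched_from_nn + self.bytes_sent_to_nn


def snn_checkpoint_cost(image_bytes: int, edits_bytes: int,
                        new_image_bytes: int) -> CheckpointCost:
    # SNN-style: fetch the image and the edits from the NN, merge them,
    # then send the merged checkpoint back.
    return CheckpointCost(image_bytes + edits_bytes, new_image_bytes)


def bn_checkpoint_cost(new_image_bytes: int) -> CheckpointCost:
    # BN-style: the namespace is already in memory, so nothing is fetched
    # at checkpoint time; only the saved image is sent back.
    return CheckpointCost(0, new_image_bytes)


# Hypothetical sizes in arbitrary units (not measurements from any cluster).
snn = snn_checkpoint_cost(image_bytes=20, edits_bytes=1, new_image_bytes=20)
bn = bn_checkpoint_cost(new_image_bytes=20)
saved = snn.total_network_bytes - bn.total_network_bytes
```

Under these assumptions the difference is exactly the two files the SNN has to fetch first (the image plus the edits), which is the saving described above.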
I accept your criticism that BackupNode code path was getting less attention 
from me personally and the community at large. Will have to work on that on my 
side.
I would be glad to go into design discussion and potential enhancements of 
BackupNode with you. Would appreciate it given your experience with HA, as I 
believe the HA story for Hadoop isn't over with the implementation of Quorum 
Journal.
That is not what this issue is about, though. Sticking to the point, what are 
your arguments for removing (or, better to say, deprecating) BN besides that it 
has bugs? 
Software tends to have bugs. E.g. you do not propose to remove BlockScanner 
just because it couldn't be fixed over a series of jiras.
                
> Remove the CheckpointNode
> -------------------------
>
>                 Key: HDFS-4114
>                 URL: https://issues.apache.org/jira/browse/HDFS-4114
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>
> Per the thread on hdfs-dev@ (http://s.apache.org/tMT) let's remove the 
> BackupNode and CheckpointNode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
