[ 
https://issues.apache.org/jira/browse/HDFS-12979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838899#comment-16838899
 ] 

Chen Liang commented on HDFS-12979:
-----------------------------------

Thanks for the review [~xkrogen], great comments. Post v006 patch.
bq. For the logic in the future creation loop...
bq. In that same loop, why do we...
Thanks for the catch. My earlier version delete keys from the map if it's not 
primary, so anything still in the maps means current sbn is the primary. But 
then I found it difficult to find the right place to add keys back to the map 
so I removed this logic, this is why these two places were like such. But I 
should have added new logic for checking primary as well. I changed the value 
to a helper class, storing last upload time and primary boolean flag.

bq. I'm not sure if we should swallow an InterruptedException...
bq. For your TODO at L277, can we use MultipleIOException ?
Added {{MultipleIOException}} to track multiple IOExceptions and break loop on 
InterruptedException. One thing is that when they both happened, which one 
should be thrown ({{MultipleIOException}}  can not take 
{{InterruptedException}}. v006 patch returns {{InterruptedException}}. This 
should not matter much though, because regardless of either exception gets 
thrown, the caller is just logging the exception, does not rely on it for 
anything.

bq. in StandbyCheckpointer we have checkpointConf, is there a way to use the 
same logic for the conf accesses within ImageServlet ?}}
I did not see anything like this here. Also the conf object gets constructed 
every doPut() call from http context in this method, so doesn't seem to worth 
having a long-lived wrapper class object around.

bq. For the time period check within ImageServlet
Seems to me this class has everything in seconds. e.g. 
{{dfs.namenode.checkpoint.period}}'s default value is in seconds. I think 
keeping seconds as much as possible is more consistent/less confusing here.

The other comments are all addressed.


> StandbyNode should upload FsImage to ObserverNode after checkpointing.
> ----------------------------------------------------------------------
>
>                 Key: HDFS-12979
>                 URL: https://issues.apache.org/jira/browse/HDFS-12979
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>            Reporter: Konstantin Shvachko
>            Assignee: Chen Liang
>            Priority: Major
>         Attachments: HDFS-12979.001.patch, HDFS-12979.002.patch, 
> HDFS-12979.003.patch, HDFS-12979.004.patch, HDFS-12979.005.patch, 
> HDFS-12979.006.patch
>
>
> ObserverNode does not create checkpoints. So it's fsimage file can get very 
> old making bootstrap of ObserverNode too long. A StandbyNode should copy 
> latest fsimage to ObserverNode(s) along with ANN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to