[ 
https://issues.apache.org/jira/browse/HDFS-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698571#comment-17698571
 ] 

ASF GitHub Bot commented on HDFS-16947:
---------------------------------------

virajjasani opened a new pull request, #5470:
URL: https://github.com/apache/hadoop/pull/5470

   The Namenode heartbeat service should log an error with the full stack trace 
when it cannot register the namenode in the State Store. Today it logs only an 
INFO message.
   
   For the ZooKeeper-based implementation, this failure can mean either a) the 
Curator manager is not initialized or b) the write to the znode failed after 
exhausting retries. In either case, an INFO log alone might not be enough, and 
we might have to look for errors elsewhere.
   
   Example of the current log output:
   ```
   2023-02-20 23:10:33,714 DEBUG [NamenodeHeartbeatService {ns} nn0-0] router.NamenodeHeartbeatService - Received service state: ACTIVE from HA namenode: {ns}-nn0:nn-0-{ns}.{cluster}:9000
   2023-02-20 23:10:33,731 INFO  [NamenodeHeartbeatService {ns} nn0-0] impl.MembershipStoreImpl - Inserting new NN registration: nn-0.namenode.{cluster}:8888->{ns}:nn0:nn-0-{ns}.{cluster}:9000-ACTIVE
   2023-02-20 23:10:33,731 INFO  [NamenodeHeartbeatService {ns} nn0-0] router.NamenodeHeartbeatService - Cannot register namenode in the State Store
   ```
   
   If we log the full stack trace instead:
   ```
   2023-02-21 00:20:24,691 ERROR [NamenodeHeartbeatService {ns} nn0-0] router.NamenodeHeartbeatService - Cannot register namenode in the State Store
   org.apache.hadoop.hdfs.server.federation.store.StateStoreUnavailableException: State Store driver StateStoreZooKeeperImpl in nn-0.namenode.{cluster} is not ready.
           at org.apache.hadoop.hdfs.server.federation.store.driver.StateStoreDriver.verifyDriverReady(StateStoreDriver.java:158)
           at org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreZooKeeperImpl.putAll(StateStoreZooKeeperImpl.java:235)
           at org.apache.hadoop.hdfs.server.federation.store.driver.impl.StateStoreBaseImpl.put(StateStoreBaseImpl.java:74)
           at org.apache.hadoop.hdfs.server.federation.store.impl.MembershipStoreImpl.namenodeHeartbeat(MembershipStoreImpl.java:179)
           at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:381)
           at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:317)
           at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.lambda$periodicInvoke$0(NamenodeHeartbeatService.java:244)
   ...
   ... 
   ```
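
As a rough illustration of the kind of change described above, here is a minimal, self-contained sketch. It is not the actual patch: the class `HeartbeatLoggingSketch`, the `Registrar` interface, and the simplified `updateState` method are hypothetical stand-ins, and `java.util.logging` is used only so the example runs without extra dependencies (`Level.SEVERE` stands in for an ERROR-level log). The point it shows is that passing the caught exception to the logger, rather than logging a bare message, is what makes the stack trace appear.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class HeartbeatLoggingSketch {
  private static final Logger LOG =
      Logger.getLogger("NamenodeHeartbeatService");

  /** Hypothetical stand-in for the State Store registration call. */
  public interface Registrar {
    boolean register() throws IOException;
  }

  /**
   * Simplified, assumed shape of the heartbeat update: returns true when
   * registration succeeds and false when it throws.
   */
  public static boolean updateState(Registrar registrar) {
    try {
      return registrar.register();
    } catch (IOException e) {
      // Passing the exception as the last argument attaches it to the log
      // record, so the handler can print the full stack trace.
      LOG.log(Level.SEVERE, "Cannot register namenode in the State Store", e);
      return false;
    }
  }

  public static void main(String[] args) {
    // Simulate a State Store driver that is not ready.
    updateState(() -> {
      throw new IOException("State Store driver is not ready.");
    });
  }
}
```

With the default console handler, running `main` prints the message followed by the exception and its stack trace, which is the information the INFO-only log omits.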




> RBF NamenodeHeartbeatService to report error for not being able to register 
> namenode in state store
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16947
>                 URL: https://issues.apache.org/jira/browse/HDFS-16947
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major


