[ 
https://issues.apache.org/jira/browse/HDFS-16452?focusedWorklogId=729733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-729733
 ]

ASF GitHub Bot logged work on HDFS-16452:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Feb/22 16:18
            Start Date: 18/Feb/22 16:18
    Worklog Time Spent: 10m 
      Work Description: xkrogen edited a comment on pull request #3976:
URL: https://github.com/apache/hadoop/pull/3976#issuecomment-1044778317


   Quick disclaimer, it has been a while since I have looked at any 
proxy-provider code.
   
   This PR seems to go in the wrong direction to me. As you mentioned, 
`failoverProxy` is used to service _write requests_, which must be serviced by 
the active NameNode (as opposed to _read requests_, which can be serviced by 
Observer NNs taken from `nameNodeProxies`). So, if you want to contact 
the active NN, the right way to do so is to use `failoverProxy`. Note that 
`msync()` is not a special case here -- other write RPCs also require the 
active NN.
   
   You shared logs that some standby NNs are contacted before the active is 
found. I guess you are using `ConfiguredFailoverProxyProvider`? In this 
implementation, you list multiple NN addresses, and upon startup, the client 
has no idea which one is active. It has to go through and contact each one 
until it finds one which is active. So it is expected that under normal 
operation you will see logs like the ones you shared, where it contacts standby 
NNs while searching for the active. After it finds the active, then it should 
remain sticky, and so (assuming there are no changes in active NN), you should 
only see those logs when the client first submits an RPC.
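   
   To make the sticky behavior concrete, here is a toy sketch of it. It is 
not the actual `ConfiguredFailoverProxyProvider` code: `FakeNameNode` and 
`Client` are hypothetical names made up for illustration, and a plain 
`IllegalStateException` stands in for the rejection a standby NN would send.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a client that scans its configured NameNodes for the active one. */
class StickyFailoverSketch {

    /** Hypothetical stand-in for a NameNode; not a Hadoop class. */
    static final class FakeNameNode {
        final String name;
        final boolean active;

        FakeNameNode(String name, boolean active) {
            this.name = name;
            this.active = active;
        }

        /** Serves a write only when active, mimicking a standby rejecting the call. */
        String write(String op) {
            if (!active) {
                throw new IllegalStateException(name + " is in standby state");
            }
            return name + " handled " + op;
        }
    }

    /** Hypothetical client that keeps using whichever NameNode last succeeded. */
    static final class Client {
        private final List<FakeNameNode> nameNodes;
        private int current = 0; // index of the NameNode currently in use
        final List<String> contacted = new ArrayList<>(); // every NameNode tried, in order

        Client(List<FakeNameNode> nameNodes) {
            this.nameNodes = nameNodes;
        }

        String write(String op) {
            for (int attempt = 0; attempt < nameNodes.size(); attempt++) {
                FakeNameNode nn = nameNodes.get(current);
                contacted.add(nn.name);
                try {
                    return nn.write(op); // success: 'current' is left unchanged (sticky)
                } catch (IllegalStateException standby) {
                    current = (current + 1) % nameNodes.size(); // try the next address
                }
            }
            throw new IllegalStateException("no active NameNode found");
        }
    }

    public static void main(String[] args) {
        Client client = new Client(Arrays.asList(
                new FakeNameNode("nn1", false),
                new FakeNameNode("nn2", false),
                new FakeNameNode("nn3", true)));
        client.write("mkdir"); // scans nn1, nn2, then finds nn3
        client.write("msync"); // goes straight to nn3
        System.out.println(client.contacted); // prints [nn1, nn2, nn3, nn3]
    }
}
```

   In this model the standby NNs are contacted only during the first call; 
the second call goes directly to the active.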
   
   Your new implementation is trying to scan through the NameNodes and check 
their status to find the active, but this seems to break the contract 
with `failoverProxy`, which is the component expected to handle active/standby 
determination.
   
   If you want to change the active/standby determination, you should change 
the behavior in your `AbstractNNFailoverProxyProvider` (e.g. 
`ConfiguredFailoverProxyProvider`), not the behavior of 
`ObserverReadProxyProvider`, which should only layer _on top of_ the 
`AbstractNNFailoverProxyProvider` to provide the additional Observer NN 
functionality.
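   
   As a rough sketch of that layering (not the real Hadoop classes): the 
`Provider` interface, `ActiveFinder`, and `ObserverLayer` below are 
hypothetical stand-ins, and the set of read-only method names is an assumption 
made up for the example. The point is only that the observer layer decides 
read versus write and hands every non-read call to the wrapped provider 
untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of an observer-read layer wrapping a failover provider. */
class LayeredProviderSketch {

    /** Hypothetical minimal interface; the real Hadoop types are much richer. */
    interface Provider {
        String invoke(String method);
    }

    /** Stand-in for the wrapped provider that owns active/standby determination. */
    static final class ActiveFinder implements Provider {
        final List<String> calls = new ArrayList<>(); // methods routed to the active NN

        @Override
        public String invoke(String method) {
            calls.add(method);
            return "active:" + method;
        }
    }

    /** Stand-in for the observer layer: reads go to an observer, the rest is delegated. */
    static final class ObserverLayer implements Provider {
        private final Provider failoverProxy;
        private final Set<String> readOnlyMethods;

        ObserverLayer(Provider failoverProxy, Set<String> readOnlyMethods) {
            this.failoverProxy = failoverProxy;
            this.readOnlyMethods = readOnlyMethods;
        }

        @Override
        public String invoke(String method) {
            if (readOnlyMethods.contains(method)) {
                return "observer:" + method; // read request: served by an observer
            }
            // Writes, msync included, are delegated; this layer never looks for the active.
            return failoverProxy.invoke(method);
        }
    }

    public static void main(String[] args) {
        ActiveFinder active = new ActiveFinder();
        Provider client = new ObserverLayer(
                active, new HashSet<>(Arrays.asList("getFileInfo", "getListing")));
        System.out.println(client.invoke("getFileInfo")); // prints observer:getFileInfo
        System.out.println(client.invoke("msync"));       // prints active:msync
        System.out.println(client.invoke("mkdirs"));      // prints active:mkdirs
        System.out.println(active.calls);                 // prints [msync, mkdirs]
    }
}
```

   With this split, changing how the active NN is located means swapping or 
modifying the wrapped provider, while the observer layer stays as it is.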
   
   cc @sunchao @shvachko in case you have any thoughts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 729733)
    Time Spent: 2h  (was: 1h 50m)

> msync RPC should send to Active Namenode directly
> -------------------------------------------------
>
>                 Key: HDFS-16452
>                 URL: https://issues.apache.org/jira/browse/HDFS-16452
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.1
>            Reporter: zhanghaobo
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> In the current ObserverReadProxyProvider implementation, we use the following 
> code to invoke the msync RPC:
> {code:java}
> getProxyAsClientProtocol(failoverProxy.getProxy().proxy).msync(); {code}
> But this way the msync RPC may first be sent to an Observer NameNode and only 
> then fail over to the Active NameNode. This can be avoided by applying this patch.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
