[
https://issues.apache.org/jira/browse/HDFS-16452?focusedWorklogId=729732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-729732
]
ASF GitHub Bot logged work on HDFS-16452:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 18/Feb/22 16:18
Start Date: 18/Feb/22 16:18
Worklog Time Spent: 10m
Work Description: xkrogen commented on pull request #3976:
URL: https://github.com/apache/hadoop/pull/3976#issuecomment-1044778317
Quick disclaimer: it has been a while since I have looked at any
proxy-provider code.
This PR looks in the wrong direction to me. As you mentioned,
`failoverProxy` is used to service _write requests_, which must be serviced by
the active NameNode (as opposed to _read requests_, which can be serviced by
Observer NNs by looking inside `nameNodeProxies`). So, if you want to contact
the active NN, the right way to do so is to use `failoverProxy`. Note that
`msync()` is not a special case here -- other write RPCs also require the
active NN.
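The layering described above could be sketched roughly as follows. This is a simplified, hypothetical model: the class names only loosely mirror the real Hadoop classes, and the `invokeWrite`/`invokeRead` methods and `NameNode` record are illustrative stand-ins, not the actual API.

```java
import java.util.List;

// Hypothetical sketch of the proxy-provider layering: writes (including
// msync) delegate to the failover provider; reads may use observers.
class ProxyLayeringDemo {
    enum State { ACTIVE, STANDBY, OBSERVER }

    // Stand-in for a NameNode endpoint and its current HA state.
    record NameNode(String addr, State state) {}

    // Stand-in for AbstractNNFailoverProxyProvider: it owns
    // active/standby determination.
    static class FailoverProxyProvider {
        private final List<NameNode> nns;
        private int current = 0;
        FailoverProxyProvider(List<NameNode> nns) { this.nns = nns; }

        NameNode getProxy() { return nns.get(current); }

        // Called when the current NN rejects a write (StandbyException).
        void performFailover() { current = (current + 1) % nns.size(); }
    }

    // Stand-in for ObserverReadProxyProvider: layers observer reads on
    // top, but delegates all writes -- including msync -- downward.
    static class ObserverReadProxyProvider {
        final FailoverProxyProvider failoverProxy;
        final List<NameNode> nameNodeProxies;
        ObserverReadProxyProvider(List<NameNode> nns) {
            this.failoverProxy = new FailoverProxyProvider(nns);
            this.nameNodeProxies = nns;
        }

        // Write path (and msync): retry through failoverProxy until the
        // active NN answers; ORPP itself never picks the active.
        NameNode invokeWrite() {
            while (failoverProxy.getProxy().state() != State.ACTIVE) {
                failoverProxy.performFailover();
            }
            return failoverProxy.getProxy();
        }

        // Read path: any Observer NN may serve the call.
        NameNode invokeRead() {
            return nameNodeProxies.stream()
                .filter(nn -> nn.state() == State.OBSERVER)
                .findFirst()
                .orElseGet(this::invokeWrite); // no observer: fall back
        }
    }

    public static void main(String[] args) {
        List<NameNode> nns = List.of(
            new NameNode("nn1", State.STANDBY),
            new NameNode("nn2", State.OBSERVER),
            new NameNode("nn3", State.ACTIVE));
        ObserverReadProxyProvider orpp = new ObserverReadProxyProvider(nns);
        System.out.println("write -> " + orpp.invokeWrite().addr());
        System.out.println("read -> " + orpp.invokeRead().addr());
    }
}
```

The point of the sketch is the division of labor: the observer layer routes reads, but any call that needs the active NN is forwarded to the failover provider rather than resolved by the observer layer itself.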
You shared logs showing that some standby NNs are contacted before the active
is found. I guess you are using `ConfiguredFailoverProxyProvider`? In this
implementation, you list multiple NN addresses, and upon startup the client
has no idea which one is active. It has to go through and contact each one
until it finds the one that is active. So it is expected that under normal
operation you will see logs like the ones you shared, where the client
contacts standby NNs while searching for the active. After it finds the
active, it should remain sticky, so (assuming the active NN does not change)
you should only see those logs when the client first submits an RPC.
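The scan-then-sticky behavior described here might be modelled like this. It is a self-contained sketch under the assumption of three NNs with the third one active; it is not the actual `ConfiguredFailoverProxyProvider` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the client scans the configured NN list until the
// active answers, then stays sticky on that NN for later RPCs.
class StickyFailoverDemo {
    static final int ACTIVE_INDEX = 2;   // assume nn3 is the active NN
    static int currentIndex = 0;         // client starts not knowing the active
    static final List<String> contacted = new ArrayList<>();

    // Simulate one client RPC: try the current NN; on a "standby"
    // response, fail over to the next one and retry.
    static String invoke(List<String> nns) {
        while (true) {
            String nn = nns.get(currentIndex);
            contacted.add(nn);
            if (currentIndex == ACTIVE_INDEX) {
                return nn;               // active answered; index stays put
            }
            currentIndex = (currentIndex + 1) % nns.size(); // fail over
        }
    }

    public static void main(String[] args) {
        List<String> nns = List.of("nn1", "nn2", "nn3");
        invoke(nns);                     // first RPC scans nn1, nn2, nn3
        invoke(nns);                     // second RPC goes straight to nn3
        System.out.println(contacted);   // nn1/nn2 appear only once
    }
}
```

The standby log lines correspond to the first RPC's scan; once the active is found, subsequent RPCs contact it directly and the logs stop.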
Your new implementation tries to scan through the NameNodes and check
their status to find the active, but this seems to break the contract with
`failoverProxy`, which is the component to which active/standby determination
is expected to be delegated.
If you want to change the active/standby determination, you should change
the behavior in your `AbstractNNFailoverProxyProvider` (e.g.
`ConfiguredFailoverProxyProvider`), not the behavior of
`ObserverReadProxyProvider`, which should only layer _on top of_ the
`AbstractNNFailoverProxyProvider` to provide the additional Observer NN
functionality.
cc @sunchao @shvachko in case you have any thoughts.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 729732)
Time Spent: 1h 50m (was: 1h 40m)
> msync RPC should be sent to Active NameNode directly
> --------------------------------------------------
>
> Key: HDFS-16452
> URL: https://issues.apache.org/jira/browse/HDFS-16452
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.3.1
> Reporter: zhanghaobo
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> In current ObserverReadProxyProvider implementation, we use the following
> code to invoke msync RPC.
> {code:java}
> getProxyAsClientProtocol(failoverProxy.getProxy().proxy).msync(); {code}
> But the msync RPC may be sent to an Observer NameNode this way, and then fail
> over to the Active NameNode. This can be avoided by applying this patch.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)