Hi All,

I wanted to share and discuss a problem that we are currently facing with Router Based Federation. At present, when a client connects to the Namenode through a Router, the Namenode receives the caller context of the Router rather than that of the actual client. This can cause a number of problems, two of which we have identified so far:

Firstly, data locality does not work correctly when connecting through the Router. The Namenode treats the Router as the actual client and performs all of its optimizations/computations based on the Router's location rather than the actual client's location.

Secondly, the Namenode Retry Cache cannot be used. If, after a failover or a similar event, the client retries and connects to a different Router, the Call Id comes from the Router rather than from the actual client. The Retry Cache therefore does not recognize the call as a repeat and serves it as a brand-new call, which creates inconsistencies.

We have been discussing this for a long time now and have tried out a few solutions:

- Add proxy address in IPC connection (HADOOP-16254 <https://issues.apache.org/jira/browse/HADOOP-16254>) --> Daryn had some security concerns about this.
- The RouterRPCServer should transfer CallerContext and client ip to NamenodeRpcServer (HDFS-13293 <https://issues.apache.org/jira/browse/HDFS-13293>) --> This tends to be a little opaque, and there are a couple more problems, as stated in HDFS-13248 <https://issues.apache.org/jira/browse/HDFS-13248> by Ajay Kumar and Arpit Agarwal.
- Favored Nodes --> Pass the local node as the favored node. This isn't a complete solution, though: it doesn't take into account the fallback when local nodes are unavailable, along with a couple of other cases, and it isn't a solution for the Retry Cache problem either.

The related JIRAs where most of the discussion happened, if anyone wants to follow along:

- HDFS-13248 <https://issues.apache.org/jira/browse/HDFS-13248> : For the data locality problem. It also has a patch at the end with Solution 3 (Favored Nodes).
- HDFS-15079 <https://issues.apache.org/jira/browse/HDFS-15079> , HDFS-15078 <https://issues.apache.org/jira/browse/HDFS-15078> & HDFS-15310 <https://issues.apache.org/jira/browse/HDFS-15310> : For the Retry Cache problem.
- HADOOP-16254 <https://issues.apache.org/jira/browse/HADOOP-16254> : Solution 1 : Add proxy address in IPC connection.
- HDFS-13293 <https://issues.apache.org/jira/browse/HDFS-13293> : Solution 2 : Passing Caller Context.

Do let us know if you can help here: any further solutions, workarounds, or a way to unblock or improve the solutions we have tried.

Thanks!

-Ayush
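
P.S. For anyone less familiar with the Retry Cache problem described above, here is a minimal Java sketch of the failure mode. It is not Hadoop's actual RetryCache code: the names RetryCacheSketch, CallKey and handle are made up for illustration, and it assumes only that the cache is keyed on the (client id, call id) pair carried in the RPC header. With a direct client the retry reuses the same pair and is served from the cache; through two Routers the pair differs, so the same operation is executed twice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

public class RetryCacheSketch {

    // Hypothetical cache key: the (clientId, callId) pair seen by the Namenode.
    static final class CallKey {
        final UUID clientId;
        final int callId;

        CallKey(UUID clientId, int callId) {
            this.clientId = clientId;
            this.callId = callId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CallKey)) {
                return false;
            }
            CallKey other = (CallKey) o;
            return callId == other.callId && clientId.equals(other.clientId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(clientId, callId);
        }
    }

    // Toy retry cache: a repeated (clientId, callId) returns the stored result
    // instead of executing the operation again.
    static final class RetryCache {
        private final Map<CallKey, String> results = new HashMap<>();
        private int executions = 0;

        String handle(UUID clientId, int callId, String op) {
            CallKey key = new CallKey(clientId, callId);
            String cached = results.get(key);
            if (cached != null) {
                return cached; // recognized as a retry
            }
            executions++; // treated as a brand-new call
            String result = op + " (execution " + executions + ")";
            results.put(key, result);
            return result;
        }

        int executions() {
            return executions;
        }
    }

    public static void main(String[] args) {
        // Direct client: the retry carries the same clientId and callId.
        UUID client = UUID.randomUUID();
        RetryCache direct = new RetryCache();
        direct.handle(client, 7, "create /a");
        direct.handle(client, 7, "create /a");
        System.out.println("direct executions: " + direct.executions());

        // Through Routers: each Router uses its own clientId and callId,
        // so the client's retry via a second Router looks like a new call.
        UUID router1 = UUID.randomUUID();
        UUID router2 = UUID.randomUUID();
        RetryCache viaRouters = new RetryCache();
        viaRouters.handle(router1, 101, "create /a");
        viaRouters.handle(router2, 55, "create /a");
        System.out.println("via routers executions: " + viaRouters.executions());
    }
}
```

Running main prints 1 execution for the direct case and 2 for the Router case, which is the inconsistency mentioned above.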