Hi All,
I wanted to share and discuss a problem we are currently facing with Router
Based Federation. At present, when a client connects to a Namenode through a
Router, the Namenode receives the caller context of the Router rather than
that of the actual client. This can cause several problems, two of which we
have identified so far:

First, data locality doesn't work correctly when connecting through a
Router, because the Namenode treats the Router as the actual client and
performs all its optimizations/computations based on the Router's location
rather than the actual client's location.
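To illustrate the effect, here is a minimal Java sketch of distance-based replica ordering. It assumes a simplified two-level "/rack/host" topology with distances 0 (same host), 2 (same rack) and 4 (different racks). The class and method names (LocalityDemo, distance, sortByDistance) are hypothetical, and this is not Hadoop's NetworkTopology code. Sorting the same replicas relative to the Router instead of the client puts a remote replica first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical, simplified model of ordering block replicas by network
 * distance from the reader. Not Hadoop's NetworkTopology implementation.
 */
public class LocalityDemo {

    /** Distance between two "/rack/host" locations: 0 same host, 2 same rack, 4 otherwise. */
    static int distance(String a, String b) {
        if (a.equals(b)) {
            return 0;
        }
        String rackA = a.substring(0, a.lastIndexOf('/'));
        String rackB = b.substring(0, b.lastIndexOf('/'));
        return rackA.equals(rackB) ? 2 : 4;
    }

    /** Returns a copy of the replicas sorted closest-first relative to the reader. */
    static List<String> sortByDistance(String reader, List<String> replicas) {
        List<String> sorted = new ArrayList<>(replicas);
        // List.sort is stable, so equally distant replicas keep their input order.
        sorted.sort(Comparator.comparingInt((String r) -> distance(reader, r)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("/rack2/dn3", "/rack1/dn1", "/rack3/dn5");
        String client = "/rack1/dn1";     // the real client runs on a datanode host
        String router = "/rack3/router1"; // the Router sits on a different rack

        // Ordering the client should get: its local replica first.
        System.out.println(sortByDistance(client, replicas));
        // Ordering it gets through the Router: computed relative to the Router.
        System.out.println(sortByDistance(router, replicas));
    }
}
```

With the locations above, the client-relative order is [/rack1/dn1, /rack2/dn3, /rack3/dn5], while the Router-relative order is [/rack3/dn5, /rack2/dn3, /rack1/dn1].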

Second, the Namenode Retry Cache doesn't work. In case of a failover or a
similar event, the client retries and may connect to a different Router.
Since the call ID then comes from that Router rather than from the actual
client, the Retry Cache doesn't recognize it as a repeated call and serves it
as a completely new call, which creates inconsistencies.
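To make the Retry Cache failure concrete, here is a minimal Java sketch. It assumes, as a simplification, that the Namenode identifies a call by a (client ID, call ID) pair and replays the cached result when it sees the same pair again. The class and method names (RetryCacheDemo, handle) are hypothetical; this is not Hadoop's RetryCache implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical, simplified retry cache keyed by (clientId, callId).
 * Illustrates the idea only; not Hadoop code.
 */
public class RetryCacheDemo {

    private final Map<String, String> cache = new HashMap<>();
    private int executions = 0;

    /** Executes a non-idempotent operation unless this exact call was already served. */
    String handle(UUID clientId, int callId, String op) {
        String key = clientId + "/" + callId;
        String cached = cache.get(key);
        if (cached != null) {
            return cached; // recognized as a retry: replay the earlier result
        }
        executions++;
        String result = op + " executed #" + executions;
        cache.put(key, result);
        return result;
    }

    int executions() {
        return executions;
    }

    public static void main(String[] args) {
        RetryCacheDemo namenode = new RetryCacheDemo();

        // Direct client: the retry reuses the same clientId and callId, so it hits the cache.
        UUID client = UUID.randomUUID();
        namenode.handle(client, 7, "create /a");
        namenode.handle(client, 7, "create /a");
        System.out.println("direct executions = " + namenode.executions());

        // Through Routers: each Router forwards the call under its own identity,
        // so the retry via a second Router looks like a brand-new call.
        UUID router1 = UUID.randomUUID();
        UUID router2 = UUID.randomUUID();
        namenode.handle(router1, 101, "create /b");
        namenode.handle(router2, 55, "create /b");
        System.out.println("total executions = " + namenode.executions());
    }
}
```

Run as written, this prints "direct executions = 1" and then "total executions = 3": the retry sent through the second Router executes the operation a second time.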

We have been discussing and trying out solutions for a long time now, and
have tried the following:

   - Add proxy address in IPC connection (HADOOP-16254
   <https://issues.apache.org/jira/browse/HADOOP-16254>) --> This raised some
   security concerns from Daryn.
   - The RouterRPCServer should transfer CallerContext and client IP to
   NamenodeRpcServer (HDFS-13293
   <https://issues.apache.org/jira/browse/HDFS-13293>) --> This tends to be
   a little opaque and has a couple of other problems, as stated in HDFS-13248
   <https://issues.apache.org/jira/browse/HDFS-13248> by Ajay Kumar and
   Arpit Agarwal.
   - Favored Nodes --> Pass the local node as a favored node. But this isn't
   a complete solution: it doesn't handle the fallback when local nodes are
   unavailable, among a couple of other gaps, and it doesn't solve the Retry
   Cache problem either.


The related JIRAs where most of the discussion happened, for anyone who wants
to follow along:
HDFS-13248 <https://issues.apache.org/jira/browse/HDFS-13248> : For the data
locality problem. It also has a patch at the end with Solution 3 (Favored
Nodes).
HDFS-15079 <https://issues.apache.org/jira/browse/HDFS-15079>, HDFS-15078
<https://issues.apache.org/jira/browse/HDFS-15078> & HDFS-15310
<https://issues.apache.org/jira/browse/HDFS-15310> : For the Retry Cache
problem.
HADOOP-16254 <https://issues.apache.org/jira/browse/HADOOP-16254> :
Solution 1: Add proxy address in IPC connection.
HDFS-13293 <https://issues.apache.org/jira/browse/HDFS-13293> :
Solution 2: Passing Caller Context.

Please let us know if you can help here, or if you have any further
solutions, workarounds, or ideas to unblock or improve the solutions we have
tried.

Thanks!
-Ayush
