Xiaoyu Yao created HDFS-9723:
--------------------------------

             Summary: Improve Namenode Throttling Against Bad Jobs with FCQ and 
CallerContext
                 Key: HDFS-9723
                 URL: https://issues.apache.org/jira/browse/HDFS-9723
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Xiaoyu Yao
            Assignee: Xiaoyu Yao


The HDFS namenode handles RPC requests from DFS clients and internal processing 
requests from datanodes. It has been a recurring pain that some bad jobs 
overwhelm the namenode and bring the whole cluster down. FCQ (Fair Call Queue), 
introduced by HADOOP-9640, is one of the existing efforts, available since 
Hadoop 2.4, to address this issue. 

In the current FCQ implementation, incoming RPC calls are scheduled based on 
the number of recent RPC calls (1000) from each user, using a time-decayed 
scheduler. This works well when there is a clear mapping between users and 
their RPC calls from different jobs. However, it may not work effectively 
when calls are hard to attribute to a specific caller in a chain of operations 
from a workflow (e.g. Oozie -> Hive -> YARN). It is not feasible for 
operators/administrators to throttle all Hive jobs because of one “bad” 
query.
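
To make the limitation concrete, here is a minimal standalone sketch of a 
time-decayed, per-identity call counter. It is not the Hadoop implementation: 
the class name, the decay factor of 0.5 and the share thresholds are 
assumptions chosen only for illustration. It shows that when every query 
arrives under the same user name, all of them end up in the same (lowest) 
priority level.

```python
from collections import defaultdict


class DecayedCallCounter:
    """Toy time-decayed scheduler (hypothetical, for illustration only).

    Priority 0 is the most favoured level; higher numbers mean the caller
    has a larger share of recent calls and is served with lower priority.
    """

    def __init__(self, decay_factor=0.5, thresholds=(0.125, 0.25, 0.5)):
        # Both defaults are assumptions made for this sketch.
        self.decay_factor = decay_factor
        self.thresholds = thresholds
        self.counts = defaultdict(float)

    def record_call(self, identity):
        """Count one incoming RPC call for the given identity."""
        self.counts[identity] += 1.0

    def decay(self):
        """Age all counts so that older calls weigh less over time."""
        for identity in list(self.counts):
            self.counts[identity] *= self.decay_factor

    def priority(self, identity):
        """Map the identity's share of recent calls to a priority level."""
        total = sum(self.counts.values())
        if total == 0:
            return 0
        share = self.counts.get(identity, 0.0) / total
        # One level lower in priority for every threshold reached.
        return sum(1 for t in self.thresholds if share >= t)


counter = DecayedCallCounter()
for _ in range(90):
    counter.record_call("hive")   # good and bad Hive queries look identical
for _ in range(10):
    counter.record_call("alice")

print(counter.priority("hive"), counter.priority("alice"))
```

Because the counter is keyed by user name only, the one "bad" query and every 
well-behaved Hive query are throttled together.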

This JIRA proposes to leverage the RPC caller context information (such as 
callerType: caller Id from TEZ-2851) made available by HDFS-9184 as an 
alternative to the existing UGI-based (or user-name-based, when a delegation 
token is not available) Identity Provider, in order to improve the 
effectiveness of the Hadoop RPC Fair Call Queue (HADOOP-9640) for namenode 
throttling in multi-tenant cluster deployments.  
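
As a rough sketch of the idea, assuming each RPC call carries a user name and 
an optional caller context string, the two identity strategies could look 
like the following. The RpcCall type, the function names and the 
"hive_query:..." context format are hypothetical and are not taken from the 
Hadoop code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RpcCall:
    """Hypothetical stand-in for an incoming RPC call."""
    user: str
    caller_context: Optional[str] = None  # e.g. "callerType:callerId"


def user_identity(call: RpcCall) -> str:
    """Existing behaviour in spirit: schedule by user name only."""
    return call.user


def caller_context_identity(call: RpcCall) -> str:
    """Proposed behaviour in spirit: prefer the caller context, and fall
    back to the user name when no context was supplied."""
    return call.caller_context or call.user


good_query = RpcCall(user="hive", caller_context="hive_query:q1")
bad_query = RpcCall(user="hive", caller_context="hive_query:q2")

# By user name the two queries are indistinguishable...
print(user_identity(good_query) == user_identity(bad_query))
# ...while by caller context they can be scheduled separately.
print(caller_context_identity(good_query) != caller_context_identity(bad_query))
```

With an identity of this kind, the scheduler can deprioritize only the calls 
from the offending query rather than everything submitted as the same user.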



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
