[jira] [Created] (HDFS-9184) Logging HDFS operation's caller context into audit logs

Mingliang Liu (JIRA) Wed, 30 Sep 2015 18:15:40 -0700

Mingliang Liu created HDFS-9184:
-----------------------------------

             Summary: Logging HDFS operation's caller context into audit logs
                 Key: HDFS-9184
                 URL: https://issues.apache.org/jira/browse/HDFS-9184
             Project: Hadoop HDFS
          Issue Type: Task
            Reporter: Mingliang Liu
            Assignee: Mingliang Liu



For a given HDFS operation (e.g. deleting a file), it is very helpful to track 
which upper-level job issued it. The upper-level callers may be specific Oozie 
tasks, MR jobs, or Hive queries. For example, when the namenode (NN) is 
abused/spammed, the operator may want to know immediately which MR job is 
responsible so that she can kill it. To this end, the caller context contains 
at least the application-dependent "tracking id".

There are several existing techniques that may be related to this problem.
1. Currently the HDFS audit log tracks the user of the operation, which is 
not enough: the same user commonly issues multiple jobs at the same time. Even 
for a single top-level task, tracing back to a specific caller in a chain of 
operations across the whole workflow (e.g. Oozie -> Hive -> Yarn) is hard, if 
not impossible.
2. HDFS has integrated {{htrace}} support for providing tracing information 
across multiple layers. Spans are created in many places and are connected in 
a tree structure, which relies on offline analysis across RPC boundaries. For 
this use case, {{htrace}} has to be enabled at a 100% sampling rate, which 
introduces significant overhead. Moreover, passing additional information (via 
annotations) other than the span id from the root of the tree to a leaf 
requires significant additional work.
3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there is 
some related discussion on this topic. The final patch implemented the tracking 
id as a part of the delegation token. This protects the tracking information 
from being changed or impersonated. However, Kerberos-authenticated connections 
and insecure connections don't have tokens. [HADOOP-8779] proposes to use 
tokens in all scenarios, but that might mean changes to several upstream 
projects and a major change in their security implementation.

We propose another approach to address this problem. We also treat the HDFS 
audit log as a good place for after-the-fact root cause analysis. We propose to 
put the caller id (e.g. Hive query id) in threadlocals. Specifically, on the 
client side the threadlocal object is passed to the NN as an optional part of 
the RPC header, while on the server side the NN retrieves it from the header 
and puts it into the {{Handler}}'s threadlocals. Finally, in {{FSNamesystem}}, 
the HDFS audit logger records the caller context for each operation. In this 
way, the existing code is not affected.
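
The flow described above could be sketched roughly as follows. This is only an 
illustration of the idea, not the proposed patch: the class 
{{CallerContextSketch}}, its method names, the use of {{Optional}} to stand in 
for the optional RPC header field, and the {{callerContext=}} audit field are 
all assumptions made for this example.

```java
import java.util.Optional;

// Hypothetical sketch; none of these names are existing Hadoop APIs.
final class CallerContextSketch {
    // One caller context per thread (client thread or NN handler thread).
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void set(String callerId) { CURRENT.set(callerId); }
    static String get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }

    // Client side: copy the thread-local value into an optional header field.
    static Optional<String> toRpcHeader() {
        return Optional.ofNullable(CURRENT.get());
    }

    // Server side: the handler thread installs the header value, runs the
    // operation (here only the audit step), and always clears it afterwards
    // so the context does not leak into the next call on the same thread.
    static String handle(Optional<String> header, String user, String cmd, String src) {
        header.ifPresent(CallerContextSketch::set);
        try {
            return auditLine(user, cmd, src);
        } finally {
            clear();
        }
    }

    // Audit step: append the caller context only when one was supplied.
    static String auditLine(String user, String cmd, String src) {
        String line = "ugi=" + user + "\tcmd=" + cmd + "\tsrc=" + src;
        String ctx = get();
        return ctx == null ? line : line + "\tcallerContext=" + ctx;
    }
}
```

Because the header field is optional in this sketch, a client that never sets 
a caller context produces the same audit line as before.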

It is still challenging to keep a "lying" client from abusing the caller 
context. Our proposal is to add a {{signature}} field to the caller context. 
The client may choose to provide its signature along with the caller id. The 
operator may need to validate the signature during offline analysis. The NN is 
not responsible for validating the signature online.
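
As one possible illustration of offline validation, the sketch below signs the 
caller id with an HMAC and checks it later. The proposal does not specify a 
signature scheme; HMAC-SHA256 with a secret shared between the client and the 
operator, Base64 encoding, and the class {{CallerSignatureSketch}} are all 
assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; the signature scheme is an assumption, not part of the proposal.
final class CallerSignatureSketch {
    private static final String ALGORITHM = "HmacSHA256";

    // Client side: compute a signature over the caller id with a shared secret.
    static String sign(String callerId, byte[] secret) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret, ALGORITHM));
            byte[] tag = mac.doFinal(callerId.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(tag);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC computation failed", e);
        }
    }

    // Operator side, offline: recompute the signature for the caller id found
    // in the audit log and compare it with the logged one in constant time.
    static boolean verifyOffline(String callerId, String loggedSignature, byte[] secret) {
        byte[] expected = sign(callerId, secret).getBytes(StandardCharsets.UTF_8);
        byte[] actual = loggedSignature.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual);
    }
}
```

In this sketch the NN would only copy the caller id and signature into the 
audit log; the check runs later, in whatever tool the operator uses to analyze 
the logs.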



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
