Hi all,

In our production environment, we occasionally encounter a problem where a
user submits an abnormal computation task that generates a sudden flood of
requests. This drives the NameNode's queueTime and processingTime very
high and creates a large backlog of requests.

We usually locate and kill the offending Spark, Flink, or MapReduce job
based on metrics and audit logs. The audit log currently records the
client IP and UGI, but not the client port, so it is sometimes difficult
to pin down the specific process. I therefore propose that we add the port
information to the audit log, so that we can easily track the upstream
process.
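
For illustration, here is what an entry might look like with a
hypothetical client port appended to the existing ip field (the exact
rendering is of course open to discussion):

  today:    allowed=true ugi=alice (auth:SIMPLE) ip=/10.1.2.3 cmd=getfileinfo src=/user/alice/data dst=null perm=null proto=rpc
  proposed: allowed=true ugi=alice (auth:SIMPLE) ip=/10.1.2.3:45678 cmd=getfileinfo src=/user/alice/data dst=null perm=null proto=rpc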

Some projects, such as HBase and Alluxio, already include port information
in their audit logs. I think it is worth adding it to the HDFS audit log
as well.
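
Once the port is logged, mapping it back to a process on the client host
is straightforward. Below is a minimal sketch (not part of the PR),
assuming the third-party psutil library is installed on the client
machine:

  import psutil  # third-party: pip install psutil

  def pid_for_client_port(port):
      """Return the PID that owns the TCP connection on the given local port."""
      # Note: seeing other users' connections may require elevated privileges.
      for conn in psutil.net_connections(kind="tcp"):
          if conn.laddr and conn.laddr.port == port and conn.pid is not None:
              return conn.pid
      return None

  print(pid_for_client_port(45678))  # 45678 taken from the audit log entry above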

I have submitted a PR (https://github.com/apache/hadoop/pull/3538), which
has been tested in our test environment; the change takes effect for both
RPC and HTTP requests. I look forward to your thoughts on possible
problems and suggestions for improvement, and I will actively update the
PR.

Best Regards,
Tom
