[ https://issues.apache.org/jira/browse/FLINK-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452777#comment-17452777 ]
刘方奇 commented on FLINK-25029:
-----------------------------

[~arvid], hi, I got your advice in the PR; it is valuable to me. I have tried to reply to all of your comments and improve the code accordingly. Could you help review it again when you are free? By the way, I still have some questions in the PR (e.g., which file should I put the option into? Knowing that may help me solve the CI problem).

> Hadoop Caller Context Setting In Flink
> --------------------------------------
>
>                 Key: FLINK-25029
>                 URL: https://issues.apache.org/jira/browse/FLINK-25029
>             Project: Flink
>          Issue Type: Improvement
>          Components: FileSystems
>            Reporter: 刘方奇
>            Assignee: 刘方奇
>            Priority: Major
>              Labels: pull-request-available
>
> For a given HDFS operation (e.g. deleting a file), it is very helpful to
> track which upper-level job issued it. The upper-level callers may be
> specific Oozie tasks, MR jobs, or Hive queries. One scenario: when the
> NameNode (NN) is abused/spammed, the operator wants to know immediately
> which job is to blame so that it can be killed. To this end, the caller
> context contains at least an application-dependent "tracking id".
> That is the main effect of the Caller Context: the HDFS client sets the
> caller context, and the NameNode records it in the audit log, where it can
> be used for auditing and troubleshooting.
> Spark and Hive already set a Caller Context to meet this HDFS job-audit
> requirement.
> In my company, Flink jobs often cause problems for HDFS, so we implemented
> this to prevent such cases.
> If the feature is general enough, we should support it, and I can submit a
> PR for it.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
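For background on the mechanism discussed above: Hadoop exposes this feature through `org.apache.hadoop.ipc.CallerContext`, which a client sets on the current thread before issuing RPCs; the NameNode then includes it in the audit log (the feature is gated by `hadoop.caller.context.enabled` in core-site.xml, off by default). A minimal sketch of building such a tracking id follows. The `FLINK_<jobName>_<jobId>` format and the `buildContext` helper are illustrative assumptions, not the format used by the actual PR:

```java
public class CallerContextSketch {
    // Hypothetical byte budget; hadoop.caller.context.max.size bounds the
    // context length on the server side, so truncating client-side keeps
    // the audit log entries predictable.
    static final int MAX_CONTEXT_LEN = 128;

    // Hypothetical format "FLINK_<jobName>_<jobId>"; the real option/format
    // is decided in the PR under review.
    static String buildContext(String jobName, String jobId) {
        String ctx = "FLINK_" + jobName + "_" + jobId;
        return ctx.length() <= MAX_CONTEXT_LEN ? ctx : ctx.substring(0, MAX_CONTEXT_LEN);
    }

    public static void main(String[] args) {
        String ctx = buildContext("wordcount", "a1b2c3");
        // With hadoop-common on the classpath, propagation would look like:
        //   CallerContext.setCurrent(new CallerContext.Builder(ctx).build());
        // after which subsequent HDFS calls from this thread carry the context.
        System.out.println(ctx);
    }
}
```

A matching NameNode audit-log line would then end with a `callerContext=FLINK_...` field, which is what lets the operator map an abusive HDFS workload back to a specific Flink job.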