Thanks @Takanobu for your reply. I will make it configurable and disabled by default.
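
A minimal sketch of the idea, assuming a hypothetical boolean flag (the class, method, and parameter names below are illustrative and not taken from the PR): when the flag is off, the ip field is emitted exactly as before, so existing audit log parsers are unaffected.

```java
public final class AuditIpField {
    private AuditIpField() {}

    /**
     * Builds the "ip=" field of an audit log entry.
     * withPort is a hypothetical switch standing in for a configuration
     * option that is disabled by default.
     */
    static String ipField(String ip, int port, boolean withPort) {
        // Flag off: keep the existing layout untouched.
        if (!withPort) {
            return "ip=" + ip;
        }
        // Flag on: append the caller's port inside the same field.
        return "ip=" + ip + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(ipField("/10.0.0.1", 50010, false));
        System.out.println(ipField("/10.0.0.1", 50010, true));
    }
}
```

Keeping the port inside the existing field, rather than adding a new one, leaves the number and order of fields unchanged in both modes.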

Takanobu Asanuma <tasan...@apache.org> wrote on Wed, Oct 13, 2021 at 3:34 PM:

> I think many users parse audit logs in their own way, and they will be
> affected if the format is changed. So I agree with Masatake's suggestion.
>
> - Takanobu
>
> On Mon, Oct 11, 2021 at 18:19, tom lee <tomlees...@gmail.com> wrote:
>
>> Thanks @Masatake Iwasaki <iwasak...@oss.nttdata.co.jp> for your
>> suggestion. This is a good idea.
>>
>> Masatake Iwasaki <iwasak...@oss.nttdata.co.jp> wrote on Mon, Oct 11, 2021 at 3:26 PM:
>>
>> > > I am not sure whether we can directly go and change this. Any changes
>> > > to Audit Log format are considered incompatible.
>> > >
>> > > https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Audit_Log_Output
>> >
>> > Adding a field for caller context seemed to be accepted since it is
>> > optional feature disabled by default.
>> >
>> > https://github.com/apache/hadoop/blob/rel/release-3.3.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L8480-L8498
>> >
>> > If we need to add fields, making it optional might be an option.
>> >
>> > Masatake Iwasaki
>> >
>> > On 2021/10/11 16:09, tom lee wrote:
>> > > However, adding port is to modify the internal content of the IP
>> > > field, which has little impact on the overall layout.
>> > >
>> > > In our cluster, we parse the audit log through Vector and send the
>> > > data to Kafka, which is unaffected.
>> > >
>> > > tom lee <tomlees...@gmail.com> wrote on Mon, Oct 11, 2021 at 2:44 PM:
>> > >
>> > >> Thank Ayush for reminding me. I also have similar concerns, so I
>> > >> published this discussion, hoping to let the members of the
>> > >> community know about this matter and then give suggestions.
>> > >>
>> > >> Ayush Saxena <ayush...@gmail.com> wrote on Mon, Oct 11, 2021 at 2:38 PM:
>> > >>
>> > >>> Hey
>> > >>> I am not sure whether we can directly go and change this. Any
>> > >>> changes to Audit Log format are considered incompatible.
>> > >>>
>> > >>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Audit_Log_Output
>> > >>>
>> > >>> -Ayush
>> > >>>
>> > >>> On 10-Oct-2021, at 7:57 PM, tom lee <tomlees...@gmail.com> wrote:
>> > >>>
>> > >>> Hi all,
>> > >>>
>> > >>> In our production environment, we occasionally encounter a problem
>> > >>> where a user submits an abnormal computation task, causing a sudden
>> > >>> flood of requests, which causes the queueTime and processingTime of
>> > >>> the Namenode to rise very high, causing a large backlog of tasks.
>> > >>>
>> > >>> We usually locate and kill specific Spark, Flink, or MapReduce tasks
>> > >>> based on metrics and audit logs. Currently, IP and UGI are recorded
>> > >>> in audit logs, but there is no port information, so it is difficult
>> > >>> to locate specific processes sometimes. Therefore, I propose that we
>> > >>> add the port information to the audit log, so that we can easily
>> > >>> track the upstream process.
>> > >>>
>> > >>> Currently, some projects contain port information in audit logs,
>> > >>> such as Hbase and Alluxio. I think it is also necessary to add port
>> > >>> information for HDFS audit logs.
>> > >>>
>> > >>> I submitted a PR(https://github.com/apache/hadoop/pull/3538), which
>> > >>> has been tested in our test environment, and both RPC and HTTP are
>> > >>> in effect. I look forward to your discussion on possible problems
>> > >>> and suggestions for modification. I will actively update the PR.
>> > >>>
>> > >>> Best Regards,
>> > >>> Tom