I think we need to have a discussion about the HDFS audit log.
The purpose of the HDFS audit log* is for operations and security people to keep track of actual, bits-on-disk changes to HDFS and the related metadata changes. It is not meant as a catch-all for any and all HDFS operations. It is most definitely processed by code written by people. Its format is meant to be fixed: specifically, no new fields, and every field present on every line. It’s meant to be extremely easy to parse, even for junior admins.

For the past year, I’ve noticed an extremely disturbing trend:

a) Changes to the log file that BREAK operations people. Part of the problem here is that the compatibility guidelines don’t specify that this file’s format is locked. We should fix this.

b) An increasing number of “we should log this random NN operation” requests. Unless an operation modifies the actual data, it is not an AUDIT-worthy event. Ask yourself, “Would a security person care?” If the answer is no, then don’t put it in the HDFS audit log; just log it in the generic NameNode log. If the answer is yes, get a second opinion, preferably from someone outside your team who actually does security.

* - if anyone wants the full history, feel free to ask …
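To make the “fixed format, easy to parse” point concrete, here is a minimal sketch of the kind of strict parser an ops team might write. It assumes the key=value portion of each line (after the logger’s timestamp prefix) is tab-separated, and the field list in `EXPECTED_FIELDS` is an assumption modeled loosely on the default audit logger, not an authoritative spec. `parse_audit_fields` and the `newfield` key are hypothetical names used only for illustration.

```python
from typing import Dict

# ASSUMPTION: field names and order modeled loosely on the default HDFS
# audit logger's key=value layout; not an authoritative format definition.
EXPECTED_FIELDS = ("allowed", "ugi", "ip", "cmd", "src", "dst", "perm")


def parse_audit_fields(line: str) -> Dict[str, str]:
    """Strictly parse the tab-separated key=value part of one audit line.

    Raises ValueError if a field is missing, reordered, or unknown, which is
    how a simple ops script reacts when the format shifts underneath it.
    """
    fields: Dict[str, str] = {}
    for token in line.rstrip("\n").split("\t"):
        key, sep, value = token.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"malformed token: {token!r}")
        fields[key] = value
    keys = tuple(fields)
    if keys != EXPECTED_FIELDS:
        raise ValueError(f"unexpected field layout: {keys}")
    return fields


good = "allowed=true\tugi=alice\tip=/10.0.0.1\tcmd=delete\tsrc=/tmp/x\tdst=null\tperm=null"
print(parse_audit_fields(good)["cmd"])  # delete

# A release that appends a (hypothetical) new field breaks the same script.
try:
    parse_audit_fields(good + "\tnewfield=1")
except ValueError as err:
    print("parse failed:", err)
```

The parser is deliberately unforgiving: it encodes exactly the contract described above (every field on every line, no new fields), so any format change surfaces as a hard failure rather than silently mis-parsed data.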