I think we need to have a discussion about the HDFS audit log.

        The purpose of the HDFS audit log* is for operations and security 
people to keep track of actual, bits-on-disk changes to HDFS and related 
metadata changes. It is not meant as a catch-all for any and all HDFS 
operations.  It is most definitely processed by code written by people.  Its 
format is meant to be fixed; specifically, no new fields may be added, and all 
fields should be present on every line. It’s meant to be extremely easy to 
parse, even for junior admins.

        For the past year, I’ve noticed an extremely disturbing trend:

                a) Changes to the log file which BREAK things for operations 
people.  Part of the problem here is that the compatibility guidelines don’t 
specify that this file is locked.  We should fix this.

                b) An increasing number of “we should log this random NN 
operation” requests.  Unless an operation modifies the actual data, it is not 
an AUDIT-worthy event.  Ask yourself, “would a security person care?”  If the 
answer is no, then don’t put it in the HDFS audit log; just keep an entry in 
the generic namenode log.  If the answer is yes, get a second opinion from 
someone else, preferably someone outside your team who actually does security.


* - if anyone wants the full history, feel free to ask …