Hi Allen, thanks for bringing this up. Two comments:

- Is there a specification for the audit log format? That is, is it something
structured like JSON? I think I asked you this in person, and you said it's
something custom. I doubt that we can just "freeze" the format. In recent
times we've added things like ACLs and xattrs which do affect security. So,
we need the ability to add new messages and fields at times.
- Since AFAIK there aren't any contract tests for the format or user
expectations, there's no way to know when something breaks. This is the
normal issue where out-of-tree code gets broken every release.
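
To make the contract-test idea concrete, here is a rough sketch of what such
a check might look like. It is only an illustration: the tab-separated
key=value layout and the field list are assumptions on my part, not a spec,
and the names EXPECTED_FIELDS, parse_audit_line and check_contract are
hypothetical.

```python
# Hypothetical contract check for a single audit log line.
# ASSUMPTION: lines are tab-separated key=value pairs in a fixed order.
# The field list below is illustrative, not an official specification.
EXPECTED_FIELDS = ["allowed", "ugi", "ip", "cmd", "src", "dst", "perm"]


def parse_audit_line(line):
    """Split a line into an ordered list of (key, value) pairs."""
    pairs = []
    for token in line.rstrip("\n").split("\t"):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError("token without '=': %r" % token)
        pairs.append((key, value))
    return pairs


def check_contract(line, expected=EXPECTED_FIELDS):
    """Return a list of violations; an empty list means the line conforms."""
    keys = [key for key, _ in parse_audit_line(line)]
    problems = []
    missing = [field for field in expected if field not in keys]
    if missing:
        problems.append("missing fields: %s" % missing)
    # Existing fields must keep their position; new fields may only be
    # appended after them.
    elif keys[:len(expected)] != expected:
        problems.append("existing fields reordered or new field inserted mid-line")
    return problems
```

This sketch deliberately allows new fields to be appended at the end of a
line while rejecting removed or reordered ones, which is one possible middle
ground between a frozen format and an unconstrained one.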

I think addressing these two things, along with adding a big honking
comment to AuditLogger / HdfsAuditLogger, would improve the story.

As a non-security person, I'd also personally appreciate some guidance as
to what should *not* go in the audit log. You mention that we should only
log write operations, but I know we log some reads, and that there are
people who (perhaps incorrectly) use the audit log to count these read ops.

Best,
Andrew

On Sat, Apr 25, 2015 at 7:58 AM, Allen Wittenauer <a...@altiscale.com> wrote:

>
>         I think we need to have a discussion about the HDFS audit log.
>
>         The purpose of the HDFS audit log* is for operations and security
> people to keep track of actual, bits-on-disk changes to HDFS and related
> metadata changes. It is not meant as a catch-all for any and all HDFS
> operations.  It is most definitely processed by code written by people.
> Its format is meant to be fixed; specifically no new fields and all fields
> should be present on every line. It’s meant to be extremely easy to parse
> for even junior admins.
>
>         For the past year, I’ve noticed an extremely disturbing trend:
>
>                 a) Changes to the log file which BREAK operations people.
> Part of the problem here is that the compatibility guidelines don’t specify
> that this file is locked.  We should fix this.
>
>                 b) An increasing number of “we should log this random NN
> operation”.  Unless it modifies the actual data, these are not AUDIT-worthy
> events.  Ask yourself, “would a security person care?”  If the answer is
> no, then don’t put it in the HDFS audit log and just keep an entry in the
> generic namenode log.  If the answer is yes, get a second opinion from
> someone else, preferably outside your team who actually does security.
>
>
> * - if anyone wants the full history, feel free to ask …
