Andrew, thanks for adding your perspective on this.

What is a realistic strategy for us to evolve the HDFS audit log in a 
backward-compatible way?  If the API is essentially any form of ad-hoc 
scripting, then for any proposed audit log format change, I can find a reason 
to veto it on grounds of backward incompatibility.

- I can’t add a new field on the end, because that would break an awk script 
that uses $NF expecting to find a specific field.
- I can’t prepend a new field, because that would break a "cut -f1" expecting 
to find the timestamp.
- HDFS can’t add any new features, because someone might have written a script 
that does "exit 1" if it finds an unexpected RPC in the "cmd=" field.
- Hadoop is not allowed to add full IPv6 support, because someone might have 
written a script that looks at the "ip=" field and parses it by IPv4 syntax.
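To make the first two cases concrete, here is a minimal Python sketch contrasting a positional consumer (the equivalent of the awk $NF script) with one that looks fields up by key. The sample line, its tab-separated key=value layout, the appended "proto" field, and the function names are all assumptions for illustration, not a statement of the actual audit log format. Key-based parsing only helps with added or reordered fields; it does nothing for the "cmd=" and "ip=" cases, where the values themselves change.

```python
# Illustrative audit line; the exact layout is an assumption for this sketch.
OLD_LINE = (
    "2016-08-18 08:47:00,123 INFO FSNamesystem.audit: "
    "allowed=true\tugi=alice\tip=/10.0.0.1\tcmd=open\t"
    "src=/data/f\tdst=null\tperm=null"
)
# Hypothetical format change: one new field appended at the end.
NEW_LINE = OLD_LINE + "\tproto=rpc"


def last_field_positional(line):
    """Mimic awk's $NF on tab-separated input; the result depends on field order."""
    return line.split("\t")[-1]


def parse_by_key(line):
    """Collect key=value pairs by name, ignoring position and unknown keys."""
    # Drop the timestamp/level/logger prefix that precedes the first key.
    _, _, body = line.partition(": ")
    fields = {}
    for token in body.split("\t"):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


if __name__ == "__main__":
    # The positional consumer silently picks up a different field after the change.
    print(last_field_positional(OLD_LINE))
    print(last_field_positional(NEW_LINE))
    # The key-based consumer finds "perm" in both versions.
    print(parse_by_key(OLD_LINE)["perm"], parse_by_key(NEW_LINE)["perm"])
```

A consumer written like parse_by_key would tolerate an appended or prepended field, which is why it matters whether the compatibility guarantee covers field positions or only field names.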

On the CLI, a potential solution for evolving the output is to preserve the old 
format by default and only enable the new format if the user explicitly passes 
a new argument.  What should we do for the audit log?  Configuration flags in 
hdfs-site.xml?  (That of course adds its own brand of complexity.)

I’m particularly interested to hear potential solutions from people like Andrew 
and Allen who have been most vocal about the need for a stable format.  Without 
a solution, this unfortunately devolves into the format being frozen within a 
major release line.

We could benefit from getting a patch on the compatibility doc that addresses 
the HDFS audit log specifically. 

--Chris Nauroth

On 8/18/16, 8:47 AM, "Andrew Purtell" <andrew.purt...@gmail.com> wrote:

    An incompatible API change is developer unfriendly. An incompatible 
behavioral change is operator unfriendly. Historically, one dimension of 
incompatibility has had a lot more mindshare than the other. It's great that 
this might be changing for the better. 
    
    Where I work when we move from one Hadoop 2.x minor to another we always 
spend time updating our deployment plans, alerting, log scraping, and related 
things due to changes. Whether some of these qualify for the 
'incompatible' designation is debatable. I think the audit logging change that triggered 
this discussion is a good example of one that does. If you want to audit HDFS 
actions those log emissions are your API. (Inotify doesn't offer access control 
events.) One has to code regular expressions for parsing them and reverse 
engineer under what circumstances an audit line is emitted so you can make 
assumptions about what transpired. Change either and you might break someone's 
automation for meeting industry or legal compliance obligations. Not a trivial 
matter. If you don't operate Hadoop in production you might not realize the 
implications of such a change. Glad to see Hadoop has community diversity to 
recognize it in some cases. 
    
    > On Aug 18, 2016, at 6:57 AM, Junping Du <j...@hortonworks.com> wrote:
    > 
    > I think Allen's previous comments are very misleading. 
    > In my understanding, only incompatible API changes (RPC, CLIs, WebService, etc.) 
shouldn't land on branch-2, but other incompatible behaviors (logs, audit-log, 
daemon restart, etc.) should have some flexibility to land. Otherwise, how could 
52 issues ( https://s.apache.org/xJk5) marked with incompatible-changes 
have landed on branch-2 after the 2.2.0 release? Most of them are already released. 
    > 
    > Thanks,
    > 
    > Junping
    > ________________________________________
    > From: Vinod Kumar Vavilapalli <vino...@apache.org>
    > Sent: Wednesday, August 17, 2016 9:29 PM
    > To: Allen Wittenauer
    > Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
    > Subject: Re: [VOTE] Release Apache Hadoop 2.7.3 RC1
    > 
    > I always look at CHANGES.txt entries for incompatible-changes and this 
JIRA obviously wasn’t there.
    > 
    > Anyways, this shouldn’t be in any of branch-2.* as committers there 
clearly mentioned that this is an incompatible change.
    > 
    > I am reverting the patch from branch-2* .
    > 
    > Thanks
    > +Vinod
    > 
    >> On Aug 16, 2016, at 9:29 PM, Allen Wittenauer 
<a...@effectivemachines.com> wrote:
    >> 
    >> 
    >> 
    >> -1
    >> 
    >> HDFS-9395 is an incompatible change:
    >> 
    >> a) Why is it not marked as such in the changes file?
    >> b) Why is an incompatible change in a micro release, much less a minor?
    >> c) Where is the release note for this change?
    >> 
    >> 
    >>> On Aug 12, 2016, at 9:45 AM, Vinod Kumar Vavilapalli 
<vino...@apache.org> wrote:
    >>> 
    >>> Hi all,
    >>> 
    >>> I've created a release candidate RC1 for Apache Hadoop 2.7.3.
    >>> 
    >>> As discussed before, this is the next maintenance release to follow up 
2.7.2.
    >>> 
    >>> The RC is available for validation at: 
http://home.apache.org/~vinodkv/hadoop-2.7.3-RC1/ 
<http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/>
    >>> 
    >>> The RC tag in git is: release-2.7.3-RC1
    >>> 
    >>> The maven artifacts are available via repository.apache.org 
<http://repository.apache.org/> at 
https://repository.apache.org/content/repositories/orgapachehadoop-1045/ 
<https://repository.apache.org/content/repositories/orgapachehadoop-1045/>
    >>> 
    >>> The release-notes are inside the tar-balls at location 
hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I hosted 
this at home.apache.org/~vinodkv/hadoop-2.7.3-RC1/releasenotes.html 
<http://people.apache.org/~vinodkv/hadoop-2.7.2-RC1/releasenotes.html> for your 
quick perusal.
    >>> 
    >>> As you may have noted,
    >>> - a few issues with RC0 forced an RC1 [1]
    >>> - a very long fix-cycle for the License & Notice issues (HADOOP-12893) 
caused 2.7.3 (along with every other Hadoop release) to slip by quite a bit. 
This release's related discussion thread is linked below: [2].
    >>> 
    >>> Please try the release and vote; the vote will run for the usual 5 days.
    >>> 
    >>> Thanks,
    >>> Vinod
    >>> 
    >>> [1] [VOTE] Release Apache Hadoop 2.7.3 RC0: 
https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/index.html#26106 
<https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/index.html#26106>
    >>> [2]: 2.7.3 release plan: 
https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html 
<http://markmail.org/thread/6yv2fyrs4jlepmmr>
    >> 
    >> 
    >> ---------------------------------------------------------------------
    >> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
    >> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
    > 
    > 
