Lars- Welcome!
As a mild refinement of enthusiasm for this proposal: when you approach a "cleanup", please consider the cost to tracing the lineage of changes in the codebase. Working on a project as large and long-running as Hadoop, we often need to trace what motivated a particular change using only the commit log and JIRA. Sifting through cosmetic changes that obscure the reasoning behind a module not worth the aesthetic benefits of consistently formatted code. As a strawman: hitting 100% checkstyle compliance would not improve our users' experience, so please use your judgement. As you point out, we're not going to maintain perfect discipline going forward, either. Nitpicking our contributors beyond what is necessary to keep the code legible discourages them from continuing as contributors. As a general heuristic: the stricter the rule, the more automation is required to enforce it. This prevents everyone from burning out on minutiae. All that said, if you propose a refactoring that makes it easier to maintain code that's developed more vestigial parts that functional ones (and we have more than a few of those), that is hugely valuable. -C On Mon, Aug 7, 2017 at 5:13 AM, Lars Francke <lars.fran...@gmail.com> wrote: > Hi, > > a few words about me: I've contributed to Hadoop (and it's ecosystem[4]) in > the past am a Hive committer and have used Hadoop for 10 years now, so I'm > not totally inexperienced. I'm earning my money as a Hadoop consultant so > I've seen dozens of real-life clusters in my life. > > As part of a few recent client projects and now writing about Hadoop in a > new project/book I'm digging into the source code to figure out some of the > things that are not documented. > > But as part of this digging I'm seeing lots of warnings in the code, > inconsistencies etc. and I'd like to contribute some fixes to this back to > the community. > > I have been a long-time believer in good code quality and consistent code > styles. This might affect people like me especially who do a lot of > "drive-by" contributions as I'm not someone who looks at the code daily but > comes across it reasonably often as part of client engagements. In those > scenarios, it's very unhelpful to have inconsistent code & bad > documentation. > > Two simple but concrete examples: > * There's lots of "final" usages on variables and methods but no > consistency. Was this done for particular reasons or personal preference? > * Similarly, there's lots of things that are public or protected while they > could in theory be private. This especially makes it very hard to reason > about code. > > Judging from the current code there's lots of "unofficial" code styling > and/or personal preference. The Wiki says[1] to follow the Sun > guidelines[2] which have not been updated in almost 20 years. A new version > is in the works an clarifies a lot of things[3]. I'm trying to get it > published soon. I'd try to format according to the latter (that means among > other things no "final" for local variables). > > I realize that I won't be able to single-handedly fix all of this > especially as code gets contributed but if the community thinks it's > worthwhile I'd still love to land a few cleanup patches. My experience in > the past has been that it's hard to get attention to these things (which I > fully understand as they take up someone's time to review & commit). > > So, this is my request for comments on these questions: > * Is there any interest in this at all? > ** "This" being patches for code style & things like FindBugs & Checkstyle > warnings > * Size of the patches: Rather one big patch or smaller ones (e.g. per file > or package) > * Anyone willing to help me with this? e.g. reviewing and committing? I'd > be more than happy to bribe you with drinks, sweets, food or something else > > My plan is not to go through each and every file and fix every issue I see. > But there are some specific areas I'm looking at in detail and there I'd > love to contribute back. > > Thank you for reading! > > Cheers, > Lars > > PS: Posting to common-dev only, not sure if I should cross post to hdfs-dev > and yarn-dev as well? > > [1] <https://wiki.apache.org/hadoop/CodeReviewChecklist> > [2] < > http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html >> > [3] <http://cr.openjdk.java.net/~alundblad/styleguide/index-v6.html> > [4] < > https://issues.apache.org/jira/issues/?filter=-1&jql=reporter%20%3D%20lars_francke%20OR%20assignee%20%3D%20lars_francke >> --------------------------------------------------------------------- To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org