Hi Lars & Welcome!

Maybe the first step here would be look at those style guides and think how to 
bring them up to date, especially with stuff like lambda-expressions in java 8, 
and mnodules forthcoming in in java 9, SLF4J logging, Junit 5 -> 5 testing, 
code instrumentation, diagnostics, log stability, etc.

https://issues.apache.org/jira/browse/HADOOP-12143 . ;

This is my go at doing this

https://github.com/steveloughran/formality/blob/master/styleguide/styleguide.md


I've not done any work on trying to get it in, more evolving it as how I code & 
what I look for, especially in tests.

If you want to take this on, it'd be nice. At the same time, I fear there'd be 
push back if you turned up and started telling people what to do. Collaborating 
with us all on the test code is a good place to start.

We're also more relaxed about contributions to the less-core bits of the system 
(things like HDFS, IPC, security and Yarn core are trouble). If there's stuff 
outside that you want to take a go at helping clean up, that'd be lower risk 
(example: object store connectors)

-Steve



On 7 Aug 2017, at 13:13, Lars Francke 
<lars.fran...@gmail.com<mailto:lars.fran...@gmail.com>> wrote:

Hi,

a few words about me: I've contributed to Hadoop (and it's ecosystem[4]) in
the past am a Hive committer and have used Hadoop for 10 years now, so I'm
not totally inexperienced. I'm earning my money as a Hadoop consultant so
I've seen dozens of real-life clusters in my life.

As part of a few recent client projects and now writing about Hadoop in a
new project/book I'm digging into the source code to figure out some of the
things that are not documented.

But as part of this digging I'm seeing lots of warnings in the code,
inconsistencies etc. and I'd like to contribute some fixes to this back to
the community.

I have been a long-time believer in good code quality and consistent code
styles. This might affect people like me especially who do a lot of
"drive-by" contributions as I'm not someone who looks at the code daily but
comes across it reasonably often as part of client engagements. In those
scenarios, it's very unhelpful to have inconsistent code & bad
documentation.

Two simple but concrete examples:
* There's lots of "final" usages on variables and methods but no
consistency. Was this done for particular reasons or personal preference?

personal, though with a move to l-expressions, it matters a lot more. We should 
really be marking all parameters as final at the very least.


* Similarly, there's lots of things that are public or protected while they
could in theory be private. This especially makes it very hard to reason
about code.

there's now a bit of fear of breaking things, but at the very least, things 
could be protected or package-private more than they are.



Judging from the current code there's lots of "unofficial" code styling
and/or personal preference. The Wiki says[1] to follow the Sun
guidelines[2] which have not been updated in almost 20 years. A new version
is in the works an clarifies a lot of things[3]. I'm trying to get it
published soon. I'd try to format according to the latter (that means among
other things no "final" for local variables).

I realize that I won't be able to single-handedly fix all of this
especially as code gets contributed but if the community thinks it's
worthwhile I'd still love to land a few cleanup patches. My experience in
the past has been that it's hard to get attention to these things (which I
fully understand as they take up someone's time to review & commit).

So, this is my request for comments on these questions:
* Is there any interest in this at all?
** "This" being patches for code style & things like FindBugs & Checkstyle
warnings
* Size of the patches: Rather one big patch or smaller ones (e.g. per file
or package)
* Anyone willing to help me with this? e.g. reviewing and committing? I'd
be more than happy to bribe you with drinks, sweets, food or something else

My plan is not to go through each and every file and fix every issue I see.
But there are some specific areas I'm looking at in detail and there I'd
love to contribute back.

Thank you for reading!

Cheers,
Lars

PS: Posting to common-dev only, not sure if I should cross post to hdfs-dev
and yarn-dev as well?

[1] <https://wiki.apache.org/hadoop/CodeReviewChecklist>
[2] <
http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html

[3] <http://cr.openjdk.java.net/~alundblad/styleguide/index-v6.html>
[4] <
https://issues.apache.org/jira/issues/?filter=-1&jql=reporter%20%3D%20lars_francke%20OR%20assignee%20%3D%20lars_francke


Reply via email to