Re: [PR] HDDS-3583. Loosen some rule restrictions of checkstyle [ozone]

via GitHub Fri, 08 Dec 2023 11:09:40 -0800


anuengineer commented on PR #921:
URL: https://github.com/apache/ozone/pull/921#issuecomment-1847685118


   Thank you @steveloughran. You have always brought a fresh breath to 
arguments, and you are taking this argument in the right direction. This is 
what I have been asking..
   
   > Hadoop stayed on 80 for a long time not so much for punched-card 
development (that's where it came from!) but because it was easier to do 
side-by-side reviews with the .diff viewer plugin for chrome.
   
   
   I have heard the similar story for a long time, and I have always believed 
this to be true; until I started reading the
   papers behind the cognition.  In the same spirit that you have offered to 
share your learning and understanding, I want to share what I have been 
learning.
   
   1.  Why do we keep reading most material in the world, even in the digital 
world -- ePub, PDF etc. in the 80 column  mark ?  The human minds have been 
trained for a long long time to keep it around that number. Moving it by 20 
chars -- I **personally** believe should NOT impact us by much. However, each 
time I read the general literature, I will share those links in a separate 
post, so it is easier for you to read, keeps on arguing that best comprehension 
for human minds are under 75 chars.
   2. Look at the new generation of Text based tools we have: ChatGPT, Bard 
etc. etc. They all seem to follow the same method of comprehension.
   3. The new Java: If you go by the comprehension arguments. The use of 
elements like var, allows us to write more efficiently in less verbose way. So 
lambda, vars, all the features of new family of Java would allow all of us 
express far more ideas in lesser space. This leads to the question, is it a 
good idea to extend the length of the line then ? Given, we might move to Java 
21 (Current LTS) at some point in time. There are some classic papers on 
Python's Comprehensions, and  how the concise expression effects understand and 
reading of code. Again, I will share each of these papers with you so you can 
clearly see that I am not just proposing my personal view; but I am saying this 
is the current state of the research on this topic/
   
   
   
   
   
   > 
   > we moved to 100 chars a few years back and I have no regrets. It helps us 
stay with readable code when using builder APIs:
   > 
   > ```java
   > public interface FutureDataInputStreamBuilder
   >     extends FSBuilder<CompletableFuture<FSDataInputStream>, 
FutureDataInputStreamBuilder> {
   > ```
   > 
   
   This is actually a great input. The question I would ask is that is there 
more density of information now per line, and what is the way that people read 
and understand.
   
   > Here the types of arguments/subclasses are over 80 chars wide, splitting 
this stuff makes it harder to parse.
   > 
   > And, just like the limits on method length and #of arguments to a method 
or constructor, it is only a hint. Which is good when you are writing methods 
which take builders from the AWS SDK as arguments:
   > 
   > ```java
   >   private <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, 
ClientT> void configureEndpointAndRegion(
   >       BuilderT builder, S3ClientCreationParameters parameters, 
Configuration conf) {
   > ```
   > 
   > Although wider lines may appear to go back on the "side by side" reviewing 
problem, it is considered moot because
   > 
   >     * displays have got wider
   > 
   >     * github is a better review tool which does word wrap
   > 
   >     * vs code and intellij are great for reviews too.
   > 
   > 
   This is a major concern. When I am on a laptop, I find hard to review. The 
language of choice for me these days is Rust, and they do use 100 chars. I am 
familiar with this. What I do on a regular basis is that is I use 2 monitors 
when I do code reviews. To avoid these kind of wrapping.
   
   > Adopting a wider column count should not be an excuse to overboard with 
functional programming code to show off how you learned Haskell, Standard ML or 
OCaml
   
   This. 100 times this. Java is rapidly getting these features, and I love 
that. Don't get me wrong. The information
   density of each character of Java is exploding in the last 5 / 6 releases. 
Given that, would it not be far better
   to actually move to better versions of java to address this issue -- Which 
we have to at some point in time anyway ?
   
   Why is that -- in my mind -- really valuable and a far better solution NOT 
being discussed here ? 
   
   
   ### Another Logical Corollary to this point.
   This argument always seems to start from a point, ** I have larger 
monitor**, followed by the ** I don't believe human beings reading studies are 
relevant in understanding code**.  So let us make this 100 chars as the limit
   
   What surprises me is that they are NOT able to see the inconsistency of the 
ask, let me illustrate that here.
   
   Let us assume that every one in the world has a 1080p screen and 21 inches 
is the minimum size of a monitor (I am fully aware that many of you might be 
coding on Macs with 4K), but humor me here.
   
   **The first question we want to ask is: How many chars can we effectively 
put on a given line ?**
   
   Pixels Per Inch (or PPI) is defined as number of pixels per inch on your 
monitor.
   There are [220 PPI] 
(https://www.calculatorsoup.com/calculators/technology/ppi-calculator.php) and 
if you assume a character size  10 pixels wide
   
   Characters per Line= Screen Width in Inches × (PPI / Character Width in 
Pixels)
   Characters per Line = 21× (220 / 10) = 462 characters per line.
   
   So the first question in my mind is always, why are you folks NOT asking for 
462 characters per line.
   
   The argument that I have a Large Screen implies I need more characters seems 
to fail, due to this inconsistency in the ask.
   I realize that the people who are making this ask are *exceptionally* gifted 
programmers (I am looking at you @steveloughran) and in no way incapable of 
understanding this logical issue.
   
   So what they are really asking me is this -- Hi, I am getting irritated with 
Java 8 kind of code, and then checkstyle ( which I never run locally on my 
machine, fails in the Github run), make it fail less. But my humble point is 
that you are just responding to your anger, and not realizing the greater good 
this rule was designed for. That is to prevent information overload.
   
   We have studied extensively the **"Saccades" : One of the important metric 
used in understanding how people read code and understand code -- is measured 
by a notion called "Saccades" or "Saccades Per Line" or how we read to 
understand. The paper indicates that experts make larger jumps (saccades) when 
reading code, focusing on important elements, while Novices tend to read code 
more linearly.  This implies that fewer, but more targeted, saccades per line 
might be characteristic of expert reading behavior, and Novices read very 
differently.
   
   As a language gains more capability to express ideas in far fewer 
characters, we have to be cognizant of that. I know that @arp7  used to love 
Perl. He used to be able to write it well. Often, I struggled to read Perl due 
to the information density shenanigans involved by the master of Perl ( for all 
the new kids, this is a comment that applies to few older people in this 
discourse, it is a waste of your time to look up the Magic of Perl.)
   
   This leads me to conclude, that real argument that my friends are trying to 
make is NOT about the monitor size -- by that checkstyle fails when we are 
about to commit.
   
   Perhaps, that is also the reason that Nicholas wants to argue that I have 
not been checking in and I don't suffer the same pain (which is NOT true, I 
have been more irritated by these tools that anything. Ask @awk -- I have had 
many fights with him. ) 
   
   So let us bring this conversation back in a normal ? what is better ? Moving 
the checkstyle to 100 chars -- which I believe is a patch to reduce the pain 
for a little while, but suffer the consequences of  the fear that you are 
voicing __"overboard with functional programming code to show off how you 
learned Haskell, Standard ML or OCaml"__. I guarantee you that any Java 
programmer who starts coding from Java 17 has far more information density that 
many of us who started far earlier. 
   
   In that case, is the solution truly to take the pain for a little while and 
get Ozone working with Java 21 LTS or Java 17 LTS ? 
   
   
   
   > ```java
   > getOptionalKeys().stream().filter(!key -> key.startsWith(headerPrefix) && 
key.length() > prefixLen).forEach(key -> headers.put(key.substring(prefixLen), 
options.get(key)));
   > ```
   > 
   
   I strongly feel that this complicated. Especially due to the reactive crowd 
of Java, where they taught everyone it is so cool to return this in each 
function call, and then chain the calls. Most people I know, would have broken 
this line into multiple lines, and perhaps (I think, a personal opinion) that 
it is better to read this as different lines.
   
   
   > why so? makes debugging and tracking down where an NPE came from wrong. A 
single clause per line is much better as you can set breakpoints, isolate stack 
traces etc.
   > 
   > ```java
   >     getOptionalKeys().stream()
   >         .filter(key -> key.startsWith(headerPrefix) && key.length() > 
prefixLen)
   >         .forEach(key -> headers.put(key.substring(prefixLen), 
options.get(key)));
   > ```
   > 
   
   
   Completely agree. For debugging a single long line is better. But we both 
know that the real issue
   is NOT about the Monitors capabilities or the line length. It is that 
checkstyle is a Tax we pay for;
   It is a Tax we inherited from Hadoop. But I sincerely believe as an open 
source which is 
   designed to attract new contributors, it is a tax worth paying because it 
**limits** the information
   density you can pack into a line. I know a tool cannot be a replacement to 
diligent review by
   the programmers, however, this tool would flag and catch issues, and help 
the reviewer.
   
   > Accordingly, I'd worry less about line length and more about modern java 
code issues
   > 
   >     * best practices for lambda expressions
   > 
   >     * effective use of optional<>, java17 type inference, ...
   > 
   >     * how to design for shading, e.g. making sure you never return _any_ 
type from someone else's library in your APIs in case you want to move to 
shaded libraries, change the implementation etc.
   > 
   >     * how to design for maintenance. We've been maintaining hadoop code 
for 15+ years. What could we have done better?
   > 
   >     * design for testability.
   > 
   >     * should we embrace runtime exceptions so we can use java streaming 
APIs, or do we have to reimplement our own functional libraries. It's too late 
to change old code, hence org.apache.hadoop.util.functional, but new APIs?
   >       For the last 2 I've really embraced interface/implementation rather 
than things which require mockito abuse for changing behaviour. Hadoop avoided 
interfaces in the past because its impossible to add new methods without 
breaking external code. but with java8 and the ability to define default 
implementations, that changed.
   > 
   
   100%  -- This is my opinion too. However the discussion for the last 3 years 
have always been on "My monitor is big", I don't believe that human being read 
code. If you address these comments about -- The var, optional etc. you have 
effectively solved this issue. 
   
   **But** here is the issue. Are all over contributors willing to move up this 
chain; perhaps tweak the way they write code ? 
   
   
   > 
   > So here's my proposal
   > 
   >     1. stop worrying about line length. Change the rule, but even if not, 
+1 patches which break it if it makes the code clearer
   > 
   
   +1. I am fine with it. I believe that committers should do the right thing. 
The tools are to help them, NOT to dictate.
   
   >     2. come and discuss on hadoop-common how we could do a style guide for 
the new world, based on others' work
   > 
   
   I would, but as you see, I am very engaged in defending MY ONLY veto in 
Apache. {Just kidding, I am not listening to Hadoop-common, I have always been 
an Ozone man.}
   
   >     3. look at the style rules of apache Scala projects, so we can learn 
from them especially on type inference (all args/return values must declare 
types)
   > 
   Yes. This is what I am saying too.
   
   
   >     4. embrace AssertJ if you haven't already.
   > 
   > 
   > if you can't get consensus, move this out of github issues into mailing 
lists. And focus on what is best for the readability and maintainability of 
modern code, not stuff we wrote for java 5.
   > 
   > (BTW: all those code snippets are from hadoop trunk, not all my work)
   
   @steveloughran  It has been a pleasure to talk to you after a long time. I 
am thankful for your thoughtful comments, and more over for taking time to 
write a thoughtful response.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HDDS-3583. Loosen some rule restrictions of checkstyle [ozone]

Reply via email to