anuengineer commented on PR #921:
URL: https://github.com/apache/ozone/pull/921#issuecomment-1847685118
Thank you @steveloughran. You have always brought a fresh breath to
arguments, and you are taking this argument in the right direction. This is
what I have been asking..
> Hadoop stayed on 80 for a long time not so much for punched-card
development (that's where it came from!) but because it was easier to do
side-by-side reviews with the .diff viewer plugin for chrome.
I have heard the similar story for a long time, and I have always believed
this to be true; until I started reading the
papers behind the cognition. In the same spirit that you have offered to
share your learning and understanding, I want to share what I have been
learning.
1. Why do we keep reading most material in the world, even in the digital
world -- ePub, PDF etc. in the 80 column mark ? The human minds have been
trained for a long long time to keep it around that number. Moving it by 20
chars -- I **personally** believe should NOT impact us by much. However, each
time I read the general literature, I will share those links in a separate
post, so it is easier for you to read, keeps on arguing that best comprehension
for human minds are under 75 chars.
2. Look at the new generation of Text based tools we have: ChatGPT, Bard
etc. etc. They all seem to follow the same method of comprehension.
3. The new Java: If you go by the comprehension arguments. The use of
elements like var, allows us to write more efficiently in less verbose way. So
lambda, vars, all the features of new family of Java would allow all of us
express far more ideas in lesser space. This leads to the question, is it a
good idea to extend the length of the line then ? Given, we might move to Java
21 (Current LTS) at some point in time. There are some classic papers on
Python's Comprehensions, and how the concise expression effects understand and
reading of code. Again, I will share each of these papers with you so you can
clearly see that I am not just proposing my personal view; but I am saying this
is the current state of the research on this topic/
>
> we moved to 100 chars a few years back and I have no regrets. It helps us
stay with readable code when using builder APIs:
>
> ```java
> public interface FutureDataInputStreamBuilder
> extends FSBuilder<CompletableFuture<FSDataInputStream>,
FutureDataInputStreamBuilder> {
> ```
>
This is actually a great input. The question I would ask is that is there
more density of information now per line, and what is the way that people read
and understand.
> Here the types of arguments/subclasses are over 80 chars wide, splitting
this stuff makes it harder to parse.
>
> And, just like the limits on method length and #of arguments to a method
or constructor, it is only a hint. Which is good when you are writing methods
which take builders from the AWS SDK as arguments:
>
> ```java
> private <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>,
ClientT> void configureEndpointAndRegion(
> BuilderT builder, S3ClientCreationParameters parameters,
Configuration conf) {
> ```
>
> Although wider lines may appear to go back on the "side by side" reviewing
problem, it is considered moot because
>
> * displays have got wider
>
> * github is a better review tool which does word wrap
>
> * vs code and intellij are great for reviews too.
>
>
This is a major concern. When I am on a laptop, I find hard to review. The
language of choice for me these days is Rust, and they do use 100 chars. I am
familiar with this. What I do on a regular basis is that is I use 2 monitors
when I do code reviews. To avoid these kind of wrapping.
> Adopting a wider column count should not be an excuse to overboard with
functional programming code to show off how you learned Haskell, Standard ML or
OCaml
This. 100 times this. Java is rapidly getting these features, and I love
that. Don't get me wrong. The information
density of each character of Java is exploding in the last 5 / 6 releases.
Given that, would it not be far better
to actually move to better versions of java to address this issue -- Which
we have to at some point in time anyway ?
Why is that -- in my mind -- really valuable and a far better solution NOT
being discussed here ?
### Another Logical Corollary to this point.
This argument always seems to start from a point, ** I have larger
monitor**, followed by the ** I don't believe human beings reading studies are
relevant in understanding code**. So let us make this 100 chars as the limit
What surprises me is that they are NOT able to see the inconsistency of the
ask, let me illustrate that here.
Let us assume that every one in the world has a 1080p screen and 21 inches
is the minimum size of a monitor (I am fully aware that many of you might be
coding on Macs with 4K), but humor me here.
**The first question we want to ask is: How many chars can we effectively
put on a given line ?**
Pixels Per Inch (or PPI) is defined as number of pixels per inch on your
monitor.
There are [220 PPI]
(https://www.calculatorsoup.com/calculators/technology/ppi-calculator.php) and
if you assume a character size 10 pixels wide
Characters per Line= Screen Width in Inches × (PPI / Character Width in
Pixels)
Characters per Line = 21× (220 / 10) = 462 characters per line.
So the first question in my mind is always, why are you folks NOT asking for
462 characters per line.
The argument that I have a Large Screen implies I need more characters seems
to fail, due to this inconsistency in the ask.
I realize that the people who are making this ask are *exceptionally* gifted
programmers (I am looking at you @steveloughran) and in no way incapable of
understanding this logical issue.
So what they are really asking me is this -- Hi, I am getting irritated with
Java 8 kind of code, and then checkstyle ( which I never run locally on my
machine, fails in the Github run), make it fail less. But my humble point is
that you are just responding to your anger, and not realizing the greater good
this rule was designed for. That is to prevent information overload.
We have studied extensively the **"Saccades" : One of the important metric
used in understanding how people read code and understand code -- is measured
by a notion called "Saccades" or "Saccades Per Line" or how we read to
understand. The paper indicates that experts make larger jumps (saccades) when
reading code, focusing on important elements, while Novices tend to read code
more linearly. This implies that fewer, but more targeted, saccades per line
might be characteristic of expert reading behavior, and Novices read very
differently.
As a language gains more capability to express ideas in far fewer
characters, we have to be cognizant of that. I know that @arp7 used to love
Perl. He used to be able to write it well. Often, I struggled to read Perl due
to the information density shenanigans involved by the master of Perl ( for all
the new kids, this is a comment that applies to few older people in this
discourse, it is a waste of your time to look up the Magic of Perl.)
This leads me to conclude, that real argument that my friends are trying to
make is NOT about the monitor size -- by that checkstyle fails when we are
about to commit.
Perhaps, that is also the reason that Nicholas wants to argue that I have
not been checking in and I don't suffer the same pain (which is NOT true, I
have been more irritated by these tools that anything. Ask @awk -- I have had
many fights with him. )
So let us bring this conversation back in a normal ? what is better ? Moving
the checkstyle to 100 chars -- which I believe is a patch to reduce the pain
for a little while, but suffer the consequences of the fear that you are
voicing __"overboard with functional programming code to show off how you
learned Haskell, Standard ML or OCaml"__. I guarantee you that any Java
programmer who starts coding from Java 17 has far more information density that
many of us who started far earlier.
In that case, is the solution truly to take the pain for a little while and
get Ozone working with Java 21 LTS or Java 17 LTS ?
> ```java
> getOptionalKeys().stream().filter(!key -> key.startsWith(headerPrefix) &&
key.length() > prefixLen).forEach(key -> headers.put(key.substring(prefixLen),
options.get(key)));
> ```
>
I strongly feel that this complicated. Especially due to the reactive crowd
of Java, where they taught everyone it is so cool to return this in each
function call, and then chain the calls. Most people I know, would have broken
this line into multiple lines, and perhaps (I think, a personal opinion) that
it is better to read this as different lines.
> why so? makes debugging and tracking down where an NPE came from wrong. A
single clause per line is much better as you can set breakpoints, isolate stack
traces etc.
>
> ```java
> getOptionalKeys().stream()
> .filter(key -> key.startsWith(headerPrefix) && key.length() >
prefixLen)
> .forEach(key -> headers.put(key.substring(prefixLen),
options.get(key)));
> ```
>
Completely agree. For debugging a single long line is better. But we both
know that the real issue
is NOT about the Monitors capabilities or the line length. It is that
checkstyle is a Tax we pay for;
It is a Tax we inherited from Hadoop. But I sincerely believe as an open
source which is
designed to attract new contributors, it is a tax worth paying because it
**limits** the information
density you can pack into a line. I know a tool cannot be a replacement to
diligent review by
the programmers, however, this tool would flag and catch issues, and help
the reviewer.
> Accordingly, I'd worry less about line length and more about modern java
code issues
>
> * best practices for lambda expressions
>
> * effective use of optional<>, java17 type inference, ...
>
> * how to design for shading, e.g. making sure you never return _any_
type from someone else's library in your APIs in case you want to move to
shaded libraries, change the implementation etc.
>
> * how to design for maintenance. We've been maintaining hadoop code
for 15+ years. What could we have done better?
>
> * design for testability.
>
> * should we embrace runtime exceptions so we can use java streaming
APIs, or do we have to reimplement our own functional libraries. It's too late
to change old code, hence org.apache.hadoop.util.functional, but new APIs?
> For the last 2 I've really embraced interface/implementation rather
than things which require mockito abuse for changing behaviour. Hadoop avoided
interfaces in the past because its impossible to add new methods without
breaking external code. but with java8 and the ability to define default
implementations, that changed.
>
100% -- This is my opinion too. However the discussion for the last 3 years
have always been on "My monitor is big", I don't believe that human being read
code. If you address these comments about -- The var, optional etc. you have
effectively solved this issue.
**But** here is the issue. Are all over contributors willing to move up this
chain; perhaps tweak the way they write code ?
>
> So here's my proposal
>
> 1. stop worrying about line length. Change the rule, but even if not,
+1 patches which break it if it makes the code clearer
>
+1. I am fine with it. I believe that committers should do the right thing.
The tools are to help them, NOT to dictate.
> 2. come and discuss on hadoop-common how we could do a style guide for
the new world, based on others' work
>
I would, but as you see, I am very engaged in defending MY ONLY veto in
Apache. {Just kidding, I am not listening to Hadoop-common, I have always been
an Ozone man.}
> 3. look at the style rules of apache Scala projects, so we can learn
from them especially on type inference (all args/return values must declare
types)
>
Yes. This is what I am saying too.
> 4. embrace AssertJ if you haven't already.
>
>
> if you can't get consensus, move this out of github issues into mailing
lists. And focus on what is best for the readability and maintainability of
modern code, not stuff we wrote for java 5.
>
> (BTW: all those code snippets are from hadoop trunk, not all my work)
@steveloughran It has been a pleasure to talk to you after a long time. I
am thankful for your thoughtful comments, and more over for taking time to
write a thoughtful response.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]