anuengineer commented on PR #921:
URL: https://github.com/apache/ozone/pull/921#issuecomment-1845935162

Before someone assumes that life imitates art, I know Nicholas very well,
worked with him for quite a few years, and tremendously respect him.
It has always been a pleasure to have intellectual discussions with him,
and this discussion is in the same vein. If someone else had made this
comment, I would have either thought they were simply trolling, or
completely ignorant. Nicholas, I know does not fall into either of these
buckets.
My familiarity and the respect I have for him compels me to answer this in
detail,
and not treat this as a frivolous comment.
[Quote from
Nicholas:](https://github.com/apache/ozone/pull/921#issuecomment-1842287866)
> *“As mentioned, 45-75 line length is good for textual information but
> it definitely is not good for computer programs nowadays. It is
> nonsense to enforce min line length in computer programs. It is also
> nonsense to enforce 75 max line length. Therefore, the veto is
> invalid.”*
I am confident that Nicholas, like many of us, understands the principle
that extraordinary claims demand extraordinary evidence. I encourage
Nicholas to present any factual evidence supporting his claim.
Meanwhile, I’ll share what I know on the matter, to facilitate a better
discussion.
Throughout the history of this issue, which spans several years, a
recurring claim has been that “Natural Reading studies are not
applicable to code reading.” This is a notion that Nicholas appears to
echo in his recent statements.
To address this, let’s consider the question: “Do humans read code in a
manner akin to how they read natural English text?”, This study [Eye
Movements in Code
Reading](https://researchonline.gcu.ac.uk/ws/portalfiles/portal/24953094/ICPC2015_authors_version.pdf)
conducted by researchers from various universities including German,
Finnish, Hawaiian, Glasgow, Youngstown, and Microsoft Research, delves
into this very topic.
For those who may not have the time to read the entire paper, I’d like
to summarize a key finding.
The study poses a question
> **RQ1**: *Do people read code similarly to how they read natural
> text?*
The response, based on eye movement tracking, is **affirmative.**
Nicholas is a mathematician, so he is welcome to look deeply into the
paper criticize the correlation and method of study.
**The take away from this paper is that pattern in reading code closely
resemble those observed in reading natural language texts.** One
interesting this about this paper, is that most of the code used was
indeed Java, so it is fortunately relevant for us.
I will now rest the claim that text information is NOT related to code
and any such relation is **nonsense**. I look forward to a
well-researched rebuttal from Nicholas.
Now let me tackle the second part of the question from Nicholas:
> *On the other hands, short line length such as 80 does lead to many
> problems in computer programs – code readability, encouraging short
> variable names, unnecessary code formatting efforts, etc. All these
> problems were not considered in the cited textual information
> recommendation.*
I am presuming that Nicholas has not read a paper that I shared earlier
in the discussion of this issue. I strongly recommend that anyone
discussing this issue reads this paper [Fundamentals of Optimal Code
Style](https://optimal-codestyle.github.io/), I learned a lot from that
paper. It is rich with references and starts from the visual cortex, how
we read and how we specifically read code.
Here is a verbatim quote from the article, almost at the end:
> “These guidelines are reasonably consistent with the boundaries given
> in Steve McConnell’s book¹³: 10-16 and 8-20. Now we can somehow
> explain them.”
There is a deep analysis on this subject, and also a large body of code
that exists today (Apache Ozone for example) which has 80 lines limit.
I believe that we are able to communicate to another human being within
these boundaries is under discussion, and if we have a failure to
communicate, I think the onus to prove that there exists such failure –
that cannot be communicated in 80 chars – I humbly submit belongs to
Nicholas and not to me.
## Comments from others.
Since we are here, and I know others have asked questions slightly
different on this topic, and even though Nicholas is special; so are
all my other friends; I have had the privilege to work with all of you,
and I am thankful for the wonderful time I have spent with all of you.
So I am going to time to answer your questions also in great depth.
[comment from
sdonnel](https://github.com/apache/ozone/pull/921#issuecomment-1842489014):
*I also agree, the cited article is about written text, and reading
source code is not the same as reading natural language text. In general
natural language is scanned and read very quickly, so long lines make
this difficult. Code, generally gets scanned slowly as one has to take
time to understand it. The extra line breaks and indentations can make
it harder to follow.*
This is a great comment, and I agree. I started with exact same
presumption myself when I first reviewed this patch. There is a little
bit of history; When I first came across this patch, I had no idea about
any of this – but I was asked to code review on a topic, and I went
studying to make sure that I do justice to
I’d like to refer again to a study I previously mentioned, which might
interest Steven given his passion for the topic. [Fundamentals of
Optimal Code Style](https://optimal-codestyle.github.io/), discusses how
natural language text is typically processed in two phases: the textual
phase (how it is written) and the domain phase (what it means). In
contrast, comprehending source code introduces a third dimension:
execution.
> “Natural language text is typically understood in two concurrent
> phases: text (how it is written down) and domain (what it means).
> Source code comprehension however needs a third dimension of
> comprehension: execution.”
This additional dimension is the focus of the earlier paper I
referenced. It investigates whether people read code by tracing its
execution path or by following different semantics. Specifically, the
study examines reading patterns used to understand the [Denotational
Semantics](https://en.wikipedia.org/wiki/Denotational_semantics) of
code. Both the Optimal Code Style study and the eye tracking research
delve deeply into this topic. In summary, they suggest that we tend to
use a ‘story mode’—a linear reading pattern—rather than following the
execution path when reading code.
You might want the sections on Program Comprehension, the Top-Down
model, and the Bottom-Up model in the [Fundamentals of Optimal Code
Style](https://optimal-codestyle.github.io/) with an context of
Denotational Semantics. My overarching point is that existing research
seems to diverge from your viewpoint. As always, I am open to learning
and would appreciate any contrary references or insights you could
share.
I will now take time to respond to [vivek’s
question](https://github.com/apache/ozone/pull/921#issuecomment-1843283623):
First of all, I want to humbly differ from vivek’s position that we
should not look at any research papers. I think people learn
differently, and perhaps vivek learns from his personal experiences,
while I do learn a lot from other people (in fact this dialogue is part
of that learning process) and reading papers is way of learning from
other people for me. I am just going to say that I respect that way
Vivek learns, and I hope he will be accommodating of my learning styles.
Addressing your point, Vivek, the core issue here is that the shape and
structure of code significantly affect comprehension.
To specifically answer your question vivek, here is the core issue. The
shape of objects matter in comprehension.
> Studies of the relationship between ambient (global) and focal (local)
> visual processing began in the experiments of David Navon in 1977.
The auto folding of code, does NOT allow for that.
The automatic folding or soft wrapping of code in IDEs doesn’t
accommodate this aspect of visual processing. For instance, setting the
line length to 40 characters and relying on the editor to wrap lines can
result in inconsistent and potentially confusing experiences. To better
understand this, You might want to look at the shape studies [Shape of
the code 1](https://optimal-codestyle.github.io/en/cpp1_blur.jpg) and
[Shape of the Code2](https://optimal-codestyle.github.io/en/cpp2_blur.jpg)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]