Re: [PR] HDDS-3583. Loosen some rule restrictions of checkstyle [ozone]

via GitHub Thu, 07 Dec 2023 10:59:06 -0800


anuengineer commented on PR #921:
URL: https://github.com/apache/ozone/pull/921#issuecomment-1845935162


    ![Duty Calls](https://imgs.xkcd.com/comics/duty_calls.png)
   
   Before someone assumes that life imitates art, I know Nicholas very well, 
   worked with him for quite a few years, and tremendously respect him. 
   It has always been a pleasure to have intellectual discussions with him, 
   and this discussion is in the same vein. If someone else had made this 
   comment, I would have either thought they were simply trolling, or 
   completely ignorant. Nicholas, I know does not fall into either of these 
buckets. 
   My familiarity and the respect I have for him compels me to answer this in 
detail, 
   and not treat this as a frivolous comment.
   
   [Quote from 
Nicholas:](https://github.com/apache/ozone/pull/921#issuecomment-1842287866)
   
   > *“As mentioned, 45-75 line length is good for textual information but
   > it definitely is not good for computer programs nowadays. It is
   > nonsense to enforce min line length in computer programs. It is also
   > nonsense to enforce 75 max line length. Therefore, the veto is
   > invalid.”*
   
   I am confident that Nicholas, like many of us, understands the principle
   that extraordinary claims demand extraordinary evidence. I encourage
   Nicholas to present any factual evidence supporting his claim.
   Meanwhile, I’ll share what I know on the matter, to facilitate a better
   discussion.
   
   Throughout the history of this issue, which spans several years, a
   recurring claim has been that “Natural Reading studies are not
   applicable to code reading.” This is a notion that Nicholas appears to
   echo in his recent statements.
   
   To address this, let’s consider the question: “Do humans read code in a
   manner akin to how they read natural English text?”, This study [Eye 
Movements in Code 
Reading](https://researchonline.gcu.ac.uk/ws/portalfiles/portal/24953094/ICPC2015_authors_version.pdf)
   conducted by researchers from various universities including German,
   Finnish, Hawaiian, Glasgow, Youngstown, and Microsoft Research, delves
   into this very topic.
   
   For those who may not have the time to read the entire paper, I’d like
   to summarize a key finding.
   
   The study poses a question
   
   > **RQ1**: *Do people read code similarly to how they read natural
   > text?*
   
   The response, based on eye movement tracking, is **affirmative.**
   Nicholas is a mathematician, so he is welcome to look deeply into the
   paper criticize the correlation and method of study.
   
   **The take away from this paper is that pattern in reading code closely
   resemble those observed in reading natural language texts.** One
   interesting this about this paper, is that most of the code used was
   indeed Java, so it is fortunately relevant for us.
   
   I will now rest the claim that text information is NOT related to code
   and any such relation is **nonsense**. I look forward to a
   well-researched rebuttal from Nicholas.
   
   Now let me tackle the second part of the question from Nicholas:
   
   > *On the other hands, short line length such as 80 does lead to many
   > problems in computer programs – code readability, encouraging short
   > variable names, unnecessary code formatting efforts, etc. All these
   > problems were not considered in the cited textual information
   > recommendation.*
   
   I am presuming that Nicholas has not read a paper that I shared earlier
   in the discussion of this issue. I strongly recommend that anyone
   discussing this issue reads this paper [Fundamentals of Optimal Code
   Style](https://optimal-codestyle.github.io/), I learned a lot from that
   paper. It is rich with references and starts from the visual cortex, how
   we read and how we specifically read code.
   
   Here is a verbatim quote from the article, almost at the end:
   
   > “These guidelines are reasonably consistent with the boundaries given
   > in Steve McConnell’s book¹³: 10-16 and 8-20. Now we can somehow
   > explain them.”
   
   There is a deep analysis on this subject, and also a large body of code
   that exists today (Apache Ozone for example) which has 80 lines limit.
   
   I believe that we are able to communicate to another human being within
   these boundaries is under discussion, and if we have a failure to
   communicate, I think the onus to prove that there exists such failure –
   that cannot be communicated in 80 chars – I humbly submit belongs to
   Nicholas and not to me.
   
   ## Comments from others.
   
   Since we are here, and I know others have asked questions slightly
   different on this topic, and even though Nicholas is special; so are
   all my other friends; I have had the privilege to work with all of you,
   and I am thankful for the wonderful time I have spent with all of you.
   So I am going to time to answer your questions also in great depth.
   
   [comment from
   sdonnel](https://github.com/apache/ozone/pull/921#issuecomment-1842489014):
   
   *I also agree, the cited article is about written text, and reading
   source code is not the same as reading natural language text. In general
   natural language is scanned and read very quickly, so long lines make
   this difficult. Code, generally gets scanned slowly as one has to take
   time to understand it. The extra line breaks and indentations can make
   it harder to follow.*
   
   This is a great comment, and I agree. I started with exact same
   presumption myself when I first reviewed this patch. There is a little
   bit of history; When I first came across this patch, I had no idea about
   any of this – but I was asked to code review on a topic, and I went
   studying to make sure that I do justice to
   
   I’d like to refer again to a study I previously mentioned, which might
   interest Steven given his passion for the topic. [Fundamentals of
   Optimal Code Style](https://optimal-codestyle.github.io/), discusses how
   natural language text is typically processed in two phases: the textual
   phase (how it is written) and the domain phase (what it means). In
   contrast, comprehending source code introduces a third dimension:
   execution.
   
   > “Natural language text is typically understood in two concurrent
   > phases: text (how it is written down) and domain (what it means).
   > Source code comprehension however needs a third dimension of
   > comprehension: execution.”
   
   This additional dimension is the focus of the earlier paper I
   referenced. It investigates whether people read code by tracing its
   execution path or by following different semantics. Specifically, the
   study examines reading patterns used to understand the [Denotational
   Semantics](https://en.wikipedia.org/wiki/Denotational_semantics) of
   code. Both the Optimal Code Style study and the eye tracking research
   delve deeply into this topic. In summary, they suggest that we tend to
   use a ‘story mode’—a linear reading pattern—rather than following the
   execution path when reading code.
   
   You might want the sections on Program Comprehension, the Top-Down
   model, and the Bottom-Up model in the [Fundamentals of Optimal Code
   Style](https://optimal-codestyle.github.io/) with an context of
   Denotational Semantics. My overarching point is that existing research
   seems to diverge from your viewpoint. As always, I am open to learning
   and would appreciate any contrary references or insights you could
   share.
   
   I will now take time to respond to [vivek’s
   question](https://github.com/apache/ozone/pull/921#issuecomment-1843283623):
   
   First of all, I want to humbly differ from vivek’s position that we
   should not look at any research papers. I think people learn
   differently, and perhaps vivek learns from his personal experiences,
   while I do learn a lot from other people (in fact this dialogue is part
   of that learning process) and reading papers is way of learning from
   other people for me. I am just going to say that I respect that way
   Vivek learns, and I hope he will be accommodating of my learning styles.
   
   Addressing your point, Vivek, the core issue here is that the shape and
   structure of code significantly affect comprehension.
   
   To specifically answer your question vivek, here is the core issue. The
   shape of objects matter in comprehension.
   
   > Studies of the relationship between ambient (global) and focal (local)
   > visual processing began in the experiments of David Navon in 1977.
   
   The auto folding of code, does NOT allow for that.
   
   The automatic folding or soft wrapping of code in IDEs doesn’t
   accommodate this aspect of visual processing. For instance, setting the
   line length to 40 characters and relying on the editor to wrap lines can
   result in inconsistent and potentially confusing experiences. To better
   understand this, You might want to look at the shape studies [Shape of
   the code 1](https://optimal-codestyle.github.io/en/cpp1_blur.jpg) and
   [Shape of the Code2](https://optimal-codestyle.github.io/en/cpp2_blur.jpg)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HDDS-3583. Loosen some rule restrictions of checkstyle [ozone]

Reply via email to