RE: Proposal: require Haddock comment for every new top-level function and type in GHC source code

Simon Peyton Jones Fri, 27 Jun 2014 03:19:28 -0700

I’d be OK with this, (it’s a bit like requiring signatures on all top level 
functions) but I don’t know how we’d enforce it.


Do you think the requirement should be for all top-level functions or just 
exported ones?

I agree that Notes have a different purpose.  But it should be OK style to 
refer to a Note from a top-level function comment, even though Haddock won’t be 
able to make much sense of it.

Simon

From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of Johan Tibell
Sent: 27 June 2014 10:51
To: ghc-devs@haskell.org
Subject: Proposal: require Haddock comment for every new top-level function and 
type in GHC source code

Hi!

I found myself exploring new parts of the GHC code base the last few weeks 
(exciting!), which again reminded me of my biggest frustration when working on 
GHC: the lack of per-function/type (Haddock) comments.

GHC code is sometimes commented with "notes", which are great but tend to (1) 
mostly cover the exceptional cases and (2) talk about the implementation of a 
function, not how a caller might use it or why.

Lack of documentation, in GHC and other software projects, usually has (at 
least) two causes:

  *   Programmers comment code they think is "complex enough to warrant a 
comment". The problem is that the author is usually a poor judge of what's 
complex enough, because he/she is too familiar with the code and tends to 
under-document code when following this principle.
  *   Documenting is boring and tends to have little benefit the person writing 
to documentation. Given lack of incentives we tend to document less than we 
ought to.
I've only seen one successful way to combat the lack of documentation that 
stems from the above: have the project's style guide mandate that top-level 
functions and types (or at least those that are exported) have documentation. 
This works well at Google.

Anecdote: we have one code base inside Google that was until recently exempt 
from this rule and documentation is almost completely absent in that code base, 
even though hundreds of engineers work on and need to understand it every day. 
This breeds institutional knowledge problems i.e. if the author of a core piece 
of code leaves, lots of knowledge is lost.

Proposal: I propose that we require that new top-level functions and types have 
Haddock comments, even if they start out as a single, humble sentence.

I've found that putting even that one sentence (1) helps new users and (2) 
establishes a place for improvements to be made. There's a strong "broken 
window" effect to lack of comments, in that lack of comments breeds more lack 
of comments as developers follow established practices.

We should add this requirement to the style guide. Having it as a written down 
policy tends to prevent having to re-hash the whole argument about 
documentation over and over again. This has also helped us a lot at Google, 
because programmers can spend endless amount of time arguing about comments, 
placement of curly braces, etc. and having a written policy helps cut down on 
that.

To give an idea of how to write good comments, here are two examples of 
undocumented code I ran into in GHC and how better comments would have helped.

First example
In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, 
which is a helper function use when computing instruction register usage.

The first question that I asked upon seeing uses of that function was "what 
does RUR stand for?" Given the context the function is in, I guessed it stands 
for read-update-read, because R is used to mean "read" in the enclosing 
function and "updating" is related to "reading" so that must be what U stands 
for. It turns out that it stands for RegUsageReadonly. Here's a comment that 
would have captured, in a single sentence, what this function is for:

    -- | Create register usage info for instruction that only
    -- reads registers.
    mkRUR src = src' `seq` RU src' []
        where src' = filter (interesting platform) src

That already a big improvement. A note about the register filtering, which 
means that not all registers you pass to the function will be recorded as being 
read in the end, could also be useful.

Aside: providing a type signature, which would have made it clear that the 
return type is RU, might also have helped in this particular case.

Second example
In the same file there a function called x86_regUsageOfInstr. It's the function 
that encloses the local function mkRUR above.

I could figure out that this function has something to do with register usage, 
of the instruction passed as an argument, and that register usage is important 
for the register allocator. However, trying to understand in more detail what 
that meant was more of challenge than it needed to be. First, a comment more 
clearly explaining what computing register usage means in practice would be 
helpful:

    -- | Returns which registers are read and written by this
    -- instruction, as a (read, written) pair. This info is used
    -- by the register allocator.
    x86_regUsageOfInstr :: Platform -> Instr -> RegUsage

The reason mentioning that the return value is essentially a (read, written) 
pair is helpful is because the body of the function a big case statement full 
of lines like this one:

    GCMP _ src1 src2 -> mkRUR [src1,src2]
    ...
    FDIV _ src  dst  -> usageRM src dst

It's not immediately clear that all the various helper functions used here just 
end up computing a pair of the above form. A top-level comment lets you 
understand what's going on without understanding exactly what all these helper 
functions are doing.

Thoughts?

-- Johan

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs

RE: Proposal: require Haddock comment for every new top-level function and type in GHC source code

Reply via email to