Thanks, Johan, for starting this discussion.

I mostly agree with the proposal. However, one (at times, serious) drawback to 
using Haddock is that it means that editing comments can cause parse failures. 
The way the GHC build works, these failures may not be detected until the end 
of a hacking session (if I'm using, say, `make 2`, as I tend to do) and then 
can be hard to diagnose. I've actually been bitten by this when working on GHC.

So, I have to ask: why use Haddock? Do folks read the Haddock docs for GHC? (I 
don't, but perhaps that's because the docs aren't so good right now.) Would it 
be acceptable to change this proposal not to require Haddock docs?

Even if we decide to keep this proposal about Haddock docs specifically, I 
would strongly request that correct rendering of the Haddock docs not be 
scrutinized. At the end of a hacking session (which is hard enough to find time 
for, as is), I don't want to be expected to look through the generated HTML to 
make sure that my typewriter font and italics are rendering correctly. This is 
something of a corollary to Simon's comment about wanting to refer to Notes 
from Haddock comments -- I would want the Haddock output to be quite secondary 
to the proper documentation in the source code.

(Note that this "demotion" of the role of Haddock is certainly not my practice 
in released libraries! But, Haddock is much less useful in an application like 
GHC than in a library.)

All that said, I do agree with the intent of this proposal and am happy to take 
on my part of the burden of documenting new (and perhaps some old) functions as 
I work. I have been very guilty of the "broken window" effect in not 
documenting new code.

Thanks,
Richard

On Jun 27, 2014, at 5:51 AM, Johan Tibell <johan.tib...@gmail.com> wrote:

> Hi!
> 
> I found myself exploring new parts of the GHC code base the last few weeks 
> (exciting!), which again reminded me of my biggest frustration when working 
> on GHC: the lack of per-function/type (Haddock) comments.
> 
> GHC code is sometimes commented with "notes", which are great but tend to (1) 
> mostly cover the exceptional cases and (2) talk about the implementation of a 
> function, not how a caller might use it or why.
> 
> Lack of documentation, in GHC and other software projects, usually has (at 
> least) two causes:
> Programmers comment code they think is "complex enough to warrant a comment". 
> The problem is that the author is usually a poor judge of what's complex 
> enough, because he/she is too familiar with the code and tends to 
> under-document code when following this principle.
> Documenting is boring and tends to have little benefit the person writing to 
> documentation. Given lack of incentives we tend to document less than we 
> ought to.
> I've only seen one successful way to combat the lack of documentation that 
> stems from the above: have the project's style guide mandate that top-level 
> functions and types (or at least those that are exported) have documentation. 
> This works well at Google.
> 
> Anecdote: we have one code base inside Google that was until recently exempt 
> from this rule and documentation is almost completely absent in that code 
> base, even though hundreds of engineers work on and need to understand it 
> every day. This breeds institutional knowledge problems i.e. if the author of 
> a core piece of code leaves, lots of knowledge is lost.
> 
> Proposal: I propose that we require that new top-level functions and types 
> have Haddock comments, even if they start out as a single, humble sentence.
> 
> I've found that putting even that one sentence (1) helps new users and (2) 
> establishes a place for improvements to be made. There's a strong "broken 
> window" effect to lack of comments, in that lack of comments breeds more lack 
> of comments as developers follow established practices.
> 
> We should add this requirement to the style guide. Having it as a written 
> down policy tends to prevent having to re-hash the whole argument about 
> documentation over and over again. This has also helped us a lot at Google, 
> because programmers can spend endless amount of time arguing about comments, 
> placement of curly braces, etc. and having a written policy helps cut down on 
> that.
> 
> To give an idea of how to write good comments, here are two examples of 
> undocumented code I ran into in GHC and how better comments would have helped.
> 
> First example
> In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR, 
> which is a helper function use when computing instruction register usage.
> 
> The first question that I asked upon seeing uses of that function was "what 
> does RUR stand for?" Given the context the function is in, I guessed it 
> stands for read-update-read, because R is used to mean "read" in the 
> enclosing function and "updating" is related to "reading" so that must be 
> what U stands for. It turns out that it stands for RegUsageReadonly. Here's a 
> comment that would have captured, in a single sentence, what this function is 
> for:
> 
>     -- | Create register usage info for instruction that only
>     -- reads registers.
>     mkRUR src = src' `seq` RU src' []
>         where src' = filter (interesting platform) src
> 
> That already a big improvement. A note about the register filtering, which 
> means that not all registers you pass to the function will be recorded as 
> being read in the end, could also be useful.
> 
> Aside: providing a type signature, which would have made it clear that the 
> return type is RU, might also have helped in this particular case.
> 
> Second example
> In the same file there a function called x86_regUsageOfInstr. It's the 
> function that encloses the local function mkRUR above.
> 
> I could figure out that this function has something to do with register 
> usage, of the instruction passed as an argument, and that register usage is 
> important for the register allocator. However, trying to understand in more 
> detail what that meant was more of challenge than it needed to be. First, a 
> comment more clearly explaining what computing register usage means in 
> practice would be helpful:
> 
>     -- | Returns which registers are read and written by this 
>     -- instruction, as a (read, written) pair. This info is used
>     -- by the register allocator.
>     x86_regUsageOfInstr :: Platform -> Instr -> RegUsage
> 
> The reason mentioning that the return value is essentially a (read, written) 
> pair is helpful is because the body of the function a big case statement full 
> of lines like this one:
> 
>     GCMP _ src1 src2 -> mkRUR [src1,src2]
>     ...
>     FDIV _ src  dst  -> usageRM src dst
> 
> It's not immediately clear that all the various helper functions used here 
> just end up computing a pair of the above form. A top-level comment lets you 
> understand what's going on without understanding exactly what all these 
> helper functions are doing.
> 
> Thoughts?
> 
> -- Johan
> 
> _______________________________________________
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs

Reply via email to