On Sun, May 01, 2022 at 03:33:17PM +0100, David CARLIER wrote:
> Hi here a little patch to set idle time for SO_KEEPALIVE socket option.

Now merged, thanks.

David, one comment though, your commit messages keep missing a lot of
crucial information for reviewers and debuggers, and I had to spend time
documenting myself on keep-alive on MacOS to figure the differences and/or
what the impacts or limitations of the patch could be. I finally found and
put that information into the commit message, but it would be much easier
if you could put it yourself after your development since you actually
have access to the information I had to seek, and know the reasoning
behind your choice.

Just an indication, your patch mentioned this:

   TCP_KEEPALIVE has the same semantic.

After edition (still minimal and possibly inaccurate but the best I
could do):
   
    On Linux the interval before starting to send TCP keep-alive packets
    is defined by TCP_KEEPIDLE. MacOS has an equivalent with TCP_KEEPIDLE,
    which also uses seconds as a unit, so it's possible to simply remap the
    definition of TCP_KEEPIDLE to TCP_KEEPALIVE there and get it to seamlessly
    work. The other settings (interval and count) are not present, though.

A good rule of thumb to write a useful commit message is to think about
these 3 steps:
  - first, convince a project maintainer that your work is worth being
    merged and is very unlikely to cause trouble elsewhere ;

  - second, provide helpful information about why it's done like this and
    what it provides, to someone who would see a bisect session finish on
    this patch, so that this person doesn't simply think "what's this mess,
    let's revert it" without figuring what use case a revert could break ;

  - third, by explaining the design decisions, choices and tradeoffs (and
    sometimes even alternate solutions that were ditched), you'll help the
    person that discovers breakage find a different solution that will
    still preserve the original intent while addressing the problem.

For example recently I broke MacOS build while trying to fix a clang
warning. The two patches that caused the breakage were these ones:

  commit b12966af1006be8d4438ee1ca39c2541a1f2a4f9
  Author: Willy Tarreau <w...@1wt.eu>
  Date:   Wed Apr 13 17:09:45 2022 +0200

    BUILD: debug: mark the __start_mem_stats/__stop_mem_stats symbols as weak
    
    Building with clang and DEBUG_MEM_STATS shows the following warnings:
    
      warning: __start_mem_stats changed binding to STB_WEAK [-Wsource-mgr]
      warning: __stop_mem_stats changed binding to STB_WEAK [-Wsource-mgr]
    
    The reason is that the symbols are declared using ".globl" while they
    are also referenced as __attribute__((weak)) elsewhere. It turns out
    that a weak symbol is implicitly a global one and that the two classes
    are exclusive, thus it may confuse the linker. Better fix this.
    
    This may be backported where the patch applies.

  commit 2a06e248f5c8b7c86c7dd48eed7f6d5e87288457
  Author: Willy Tarreau <w...@1wt.eu>
  Date:   Wed Apr 13 17:12:20 2022 +0200

    BUILD: initcall: mark the __start_i_* symbols as weak, not global
    
    Just like for previous fix, these symbols are marked ".globl" during
    their declaration, but their later mention uses __attribute__((weak)),
    so it's better to only use ".weak" during the declaration so that the
    symbol's class does not change.
    
    No need to backport this unless someone reports build issues.

There's an explanation of the problem, when it's encountered, and the
reasoning behind the proposed solution. The follwing day the CI broke
on MacOS, I could restart from the elements above to try to design
another solution that still follows the same spirit (and in the worst
case it could possibly have been accepted to just revert and keep that
warning when debugging):

  commit fb1b6f5bc0e8eac116e2cafe8716a7f16d95b58e
  Author: Willy Tarreau <w...@1wt.eu>
  Date:   Thu Apr 14 16:57:12 2022 +0200

    BUILD: compiler: use a more portable set of asm(".weak") statements
    
    The two recent patches b12966af1 ("BUILD: debug: mark the
    __start_mem_stats/__stop_mem_stats symbols as weak") and 2a06e248f
    ("BUILD: initcall: mark the __start_i_* symbols as weak, not global")
    aimed at fixing a build warning and resulted in a build breakage on
    MacOS which doesn't have a ".weak" asm statement.
    
    We've already had MacOS-specific asm() statements for section names, so
    this patch continues on this trend by moving HA_GLOBL() to compiler.h
    and using ".globl" on MacOS since apparently nobody complains there.
    
    It is debatable whether to expose this only when !USE_OBSOLETE_LINKER
    or all the time, but since these are just macroes it's no big deal to
    let them be available when needed and let the caller decide on the
    build conditions.
    
    If any of the patches above is backported, this one will need to as
    well.

The new choice was explained again. Finally, two weeks later I got a
report of breakage when using external code linked at run time, the
keyword registration didn't work anymore due to a mistake in this last
patch. Fortunately, the design decision was explained and I could
restart from this, figure my mistake and make sure that the 3rd attempt
at a fix this time addressed all 3 identified use cases:

  commit 65d9f83794e00e136335348de531167f31d2f39b
  Author: Willy Tarreau <w...@1wt.eu>
  Date:   Tue Apr 26 19:35:38 2022 +0200

    BUILD: compiler: properly distinguish weak and global symbols
    
    While weak symbols were finally fixed with commit fb1b6f5bc ("BUILD:
    compiler: use a more portable set of asm(".weak") statements"), it
    was an error to think that initcall symbols were also weak. They must
    not be and they're only global. The reason is that any externally
    linked code loaded as a .so would drop its weak symbols when being
    loaded, hence its initcalls that may contain various function
    registration calls.
    
    The ambiguity came from the fact that we initially reused the initcall's
    HA_GLOBL macro for OSX then generalized it, then turned it to a choice
    between .globl and .weak based on the OS, while in fact we needed a
    macro to define weak symbols.
    
    Let's rename the macro to HA_WEAK() to make it clear it's only for weak
    symbols, and redefine HA_GLOBL() that initcall needs.
    
    This will need to be backported wherever the commit above is backported
    (at least 2.5 for now).

For sure it's not perfect, and anything can always be improved. But
that's not the point, the point here is to put info about the intent and
the approach so that these ones can be preserved or corrected later. And
that's valid for any patch, even a one-liner.

The impact of your patch is that tcp-keep-alive will start to work on
MacOS and if someone starts to rely on it and someone else later reverts
your patch because it causes a build issue on a specific platform and the
person thinks "oh we don't need this one there, we already have something
else", by just reverting it they could break the other user's deployment.

That's why it's important to put enough explanation to ensure the patch
remains firmly stuck to the code base and cannot vanish without an extremly
good argument.

Hoping this helps!

Cheers
Willy

Reply via email to