Hi Tim,

On Sat, Jan 18, 2020 at 01:32:49AM +0100, Tim Duesterhus wrote:
> Willy,
> 
> I did not touch the `struct pat_time` in pattern.h which appears to be
> completely unused. Please check whether adjustments should be made there
> as well or whether it should simply be removed. Feel free to amend my
> patch if necessary.
> 
> Best regards
> Tim Düsterhus
> 
> Apply with `git am --scissors` to automatically cut the commit message.
> 
> -- >8 --
> Subject: [PATCH] CLEANUP: Consistently `unsigned int` for bitfields
> 
> - Non-ints are not allowed by the C standard.

Hmmm that's not exactly what I'm reading in C99 #6.7.2.1 here, which
explicitly permits implementation-defined types:

   "A bit-field shall have a type that is a qualified or unqualified version of
   _Bool, signed int, unsigned int, or some other implementation-defined type."

And the implementation we rely on (gcc) says:

   
https://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit-fields-implementation.html
    Allowable bit-field types other than _Bool, signed int, and unsigned int
    (C99 and C11 6.7.2.1).  Other integer types, such as long int, and
    enumerated types are permitted even in strictly conforming mode.

I tend to find it extremely confusing to place "unsigned int" just after a
char, because it makes one think we're leaving a 2-byte hole before and/or
that we can use up to 32 bits without causing misalignment while in practice
we can only use the few bits left from the missing bytes. That's what we
have in fdtab:

        void *owner;
        unsigned char state;                   <-- 8-aligned
        unsigned char ev;                      <-- 8-aligned + 1
        unsigned char linger_risk:1            <-- 8-aligned + 2;
        unsigned char cloned:1;
        unsigned char initialized:1;

With the change it remains the same except that it looks quite non-obvious
as the hole will depend on the number of bits added:

        void *owner;
        unsigned char state;                   <-- 8-aligned
        unsigned char ev;                      <-- 8-aligned + 1
        unsigned int linger_risk:1             <-- 8-aligned + 2;
        unsigned int cloned:1;
        unsigned int initialized:1;

Interestingly, the change above confuses pahole:

   struct fdtab {
        long unsigned int          lock;                 /*     0     8 */
        long unsigned int          thread_mask;          /*     8     8 */
        long unsigned int          update_mask;          /*    16     8 */
        struct fdlist_entry        update;               /*    24     8 */
        void                       (*iocb)(int);         /*    32     8 */
        void *                     owner;                /*    40     8 */
        unsigned char              state;                /*    48     1 */
        unsigned char              ev;                   /*    49     1 */

        /* Bitfield combined with previous fields */

        unsigned int               linger_risk:1;        /*    48:15  4 */
        unsigned int               cloned:1;             /*    48:14  4 */
        unsigned int               initialized:1;        /*    48:13  4 */
                                                                     ^^^
Here it expects to reserve 4 bytes for the field

        /* size: 56, cachelines: 1, members: 11 */
        /* padding: 4 */
        /* bit_padding: 29 bits */
        /* last cacheline: 56 bytes */

        /* BRAIN FART ALERT! 56 != 56 + 0(holes), diff = 0 */
           ^^^^^^^^^^^^^^^^^^^^^^^^^^

And this surprizing warning (certainly a bug). Without this change, the
output looks more consistent:

        unsigned char              state;                /*    48     1 */
        unsigned char              ev;                   /*    49     1 */
        unsigned char              linger_risk:1;        /*    50: 7  1 */
        unsigned char              cloned:1;             /*    50: 6  1 */
        unsigned char              initialized:1;        /*    50: 5  1 */

More annoyingly it confuses gdb:

Before:
  (gdb) p &fdtab[0].linger_risk
  $4 = (unsigned char *) 0x32

After:
  (gdb)  p &fdtab[0].linger_risk
  $1 = (unsigned int *) 0x30

And this is not surprizing for both tools, because I think that with the
change, the unsigned int field is merged with the previous chars, which
is particularly dirty. And if we place the field before the chars, it
leaves a two-bytes hole between them.

Ideally I'd use flags there, though we'd have to place them in an unsigned
char. If you're interested in having a look at such a change I'd prefer it,
otherwise I'd prefer not to change what exists for this part because it does
exactly what we need and the change makes it wrong at least debug-wise.

> - Signed bitfields of size `1` hold the values `0` and `-1`, but are
>   usually assigned `1`, possibly leading to subtle bugs when the value
>   is explicitely compared against `1`.

Excellent point! The problem is even more subtle when checking "> 0".
And even nastier, it should fail to match against an enum because the
common type will be used for the comparison, thus the enum will be
extended and remain unsigned while the bit field will be sign-extended.

Look, that's getting funny:

    enum e {
        ZERO = 0,
        ONE  = 1,
        TWO  = 2,
    } e;

    struct f {
        int bit:2;
    } f;

    void assign(struct f *f, enum e e)
    {
        f->bit = e;
    }

    int compare(struct f *f, enum e e)
    {
        return f->bit == e;
    }

The output code does this:

0000000000000000 <assign>:
   0:   40 88 f0                mov    %sil,%al
   3:   40 8a 37                mov    (%rdi),%sil
   6:   83 e0 03                and    $0x3,%eax
   9:   83 e6 fc                and    $0xfffffffc,%esi
   c:   09 c6                   or     %eax,%esi
   e:   40 88 37                mov    %sil,(%rdi)
  11:   c3                      retq   

==> the bits are copied verbatim

0000000000000012 <compare>:
  12:   8a 07                   mov    (%rdi),%al
  14:   c1 e0 06                shl    $0x6,%eax
  17:   c0 f8 06                sar    $0x6,%al
  1a:   0f be c0                movsbl %al,%eax
  1d:   39 f0                   cmp    %esi,%eax
  1f:   0f 94 c0                sete   %al
  22:   0f b6 c0                movzbl %al,%eax
  25:   c3                      retq   

The bits are sign-extended before the result is compared, so value "TWO"
will turn to -2 and will not be matched after being set.

Actually one is at risk, the "verify" struct in ssl_bind. But fortunately
among the 3 bits it only uses 2 because the enums set there only use
values 0 to 3!

For this reason I'd like to get the signed-vs-unsigned part merged.
Did you check whether this produced any output code change ? It would
be a hint that we might possibly have latent bugs.

Thanks,
Willy

Reply via email to