compiler dependencies

Christoph Egger Sat, 24 Jun 2000 14:24:50 -0700

Hi all!


I am just reading a very interesting thread on the kernel-ML:

------------------------------------------------------------------
> > Are you saying that:
> > struct foo
> > {
> >         int x;
> >         int y;
> > } bar;
> >
> > ((int *)&bar + 1) != &bar.y
> >
> > can sometimes be true?
>
> I suspect this will always be true, at least unless x is not of type char...
>
> can you point to a place in K&R or ANSI C that says so?
>
> Currently, I have no reference (but common sense), but AFAIK "+1" means
> "increment the pointer by 1 (byte)", not "increment the pointer by one
> storage unit (4 bytes)".
>
> However, I suspect something like "&(((int *)&bar)[1]) != &bar.y" or
> "((int *)&bar + 4) != &bar.y" was your question.

incorrect. what do you think this trivial program prints:

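#include <stdio.h>

struct foo {
        int x;
        int y;
} bar;

int main(int argc, char *argv[])
{
        struct foo *ptr;

        ptr = &bar;
        /* cast before printing: passing a pointer to %x is undefined */
        printf("ptr=%lx\n", (unsigned long)ptr);
        ptr = ptr + 1;  /* advances by sizeof(struct foo), not by 1 byte */
        printf("ptr=%lx\n", (unsigned long)ptr);
        return 0;
}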

$ ./ptr
ptr=8049560
ptr=8049568

so, +1 means "increment the pointer by 1 storage unit (8 bytes in this
case)". However, pointer arithmetic is not relevant to how different
members of a given structure are packed.

> Shame on me! In the past 3 years, I have been writing only about 200 lines
> of C code. I'm spoiled by Java... (and years before, by the E language for
> Amiga computers, where +1 really was "plus one byte" IIRC)

There is a lot of misinformation floating around here.  I feel
compelled to speak up.

#1: A compiler may insert padding between members of a struct or
after the last member.  It may not insert padding before the
first member.  Thus, the address of a struct member, plus one, is
not necessarily the address of the next member.

Citation: ANSI C89, section 6.5.2.1 "Structure and union
specifiers", as follows:

        Within a structure object, the non-bit-field members and
        units in which bit-fields reside have addresses that
        increase in the order in which they are declared.  A
        pointer to a structure object, suitably converted, points
        to its initial member... and vice versa.  There may
        therefore be unnamed padding within a structure object,
        but not at its beginning, as necessary to achieve the
        appropriate alignment.
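
A minimal sketch of point #1, assuming a hypothetical struct mixed that
does not appear in the thread. offsetof() from <stddef.h> reports where the
compiler actually placed each member, so nothing about the layout has to be
guessed. The helper function names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical struct mixing a small and a larger member. */
struct mixed {
        char tag;
        int  value;
};

/* The first member always sits at offset 0: no leading padding. */
size_t first_member_offset(void)
{
        return offsetof(struct mixed, tag);
}

/* Padding bytes the compiler inserted between tag and value. */
size_t gap_after_tag(void)
{
        return offsetof(struct mixed, value) - sizeof(char);
}

/* All padding in the struct: interior plus any trailing padding. */
size_t total_padding(void)
{
        return sizeof(struct mixed) - sizeof(char) - sizeof(int);
}
```

On common ABIs with a 4-byte, 4-byte-aligned int, gap_after_tag() is
typically 3, but the standard promises only that first_member_offset() is 0.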

#2: Pointer arithmetic takes into account the size of the object
pointed to.  That is, ptr + 1 points to the same address as
((char *) ptr) + sizeof *ptr.

Citation: ANSI C89, section 6.3.6 "Additive operators", as
follows:

        When an expression that has integral type is added to or
        subtracted from a pointer, the result has the type of the
        pointer operand.  If the pointer operand points to an
        element of an array object, and the array is large
        enough, the result points to an element offset from the
        original element such that the difference of the
        subscripts of the resulting and original array elements
        equals the integral expression.
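
Point #2 can be checked directly. The sketch below (function names are
hypothetical) measures in bytes how far "p + 1" moves for two pointer
types. The arrays have two elements so that p + 1 still points at a valid
element.

```c
#include <stddef.h>

struct foo {
        int x;
        int y;
};

/* Bytes covered by "p + 1" when p is a struct foo pointer. */
size_t foo_stride(void)
{
        struct foo pair[2];
        struct foo *p = &pair[0];

        return (size_t)((char *)(p + 1) - (char *)p);
}

/* Bytes covered by "p + 1" when p is an int pointer. */
size_t int_stride(void)
{
        int v[2];
        int *p = &v[0];

        return (size_t)((char *)(p + 1) - (char *)p);
}
```

Both functions return the size of the pointed-to type, never 1 (unless the
type itself happens to be one byte wide).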


K&R says (page 213, A8.3):

   Adjacent field members of structures are packed into
   implementation-dependent storage units in an implementation-dependent
   direction. ... The members of a structure have addresses increasing in
   the order of their declaration.

There is no guarantee that field members of a structure can be found one after
another starting from the first field. But the first field can always be found
from the address of the structure itself, as K&R says:

   If a pointer to a structure is cast to the type of a pointer to its
   first member, the result refers to the first member.
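
Applied to the struct from the original question, this separates what is
guaranteed from what is not. A sketch, with hypothetical function names:

```c
#include <stddef.h>

struct foo {
        int x;
        int y;
};

/* Guaranteed: a struct pointer converted to the type of its first
 * member points at that member, so this always returns 1. */
int first_member_matches(void)
{
        struct foo bar;

        return (int *)&bar == &bar.x;
}

/* Not guaranteed: returns 1 only if the compiler put no padding
 * between x and y, i.e. only then is ((int *)&bar + 1) == &bar.y. */
int second_member_follows(void)
{
        return offsetof(struct foo, y) == sizeof(int);
}
```

With two int members, second_member_follows() returns 1 on every compiler
I would expect to meet in practice, but that is a property of the
implementation, not of the language.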


> > So all drivers (I'm sure there are a few) that use something like
> >
> > struct foo {
> >     u32     a;
> >     u32     b;
> >     u32     c;
> >     u32     d;
> > }
> >
> > to communicate with some hardware (4 32-bit values with addresses in
> > sequence) should be fixed not to make assumptions about the layout of a
> > struct?
>
> They should be fixed to use __attribute__((packed)).  Also they shouldn't
> have any unaligned struct members.

are you 100% sure it's needed?

The Linux kernel is not written in ANSI C.  It makes certain assumptions
about the environment which are not guaranteed by the standard. One of them
is that void * fits into unsigned long. Another is that no structure element
has an alignment bigger than its size (so a u32 gets at worst 3 bytes of
alignment padding, and u8/u16 members can be used to fill that explicitly).
If a machine cannot satisfy that, maybe it should look for a different kernel.
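
For comparison, here is a sketch of what __attribute__((packed)) changes.
It is a GCC extension (Clang accepts it too), not standard C. The struct
and function names are hypothetical, and the u8/u32 typedefs are stand-ins
for the kernel's own types.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel typedefs (assumption for this sketch). */
typedef uint8_t  u8;
typedef uint32_t u32;

/* Natural layout: the compiler may pad after cmd to align addr. */
struct plain_msg {
        u8  cmd;
        u32 addr;
};

/* Packed layout: no padding at all, so addr may be unaligned. */
struct packed_msg {
        u8  cmd;
        u32 addr;
} __attribute__((packed));

size_t plain_size(void)         { return sizeof(struct plain_msg); }
size_t packed_size(void)        { return sizeof(struct packed_msg); }
size_t packed_addr_offset(void) { return offsetof(struct packed_msg, addr); }
```

The packed struct is exactly 5 bytes with addr at offset 1. The plain one
is typically 8 bytes, which matches the "at worst 3 bytes" remark above.
The price of packing is an unaligned u32, which some architectures access
slowly or not at all.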

De facto, for most if not all Linux-supported architectures, if you put
larger members before smaller members, gcc won't add padding; also, if you
do something like

    u32 a;
    u16 b1;
    u16 b2;
    u32 c;

no padding will be added because b1 and b2 leave the structure aligned
properly.
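
Rather than relying on this, a driver can make the assumption explicit so
the build fails if a compiler ever lays the struct out differently. A
sketch using C11 _Static_assert. The name struct regs is hypothetical, and
the u16/u32 typedefs are stand-ins for the kernel's own types.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel typedefs (assumption for this sketch). */
typedef uint16_t u16;
typedef uint32_t u32;

struct regs {
        u32 a;
        u16 b1;
        u16 b2;
        u32 c;
};

/* Compile-time layout checks: any inserted padding breaks the build. */
_Static_assert(offsetof(struct regs, a)  == 0, "a must be first");
_Static_assert(offsetof(struct regs, b1) == 4, "b1 must follow a");
_Static_assert(offsetof(struct regs, b2) == 6, "b2 must follow b1");
_Static_assert(offsetof(struct regs, c)  == 8, "c must follow b2");
_Static_assert(sizeof(struct regs) == 12, "no trailing padding expected");

/* Run-time view of the same fact, for callers that want the number. */
size_t regs_size(void)
{
        return sizeof(struct regs);
}
```

The checks cost nothing at run time and turn an unwritten assumption about
the compiler into one that is verified on every build.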

> > IIRC there are dozens of structs shared between userspace and kernel
> > space. Does that mean that all those structs are strictly speaking
> > incompatible with each other, given that the compiler or CFLAGS are
> > different?
>
> Yes, the C standard gives a lot of flexibility to the compiler here.
> I *think* it mandates that the fields in a struct are stored in the
> same order that they are declared, but I'm not even 100% sure about this.

Yes, it does.

> On UNIX systems, you can still use any C compiler that you like and link
> with code compiled with another compiler, because the ABI (application
> binary interface) standard makes some additional requirements. For example
> on Linux systems we use the System V ABI which defines the precise layout
> of the structs in a processor-specific addendum. That way we know that every
> compliant compiler will use the same layout on a given system. We have no
> guarantee that the layout will be the same on a different processor though.

Processor or architecture? If I compile my utils for 486, I want them to run
under 586...

If in-structure padding is allowed and only subject to the implementation, what
happens if the compiler does padding for an included struct for A.c, but not for
B.c and then both object files are linked together? Or is there any requirement
for compilers to take only the bare data structure (and not things like assumed
access patterns) into consideration to decide whether to do padding or not?
Although I assume only the former, the latter would not be impossible with
implementation-specific padding. (Or maybe I'm wrong again...)


-------------------------------------------------------------------

So, we should recheck all structs of _all_ GGI-related libs to avoid some
surprises on different architectures/OS/compilers ...


Christoph Egger
E-Mail: [EMAIL PROTECTED]