On Tue, 14 Nov 2006, Joseph Koshy wrote:

bde> Non-broken code knows that byte-aligned data needs to be copied
bde> into structs using memcpy().

Do keep in mind that this is `struct ar_hdr' we are talking about, not
something else.

Having to use memcpy() to copy from a collection of ASCII strings (in file)
to a collection of ASCII strings (in struct ar_hdr) to cope with alignment
issues is odd.

I almost didn't reply because this seemed to make the change have no effect
except to break ar.h for unsupported compilers.

It turned into an argument about __packed in general and struct ip in
particular.

To be truly portable, code needs to memcpy() in each ASCII string member
of `struct ar_hdr'  separately since we cannot make assumptions about
the file and memory layout being identical.

More than that :-).  The code needs to do something magic to fill the holes,
it any.  Kernel code is supposed to prezero using memcpy or
malloc(..., M_ZERO) to avoid copying out insecurities in the holes.

Needless to say no code in our tree does this today.

So we need to look at the original intent of why a `struct ar_hdr' was
declared as a collection of fixed size ASCII strings.   My guess is that it
was written that way to be able to directly describe its file layout: ASCII
for portability and fixed size char[] arrays to avoid issues with structure
padding.   Declaring it as __packed informs today's compilers of this
intent. It also brings down the alignment requirements for `struct ar_hdr'
on the ARM to match that of other platforms as a side-effect.

The alignment requirement is actually the crtical one.  This should be
explicit (if the packing requirement is).

bde> I doubt that it is needed for more than breaking the warning.  You
bde> would have to be unlucky or shooting your feet for
bde> "(struct ar *)&byte_array[offset]" to give a pointer that is actually
bde> misaligned.  Large arrays of chars are normally aligned to the word
bde> size even if they are on the stack, and the archive header is at offset
bde> 0 in files.

In an ar(1) archive, there is a `struct ar_hdr' before each archive member.
The only 'alignment' expected for this structure is that of falling on
even offsets in the file.  This is only enforced programmatically though.

Oops, I grepped for ARMAGIC instead of ARFMAGIC, so I didn't find any file
headers.

Further testing showed:

(1) When __packed breaks the alignment as in struct ip, the compiler
doesn't report lost of alignment on taking addresses, but on an ia64
machine in the FreeBSD cluster, the broken alignment doesn't cause
problems and in fact is faster:

        struct foo {
                char    c;
                int     i;
        } __packed;

        struct foo foo;

        int
        access_int(int *ip)
        {
                return (*ip);
        }

        int
        main(void)
        {
        #ifdef BROKEN
                return (access_int(&foo.i));
        #else
                return (foo.i);
        #endif
        }

In the !BROKEN case, gcc on ia64 generates large code to load foo.i a byte
at a time.  In the BROKEN case the code just passes a misaligned pointer
to access_int().  This should be an error, since &foo.i has the alignment
of a char and the cast that changes its alignment to that of an int is
only implicit, but I can't find even a warning flag for this.  However,
on the FreeBSD cluster machine, the misaligned access doesn't trap, and
doing the access in a loop shows that the misaligned access is much faster
than the correct code that does the access a byte at a time, much the
same as on an i386.

(2) gcc still has alignment bugs even on i386's.  Even i386's things
should be aligned for efficiency, and the compiler shouldn't generate
any misaligned access for portable code, since if it does then it is
generating inefficient code which breaks detection of user misalignment
via the i486 alignment check feature.  I tried using this feature when
i486's first came out, but it didn't work because gcc generates misaligned
code for struct copies:

        struct {
                uint8_t x;
                struct {
                        uint8_t y[15];
                } ssc;
                uint32_t z;
        } x, y;

        int
        main(void)
        {
                x.ssc = y.ssc;
                return (0);
        }

Here the uint32_t and no __packed forces the struct to be 4-byte aligned,
so ssc is known to be perfectly misaligned to an address equal to 1 mod 4.
The struct copy of ssc is done inline and should consist of copying the
first byte, then a 2-byte word, then 3 4-byte words.  Instead, it is done
by copying 3 perfectly misaligned 4-byte words, then 1 misaligned 2-byte
word, then the last byte.  Thus the struct copy is inefficient and always
causes alignment check traps if the alignment check flag is set.

The code generated by gcc for this hasn't changed much since i486's first
came out.  There is one helpful change: gcc now calls memcpy() if the
struct size is larger than about 64, but when i486's came out it generated
the misaligned inline copy for much larger structs (IIRC, for any size).

gcc on i386's also generates inline copies 4 bytes at a time starting
at the first byte, for structs that might only be 1-byte aligned:

        struct {
                struct {
                        uint8_t y[60];
                } nnssc;
        } x, y;

Now the alignment cannot be known for sure, but 4-byte alignment can
usually be arranged at little cost for non-small structs, so without
alignment checking the best copying method for medium-sized structs
is to assume that they are 4-byte aligned and use 4-byte accesses.
With alignment checking, this is only best if there is a trap handler
to fix up the alignement and the alignment traps rarely occur,

Bruce
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/cvs-all
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Reply via email to