Re: [linux-usb-devel] PATCH: usb: descriptor structures have to be packed

David Brownell Tue, 06 Feb 2007 23:01:09 -0800

On Tuesday 06 February 2007 5:58 pm, Oleg Verych wrote:
> 
> Would you clarify "non aligned access issue" on well known platforms
> (*x86-*). I see on gcc's output, that they are not carry much about it:


X86 has hardware logic to handle misaligned access directly,
which is why GCC output doesn't pay attention to that issue.

If you'll recall from the discussions of "why RISC", that was
one of the examples of something with a significant cost to
handle in hardware.   (Back a decade or two, before all high
end CPUs had multistage pipelines and out-of-order operations.)

There are a lot of CPUs that don't choose to pay that logic
cost, but with its CISC origins, x86 never had the choice.

On today's high end CPUs the cost of such hardware alignment
fixups vanishes in the noise.  But most ARMs, for example, don't
have such fixups ... and there are more ARMs in the world than
there are X86 processors; they're designed for low cost.  ARM
isn't the only CPU that omits such fixups, by far.


> some docs are saying about (0)"slightly faster operation", if access is
> aligned. What it's all about?

Think about it at the hardware level.  Assume your memory bus
is 32 bits wide, and you're reading a 2 byte value.

(A) If it's at address 0, 2, 4, 6 ... just read the memory,
    ignore the extra bits.  For addresses 2 or 6, right-shift
    the value by 16 bits on its way to a register.

(B) If it's at address 1 or 5, much the same; except you right
    shift by 1 byte instead.  Although, if your memory bus is
    16 bits wide (common for many embedded systems), this acts
    much the same as case (C) next...

(C) If it's at address 3 or 7, you've got TWO separate memory
    bus accesses: words 0 and 1, or 1 and 2.  Then logic must
    right-shift the first word by 3 bytes, left shift the second
    word by one byte, and combine the low bytes of the result
    words before shifting into a register.
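
A rough sketch of cases (A) through (C), assuming a little-endian
32-bit bus.  The bus_words[] contents, bus_read_word() and read16()
are hypothetical names made up for illustration; real fixup logic
lives in hardware, not in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory seen through a 32-bit-wide little-endian bus:
 * byte address 0 holds 0x00, address 1 holds 0x11, and so on. */
static const uint32_t bus_words[] = { 0x33221100u, 0x77665544u, 0xBBAA9988u };

static unsigned bus_accesses;      /* counts simulated bus transactions */

static uint32_t bus_read_word(unsigned index)
{
    bus_accesses++;
    return bus_words[index];
}

/* Read a 2-byte value at byte address addr, as fixup logic might. */
static uint16_t read16(unsigned addr)
{
    unsigned word = addr / 4;      /* 32-bit word holding the first byte */
    unsigned byte = addr % 4;      /* byte position inside that word */
    uint32_t lo, hi;

    assert(addr + 2 <= sizeof bus_words);
    lo = bus_read_word(word);

    if (byte <= 2)                 /* cases (A) and (B): one access, one shift */
        return (uint16_t)(lo >> (8 * byte));

    /* case (C): the value straddles two words, so a second access is needed */
    hi = bus_read_word(word + 1);
    return (uint16_t)((lo >> 24) | (hi << 8));
}
```

In this model a read at address 0, 1 or 2 costs one bus access, while
a read at address 3 or 7 costs two.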

Writing is much more complicated, what with needing to read
the old word(s) first and then rewrite modified versions (or
maybe just updating in cache and letting writeback logic do
all the work more transparently).

So you can see the third case is quite costly.  For that reason
and others, many CPUs don't allow cases (C) or (B)... they issue
a trap, which as I recall turns into SIGBUS in userspace, which
kills the process.  (Some systems try to fix up the memory access
in software.  Most don't; it's really slow at best, and would just
cover up a bug in the software, which shouldn't have used such an
unaligned address.  It's never polite, even when hardware allows.)

Note that I/O registers often don't support those alignment fixups
in any case...
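
One common way for ordinary C code to stay clear of such traps is to
copy the bytes out instead of dereferencing a misaligned pointer.  A
minimal sketch follows; load_u16_unaligned() is a hypothetical helper
name, and this applies to normal memory only, not to I/O registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a 16-bit value from any address, aligned or not.
 * The thing to avoid is *(const uint16_t *)p on a misaligned p. */
static uint16_t load_u16_unaligned(const void *p)
{
    uint16_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);   /* the compiler picks a safe access sequence */
    return v;                  /* value is in host byte order */
}
```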


> (1) When i started to understand things about packed/padded structs in
> gcc, my first ever question was: "WTF that padding is doing there, if i
> didn't coded it explicitly?"

Because of issues as noted above, compilers normally insert padding;
all languages allow it.  The goal of the padding is to align the
data on a "natural" boundary so case (C) never happens and, for
similar reasons, (B) doesn't happen either.
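
The inserted padding can be observed with offsetof().  This is only a
sketch: struct sample and padding_before_value() are hypothetical, and
the exact number of padding bytes depends on the target ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sample {          /* hypothetical struct, for illustration only */
    uint8_t  tag;        /* offset 0 */
    uint32_t value;      /* typically offset 4, after 3 padding bytes */
};

/* Bytes the compiler inserted between tag and value. */
static size_t padding_before_value(void)
{
    size_t start = offsetof(struct sample, value);

    /* value lands on its natural boundary */
    assert(start % _Alignof(uint32_t) == 0);
    return start - sizeof(uint8_t);
}
```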


> (2) Is it impact of hardware memory bus 
> access issues, like (0) or that bloated code from ia64 and Sparc, showed
> in proposed URL, that i even can't parse (: ?

The memory bus costs show up on all CPUs ... as lower performance,
something compilers usually avoid where possible.

But remember also that not all CPUs even support such unaligned access,
so if the compiler didn't add padding on such CPUs, your program would
die right away.  This is also a correctness issue on those systems.


> I've also read in some Intel's docs, that having bigger struct's members
> first is better, what can you say on this?

I thought the advice was "smallest first".  Regardless, the point is
to lay out your structures so you don't mix sizes ... int next to char
next to int next to char, hey that just wasted six bytes by padding!
But two chars then two ints, or two ints then two chars, wastes only
two bytes.  (The extra padding handles cases like arrays of those
structs...)
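
The arithmetic can be checked with sizeof.  A small sketch, assuming
nothing beyond standard C; struct mixed, struct grouped and
padding_in() are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct mixed   { int a; char b; int c; char d; };  /* sizes interleaved */
struct grouped { int a; int c; char b; char d; };  /* sizes grouped */

/* Bytes of real data in either struct: two ints plus two chars. */
#define PAYLOAD_BYTES (2 * sizeof(int) + 2 * sizeof(char))

/* Padding is whatever sizeof reports beyond the payload. */
static size_t padding_in(size_t struct_size)
{
    assert(struct_size >= PAYLOAD_BYTES);
    return struct_size - PAYLOAD_BYTES;
}
```

On an ABI with 4-byte ints aligned to 4 bytes,
padding_in(sizeof(struct mixed)) comes to 6 and
padding_in(sizeof(struct grouped)) comes to 2.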


> >>   __attribute((__packed__)) is hardly superior to the old style which
> >> worked fine for a 1000 years before gcc.
> >
> > Yes, that "old style" you're promoting is indeed from the Dark Ages.
> > And you *can* make anything sound bad if you disregard its strengths,
> > as you've been doing with respect to "packed".
> 
> Do you mean byte access "magic", probably due to (2)?

I mean that using "packed" is not magic; it is simple and easy
to follow and maintain, whereas that "old style" is complex,
hard to follow, and error-prone to maintain.


> In scope of (1) i can't understand: structs are padded implicitly,
> member access is coded explicitly.

But the struct declarations in question are not memory layouts.

They're "on-the-wire" protocol structures, where the compiler
has exactly ZERO business changing anything.  When the protocol
spec says the field starts at byte three, that's where it starts.
Regardless of whether, as a 16 bit field, it might trigger fewer
alignment issues if byte three held padding instead, and that
field started at byte 4 ...
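
As an illustration, here is a made-up wire format (not a real USB
descriptor) with a 16-bit field at byte three.  The struct names are
hypothetical, and __attribute__((packed)) is the GCC/Clang spelling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-the-wire layout: 16-bit field starts at byte three. */
struct wire_msg {
    uint8_t  type;       /* byte 0 */
    uint8_t  flags;      /* byte 1 */
    uint8_t  seq;        /* byte 2 */
    uint16_t length;     /* bytes 3..4 */
} __attribute__((packed));   /* GCC/Clang extension: no padding inserted */

/* Same members without "packed": the compiler may pad before length. */
struct natural_msg {
    uint8_t  type;
    uint8_t  flags;
    uint8_t  seq;
    uint16_t length;
};

/* The packed layout matches the (made-up) protocol spec exactly. */
static_assert(offsetof(struct wire_msg, length) == 3,
              "length must start at byte three");
static_assert(sizeof(struct wire_msg) == 5,
              "message must be five bytes on the wire");
```

With "packed" the compiler also knows the field may be misaligned, so
it can generate a safe access sequence on CPUs that need one.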


> Thus i think, using "packed" attribute for things like inet-packets, wire,
> hardware is absolute *must*, because structures are usually
> copied/moved from memory to devices "as they are".

Exactly.  USB is a family of protocols, not unlike TCP/IP.  And
similar issues come up in both contexts.  Except ... that the
TCP/IP stack was designed by folk who by and large paid attention
to those alignment issues all the time, unlike USB.  (Another way
those developer communities differed is that the Internet stack
is big-endian, while USB is little-endian.)
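
The byte-order difference can be sketched with two decoders that read
one byte at a time, which also sidesteps alignment entirely.  The
names get_le16() and get_be16() are hypothetical helpers written for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* USB style, little-endian: least significant byte comes first. */
static uint16_t get_le16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Internet style, big-endian: most significant byte comes first. */
static uint16_t get_be16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

The same two wire bytes, 0x34 then 0x12, decode to 0x1234 as
little-endian and to 0x3412 as big-endian.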

 
> (hope you will understand all this "English" ;)

Yes.  Your care, taken with what's not your first language,
is appreciated!  :)

- Dave


