On Tue, Sep 14, 2021 at 07:31:33PM +0200, Peter J. Philipp wrote:
> On Tue, Sep 14, 2021 at 10:48:44AM -0600, Theo de Raadt wrote:
> > Mark Kettenis <[email protected]> wrote:
> >
> > > To be honest, I do think that adding __packed is a reasonable way to
> > > handle protocol structs like this where performance doesn't really
> > > matter. This translates into __attribute__((packed)) and both GCC and
> > > LLVM started treating that in a way to signal that the data might not
> > > be properly aligned and use byte access on architectures that need
> > > strict alignment. This is still not explicitly documented but I don't
> > > think the compiler writers can backtrack on that at this point.
> >
> > __packed does not mean "use bytes". It means ensure there are no
> > padding bytes, and then on strict alignment systems when something is
> > misaligned, the compiler can see it must use smaller loads.
> >
> > Unfortunately, tcpdump also parses encapsulated protocols, and some of
> > the outer layers are limited to short-alignment. So if an inner layer
> > has a 32-bit value inside the __packed array after packing, the compiler
> > will believe it is still on a 32-bit boundary, and not use char or short
> > accesses. This will fault, because tcpdump is passing a pointe to an
> > object with tighter alignment.
> >
> > __packed is not saying "each object must be loaded as bytes", and I
> > really am susprised you claim that is what happens today. That is a
> > crazy expensive choice on some architectures.
> >
> > So I think __packed is not enough for what tcpdump is doing.
> >
> > Perhaps some of the compilers are being more cynical, or their costing
> > of loads leads them to do smaller-storage operations... but I have a
> > recollection that at least gcc on the alpha gets confused and does the
> > wrong thing.
>
> To appease the situation, I have 1. taken what Visa said about the nonce,
> 2. taken my old patch and changed the memcpy's to EXTRACT_64BITS() which
> I had to homeroll off EXTRACT_32BITS(), and made a patch and tested it.
>
> No bus errors again, though I don't know if it's the right approach. The
> nonces in the tcpdump were sequential counting up from 1 as my wireguard
> hardware was rebooting. I think I control-c'ed by 218 nonce or so.
EXTRACT_64BITS() followed by letoh64() is not correct because the former
converts byte order from big endian to host. It is better to add
EXTRACT_LE_64BITS().
The patch assumes that the data already are 4-byte aligned (bpf(4) and
libpcap ensure proper alignment for the network header; also, print-ip.c
and print-ip6.c do last-resort fixing when needed).
Index: extract.h
===================================================================
RCS file: src/usr.sbin/tcpdump/extract.h,v
retrieving revision 1.9
diff -u -p -r1.9 extract.h
--- extract.h 7 Oct 2007 16:41:05 -0000 1.9
+++ extract.h 15 Sep 2021 12:40:56 -0000
@@ -51,3 +51,12 @@
(u_int32_t)*((const u_int8_t *)(p) + 2) << 16 | \
(u_int32_t)*((const u_int8_t *)(p) + 1) << 8 | \
(u_int32_t)*((const u_int8_t *)(p) + 0))
+#define EXTRACT_LE_64BITS(p) \
+ ((u_int64_t)*((const u_int8_t *)(p) + 7) << 56 | \
+ (u_int64_t)*((const u_int8_t *)(p) + 6) << 48 | \
+ (u_int64_t)*((const u_int8_t *)(p) + 5) << 40 | \
+ (u_int64_t)*((const u_int8_t *)(p) + 4) << 32 | \
+ (u_int64_t)*((const u_int8_t *)(p) + 3) << 24 | \
+ (u_int64_t)*((const u_int8_t *)(p) + 2) << 16 | \
+ (u_int64_t)*((const u_int8_t *)(p) + 1) << 8 | \
+ (u_int64_t)*((const u_int8_t *)(p) + 0))
Index: print-wg.c
===================================================================
RCS file: src/usr.sbin/tcpdump/print-wg.c,v
retrieving revision 1.6
diff -u -p -r1.6 print-wg.c
--- print-wg.c 14 Apr 2021 19:34:56 -0000 1.6
+++ print-wg.c 15 Sep 2021 12:40:56 -0000
@@ -142,8 +142,9 @@ wg_print(const u_char *bp, u_int length)
printf("[wg] keepalive ");
if (caplen < offsetof(struct wg_data, mac))
goto trunc;
+ /* data->nonce may be unaligned. */
printf("to 0x%08x nonce %llu",
- letoh32(data->receiver), letoh64(data->nonce));
+ letoh32(data->receiver), EXTRACT_LE_64BITS(&data->nonce));
break;
}
return;