On Mon, Mar 22, 2021 at 12:42:27AM +1100, Matt Dunwoodie wrote: > On Sat, 20 Mar 2021 11:48:52 +0000 > Stuart Henderson <[email protected]> wrote: > > > oh, let's cc Matt on this too. > > > > On 2021/03/20 11:17, Martin Pieuchot wrote: > > > On 19/03/21(Fri) 20:15, Stuart Henderson wrote: > > > > Not a great report but I don't have much more to go on, machine > > > > had ddb.panic=0 and ddb hanged while printing the stack trace. > > > > Retyped by hand, may contain typos. Happened a few hours after > > > > setting up wg on it. > > > > > > > > uvm_fault(0xffffffff82204e38, 0x20, 0, 1) -> e > > > > fatal page fault in supervisor mode > > > > trap type 6 code 0 rip ffffffff81752116 cs 8 rflags 10246 cr2 20 > > > > cpl 0 rsp 00023b35eb0 gsbase 0xffffffff820eaff0 kgsbase 0x0 > > > > panic: trap type 6, code=0, pc=ffffffff81752116 > > > > Starting stack trace... > > > > panic(ffffffff81ddc97a) at panic+0x11d > > > > kerntrap(ffff800023b35e00) at kerntrap+0x114 > > > > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b > > > > wg_index_drop(ffff8000012ae000,0) at wg_index_drop+0x96 > > > > noise_create_initiation( > > > > > > This is a NULL dereference at line 1981 of net/if_wg.c: > > > > > > wg_index_drop(void *_sc, uint32_t key0) > > > { > > > ... > > > /* We expect a peer */ > > > peer = CONTAINER_OF(iter->i_value, struct wg_peer, > > > p_remote); ... > > > } > > > > > > Does that mean that `iter' is NULL and i_value' is at ofset 0x20 in > > > that struct? > > > > > Correct. The issue is we're trying to remove an index that doesn't > exist. wg_index_drop iterates through the list and expects to find a > matching index (perhaps a KASSERT could have been helpful here). > Nevertheless, since index 0 doesn't exist `iter` ends up being NULL. > > > Oh, I am an idiot, I had debug set and there is something other than > > just standard messages around that time. Both sides are OpenBSD > > wg(4). I did not have debug on the other side. > > > > [...] > > 18:51:08.041Z wg2: Sending handshake initiation to peer 3 > > 18:51:08.091Z wg2: Receiving handshake initiation from peer 3 > > 18:51:08.091Z wg2: Sending handshake response to peer 3 > > 18:51:08.091Z wg2: Unknown handshake response > > 18:51:13.141Z wg2: Receiving handshake initiation from peer 3 > > 18:51:13.141Z wg2: Sending handshake response to peer 3 > > 18:51:13.191Z wg2: Handshake for peer 3 did not complete after 5 > > seconds, retrying (try 2) 18:51:13.191Z wg2: Receiving keepalive > > packet from peer 3 18:51:13.191Z wg2: Sending keepalive packe > > 18:51:13.191Z t to peer 3 > > 18:52:28.242Z wg2: Sending keepalive packet to peer 3 > > 18:52:28.342Z wg2: Receiving keepalive packet from peer 3 > > 18:53:43.343Z wg2: Sending keepalive packet to peer 3 > > 18:54:58.345Z wg2: Sending handshake initiation to peer 3 > > 18:54:58.395Z wg2: Receiving handshake initiation from peer 3 > > 18:54:58.395Z wg2: Sending handshake response to peer 3 > > 18:54:58.395Z wg2: Unknown handshake response > > <syslog stops here, rest retyped> > > wg2: Handshake for peer 3 did not complete after 5 seconds, retrying > > (try 2) wg2: Sending handshake initiation to peer 3 > > wg2: Sending handshake response to peer 3 > > <null deref here> > > With this information, it was possible to reproduce the issue on my > end. There is a race between sending/receiving handshake packets. This > occurs if we consume an initiation, then send an initiation prior to > replying to the consumed initiation. > > In particular, when consuming an initiation, we don't generate the > index until creating the response (which is incorrect). If we attempt > to create an initiation between these processes, we drop any > outstanding handshake which in this case has index 0 as set when > consuming the initiation. > > The fix attached is to generate the index when consuming the initiation > so that any spurious initiation creation can drop a valid index. The > patch also consolidates setting fields on the handshake. > > With this patch applied, I was unable to reproduce the crash. This looks good and works, OK kn
sthen, do you want to commit this fix? I think it should make it into 6.9 release. > diff --git net/wg_noise.c net/wg_noise.c > index 86f7823cc83..176c36609fc 100644 > --- net/wg_noise.c > +++ net/wg_noise.c > @@ -299,9 +299,6 @@ noise_consume_initiation(struct noise_local *l, struct > noise_remote **rp, > NOISE_TIMESTAMP_LEN + NOISE_AUTHTAG_LEN, key, hs.hs_hash) != 0) > goto error; > > - hs.hs_state = CONSUMED_INITIATION; > - hs.hs_local_index = 0; > - hs.hs_remote_index = s_idx; > memcpy(hs.hs_e, ue, NOISE_PUBLIC_KEY_LEN); > > /* We have successfully computed the same results, now we ensure that > @@ -321,6 +318,9 @@ noise_consume_initiation(struct noise_local *l, struct > noise_remote **rp, > > /* Ok, we're happy to accept this initiation now */ > noise_remote_handshake_index_drop(r); > + hs.hs_state = CONSUMED_INITIATION; > + hs.hs_local_index = noise_remote_handshake_index_get(r); > + hs.hs_remote_index = s_idx; > r->r_handshake = hs; > *rp = r; > ret = 0; > @@ -369,7 +369,6 @@ noise_create_response(struct noise_remote *r, uint32_t > *s_idx, uint32_t *r_idx, > noise_msg_encrypt(en, NULL, 0, key, hs->hs_hash); > > hs->hs_state = CREATED_RESPONSE; > - hs->hs_local_index = noise_remote_handshake_index_get(r); > *r_idx = hs->hs_remote_index; > *s_idx = hs->hs_local_index; > ret = 0; >
