On Mon, Mar 22, 2021 at 12:42:27AM +1100, Matt Dunwoodie wrote:
> On Sat, 20 Mar 2021 11:48:52 +0000
> Stuart Henderson <[email protected]> wrote:
> 
> > oh, let's cc Matt on this too.
> > 
> > On 2021/03/20 11:17, Martin Pieuchot wrote:
> > > On 19/03/21(Fri) 20:15, Stuart Henderson wrote:  
> > > > Not a great report but I don't have much more to go on, machine
> > > > had ddb.panic=0 and ddb hanged while printing the stack trace.
> > > > Retyped by hand, may contain typos. Happened a few hours after
> > > > setting up wg on it.
> > > > 
> > > > uvm_fault(0xffffffff82204e38, 0x20, 0, 1) -> e
> > > > fatal page fault in supervisor mode
> > > > trap type 6 code 0 rip ffffffff81752116 cs 8 rflags 10246 cr2 20
> > > > cpl 0 rsp 00023b35eb0 gsbase 0xffffffff820eaff0 kgsbase 0x0
> > > > panic: trap type 6, code=0, pc=ffffffff81752116
> > > > Starting stack trace...
> > > > panic(ffffffff81ddc97a) at panic+0x11d
> > > > kerntrap(ffff800023b35e00) at kerntrap+0x114
> > > > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> > > > wg_index_drop(ffff8000012ae000,0) at wg_index_drop+0x96
> > > > noise_create_initiation(  
> > > 
> > > This is a NULL dereference at line 1981 of net/if_wg.c:
> > > 
> > > wg_index_drop(void *_sc, uint32_t key0)
> > > {
> > >   ...
> > >   /* We expect a peer */
> > >         peer = CONTAINER_OF(iter->i_value, struct wg_peer,
> > > p_remote); ...
> > > }
> > > 
> > > Does that mean that `iter' is NULL and i_value' is at ofset 0x20 in
> > > that struct?
> > >   
> 
> Correct. The issue is we're trying to remove an index that doesn't
> exist. wg_index_drop iterates through the list and expects to find a
> matching index (perhaps a KASSERT could have been helpful here).
> Nevertheless, since index 0 doesn't exist `iter` ends up being NULL.
> 
> > Oh, I am an idiot, I had debug set and there is something other than
> > just standard messages around that time. Both sides are OpenBSD
> > wg(4). I did not have debug on the other side.
> > 
> > [...]
> > 18:51:08.041Z  wg2: Sending handshake initiation to peer 3
> > 18:51:08.091Z  wg2: Receiving handshake initiation from peer 3
> > 18:51:08.091Z  wg2: Sending handshake response to peer 3
> > 18:51:08.091Z  wg2: Unknown handshake response
> > 18:51:13.141Z  wg2: Receiving handshake initiation from peer 3
> > 18:51:13.141Z  wg2: Sending handshake response to peer 3
> > 18:51:13.191Z  wg2: Handshake for peer 3 did not complete after 5
> > seconds, retrying (try 2) 18:51:13.191Z  wg2: Receiving keepalive
> > packet from peer 3 18:51:13.191Z  wg2: Sending keepalive packe
> > 18:51:13.191Z  t to peer 3
> > 18:52:28.242Z  wg2: Sending keepalive packet to peer 3
> > 18:52:28.342Z  wg2: Receiving keepalive packet from peer 3
> > 18:53:43.343Z  wg2: Sending keepalive packet to peer 3
> > 18:54:58.345Z  wg2: Sending handshake initiation to peer 3
> > 18:54:58.395Z  wg2: Receiving handshake initiation from peer 3
> > 18:54:58.395Z  wg2: Sending handshake response to peer 3
> > 18:54:58.395Z  wg2: Unknown handshake response
> > <syslog stops here, rest retyped>
> > wg2: Handshake for peer 3 did not complete after 5 seconds, retrying
> > (try 2) wg2: Sending handshake initiation to peer 3
> > wg2: Sending handshake response to peer 3
> > <null deref here>
> 
> With this information, it was possible to reproduce the issue on my
> end. There is a race between sending/receiving handshake packets. This
> occurs if we consume an initiation, then send an initiation prior to
> replying to the consumed initiation.
> 
> In particular, when consuming an initiation, we don't generate the
> index until creating the response (which is incorrect). If we attempt
> to create an initiation between these processes, we drop any
> outstanding handshake which in this case has index 0 as set when
> consuming the initiation.
> 
> The fix attached is to generate the index when consuming the initiation
> so that any spurious initiation creation can drop a valid index. The
> patch also consolidates setting fields on the handshake.
> 
> With this patch applied, I was unable to reproduce the crash.
This looks good and works, OK kn

sthen, do you want to commit this fix?  I think it should make it into
6.9 release.

> diff --git net/wg_noise.c net/wg_noise.c
> index 86f7823cc83..176c36609fc 100644
> --- net/wg_noise.c
> +++ net/wg_noise.c
> @@ -299,9 +299,6 @@ noise_consume_initiation(struct noise_local *l, struct 
> noise_remote **rp,
>           NOISE_TIMESTAMP_LEN + NOISE_AUTHTAG_LEN, key, hs.hs_hash) != 0)
>               goto error;
>  
> -     hs.hs_state = CONSUMED_INITIATION;
> -     hs.hs_local_index = 0;
> -     hs.hs_remote_index = s_idx;
>       memcpy(hs.hs_e, ue, NOISE_PUBLIC_KEY_LEN);
>  
>       /* We have successfully computed the same results, now we ensure that
> @@ -321,6 +318,9 @@ noise_consume_initiation(struct noise_local *l, struct 
> noise_remote **rp,
>  
>       /* Ok, we're happy to accept this initiation now */
>       noise_remote_handshake_index_drop(r);
> +     hs.hs_state = CONSUMED_INITIATION;
> +     hs.hs_local_index = noise_remote_handshake_index_get(r);
> +     hs.hs_remote_index = s_idx;
>       r->r_handshake = hs;
>       *rp = r;
>       ret = 0;
> @@ -369,7 +369,6 @@ noise_create_response(struct noise_remote *r, uint32_t 
> *s_idx, uint32_t *r_idx,
>       noise_msg_encrypt(en, NULL, 0, key, hs->hs_hash);
>  
>       hs->hs_state = CREATED_RESPONSE;
> -     hs->hs_local_index = noise_remote_handshake_index_get(r);
>       *r_idx = hs->hs_remote_index;
>       *s_idx = hs->hs_local_index;
>       ret = 0;
> 

Reply via email to