Starke, Daniel <[email protected]> 于2026年6月17日周三 15:26写道:
>
> > The receive worker walks gsm->dlci[] without gsm->mutex while a
> > concurrent GSMIOC_SETCONF -> gsm_cleanup_mux() frees the DLCIs, so the
> > control handlers can dereference a freed gsm_dlci. v1's NULL check only
> > narrowed the window; v2 fixes the use-after-free itself.
> >
> > The fix pins each DLCI the dispatch dereferences with its existing
> > tty_port reference (option 2), so the data path stays lock-free. See the
> > patch 1 commit message for details, including why the late destructor
> > uses cmpxchg() so it cannot wipe a re-created mux (Daniel's teardown
> > concern).
> >
> > Changes since v1:
> > - Fix the UAF by reference-pinning instead of a NULL check in the
> > handlers; no gsm->mutex in the data path (Greg, Daniel).
> > - Pin every DLCI the dispatch touches, not just the addressed one:
> > MSC/RLS/PN operate on gsm->dlci[k] named in the payload.
> > - Add a base selftest (patch 2), as Greg asked.
> >
> > Verification (KASAN, panic_on_warn=1): the originally reported splat is
> > the gsm_control_reply() / CMD_TEST path (see the Link in patch 1). A
> > reproducer targeting the MSC handler crashes the unpatched kernel and
> > survives 270 race rounds on v2. The selftest passes on both the clean
> > and patched kernel (pass:3 fail:0 skip:0).
> >
> > Weiming Shi (2):
> > tty: n_gsm: fix use-after-free in gsm_queue() control frame dispatch
> > selftests: tty: add base regression test for n_gsm line discipline
>
> So now there is a race on device node level. We have the old virtual
> gsmttys still waiting for the current pending operations to finish while
> the reconfiguration of the ldisc triggers a recreation of these. How is
> this handled?
>
A pending gsmtty operation stays bound to the old object. gsmtty_install()
takes a dlci_get() and the gsmtty ops use tty->driver_data, not gsm->dlci[],
so an open fd keeps its own reference and never reaches the recreated dlci
(a fresh gsm_dlci_alloc()). On teardown gsm_dlci_release() does
tty_vhangup(), which unblocks the pending ops, and sets DLCI_CLOSED, which
every gsmtty op checks first.
This is the existing tty_port refcount, and the receive path here uses the
same dlci_get/put, so I don't think the pin adds a device-node race. But
you know this hardware better than I do; if there's a path I'm missing,
point me at it.
> PATCH 1/2:
>
> > - if (addr == 0 || addr >= NUM_DLCI || gsm->dlci[addr] == NULL)
> > + if (addr == 0)
> > return;
> > - dlci = gsm->dlci[addr];
>
> Control octets can be transmitted on non-control DLCIs in convergence layer
> type 2. You do not guard against invalid DLCIs anymore. Same for parameter
> negotiation and RLS.
>
The checks aren't dropped, just folded into gsm_dlci_open_get(), which
returns NULL for addr >= NUM_DLCI and for an empty slot. But hiding them
in the lookup makes the diff read like the guard is gone, which was a bad
call. For v4, maybe we should keep the addr checks at the call sites and
et the helper only do the locked lookup and pin, so the guard stays
visible where the DLCI is used? Does that match what you'd expect?
> > Fixes: e1eaea46bb40 ("tty: n_gsm line discipline")
> > Cc: [email protected]
> > Reported-by: Xiang Mei <[email protected]>
> > Link: https://lore.kernel.org/all/[email protected]/
> > Signed-off-by: Weiming Shi <[email protected]>
>
> This whole worker separation was added later, not in e1eaea46bb40. Maybe
> worth updating to the corresponding commit.
I'll find the commit that moved the eceive side onto the worker and
point Fixes: at that.
> Also, the code comments have a strong AI flavor but AI use was only
> disclosed for PATCH 2/2.
Sorry, I missed that tag. v4 will add
Thanks for the careful review.
>
> Best regards,
> Daniel Starke