Hi Michael,

Thanks for the helpful reply. Unfortunately, as I receive only the user digest mail and you did not include me in the To or Cc lists, I didn't receive this mail in my inbox, so I did not notice it until much later when checking one of the mirrors.

Is it considered "wrong" to Reply-All around here? On other lists of this style I use, we encourage Reply-All to keep folks who participate in the conversation receiving email. I want to learn the norms of this community, though.

> On Wednesday, 15 October 2025 02:45:32 British Summer Time D. Ben Knoble
> wrote:
> > I've tried to describe the symptoms on my blog as best I can [1].
> > Here's a repaste of the relevant bits for posterity.
> >
> > [1]:
> > https://benknoble.github.io/blog/2025/10/13/framework-gentoo-network-hang/
> >
> > ---8<---
> > I've recently installed Gentoo on my new Framework Desktop, and for a while
> > the network would hang, causing any program that touched the hardware to
> > also crap out. **Update same day:** I've just had the problem recur! Agh!
> >
> > Here are some symptoms:
> > - Thinks chug along just fine, until
> > - The network goes down and commands start freezing. We're talking no C-c,
> >   no C-\\, no `pkill -9`: just dead in the water. (I hadn't solved switching
> >   consoles with Alt-Function keys yet, so fortunately I had tmux running.)
> > - Notably, commands like `emerge` hang when they get to network steps.
> >   Various `ip` & `ifconfig` style commands hang immediately.
> > - Commands like `nmcli` and `ping` don't hang, but don't work.
> > - I'm pretty sure `/etc/resolv.conf` isn't being touched.
> > - The `NetworkManager` daemon is running, but:
> > - `top` shows "D" (d-sleep) and `ps` shows "Dsl"
> > - `rc-status` reports NetworkManager as active, but `nmcli g` says its
> >   down
> > - `rc-service NetworkManager restart` fails with
> >
> > ```
> > * Caching service dependencies ... [ ok ]
> > * Unmounting network filesystems ... [ ok ]
> > * Stoppping NetworkManager ...
> > * start-stop-daemon: 1 rocesses refused to stop [ !! ]
> > * ERROR: NetworkManager failed to stop
> > * Mounting network filesystems ... [ ok ]
> > ```
>
> Do you have some duplicated AP connection and nm will not stop?
>
> Your log shows successive de/re-authentication with different APs, e.g. first
> with 70:58:a4:d2:14:21 and then within seconds:
>
> disconnect from AP 70:58:a4:d2:14:21 for new auth to 70:58:a4:d2:14:29

Right, I noticed this too! I have a main router downstairs and an extender upstairs in the same room as the device in question. As I explained over on my blog:

I saved some logs with

    { echo count from to
      grep 'disconnect from AP' /var/log/messages |
        grep -o '\([[:xdigit:]][[:xdigit:]]:\?\)\{6\}' |
        paste - - | sort | uniq -c
    } | column -t

which gave for example

    count  from               to
    3      <prefix>:2f:a9:51  <prefix>:2f:a9:59
    22     <prefix>:2f:a9:51  <prefix>:d2:14:29
    1      <prefix>:d2:14:21  <prefix>:2f:a9:59
    2      <prefix>:d2:14:21  <prefix>:d2:14:29

As far as I can tell, the (masked) <prefix> there matches the output from another connected device when checking the router with arp -a:

    gr6exx0c-a940.lan (192.168.2.1) at <prefix>:2f:a9:40 on en0 ifscope [ethernet]

And my (Google Fiber) management app says
- the router has a MAC of <prefix>:2f:a9:41
- the extender (MoCA) has a MAC of <prefix>:d2:14:10

So clearly something is going on here, and the NetworkManager is having trouble deciding which connection to use? But I'm not sure what the deal is with all those MACs that end in different digits. My other devices don't seem to have a problem, but I haven't looked (and am not sure how to look) to see if they are repeatedly re-authenticating like this.

As I'll mention below, but so as not to bury the lede: iwd is now installed and configured (with wpa_supplicant gone), but I'm still seeing this behavior. It might now even be more prevalent (occurring every minute?). Thoughts?

> > - similarly with `s/restart/stop`, and `pkill -9 NetworkManager` has no
> >   effect!
>
> Run ps axf to see how many instances of networkmanager are present and 'nmcli
> con show', but first see below.
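For capturing the stuck processes as plain text next time, something like the following might do. It is only a rough sketch: it assumes procps-ng `ps` (for the `wchan` column, which names the kernel function a task is sleeping in), and `dstate_filter` and `list_dstate` are names made up for this example.

```shell
# Keep the header row plus any row whose STAT column (field 2) starts with
# "D", i.e. tasks in uninterruptible sleep.
dstate_filter() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Assumption: procps-ng ps. Prints PID, state, wait channel and command name
# for every process, then filters down to the D-state ones.
list_dstate() {
    ps -eo pid,stat,wchan:32,comm | dstate_filter
}
```

Running `list_dstate` while the network is hung should show where NetworkManager is blocked, if the kernel exposes a wait channel for it. As root, `echo w > /proc/sysrq-trigger` should additionally dump blocked tasks to the kernel log, which does not depend on the keyboard SysRq key working.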

With htop, I could see 4 instances of something that looked like NM, but (using "m" to merge) 2 of them were tagged with "gmane" and one with "dbus." None were killable at the time.

> > - I didn't understand how `zap` worked at the time, but it turns out
> >   not to help
> > - shutting down complains about not stopping the NetworkManager, too
> > - Turns out, `wpa_supplicant` is also in "D" status! And owned by init.
> >   Hm.
> > - `grep -R wpa /var/logs` reminded me I needed to create
> >   `/etc/wpa_supplicant/wpa_supplicant.conf`. (I've done so since.)
>
> Take a look at this page and consider all suggestions. In particular,
> *wpa_supplicant is no longer recommended* and iwd should be used instead:
>
> https://wiki.gentoo.org/wiki/NetworkManager
>
> https://wiki.gentoo.org/wiki/Iwd

Thank you very much for this! Nothing in the Handbook or the NetworkManager wiki (aside from one specific error I wasn't encountering) really suggests that iwd should be used, so it was easy to overlook. Thoughts on fixing that?

> > I checked my hardware with `lspci`, which I'd fortunately installed during
> > the main system installation process, and with `lspci -k` I found out my
> > network card is
> >
> > ```
> > c0:00.0 Mediatek MT7925 (RZ717) Wi-Fi 7 160MHz [14c3:0717]
> > ```
> >
> > with kernel module (in use) `mt7925e`.
> >
> > First, I tried turning off WiFi power saving [as recommended by Garuda
> > folks](https://forum.garudalinux.org/t/mediatek-mt7925e-wifi-speed-very-slow-on-close-to-fresh-install-and-some-updates/41845/11):
> [snip ...]
>
> > Here's a [capture between a reboot and when I saw the problem
> > again](https://paste.gentoo.zip/gIeCuVQ4); I can also see `chronyd` fail
> > (probably because the network is down) and some things about DHCP lease
> > expiry or failures. At the end of that log dump, I `doas reboot`, and then
> > have to hold the power button (see below) to truly power off and reboot.
>
> It is much safer to try pressing keys R E I S U B ("busier" backwards) while
> holding down Alt + SysRq (or Print Screen button if SysRq is not available):
>
> https://en.wikipedia.org/wiki/Magic_SysRq_key

Thanks, that's good to know. My SysRq key seems to not be doing anything right now (or maybe it's pretending to be compose?), so I'll have to figure that out… Stock Gentoo with a kbd of "us", although it is really a programmable keyboard. (The key is supposed to send SysRq though.)

> > Relevant information:
> > - [`/etc/conf.d/netmount`](https://bpa.st/5QMN6): I only found out about
> >   this file from `view /etc/conf.d/net*` while reading [the Handbook's
> >   troubleshooting
>
> You shouldn't mix 'n match uncaringly different methods for managing your
> wireless network connectivity. You can end up with one system fighting for
> control against the other.

Indeed, I found that out the hard way early on. But AFAICT, I'm _not_ mixing. Unless I've misread something?

> On Wednesday, 15 October 2025 19:07:22 British Summer Time D. Ben Knoble
> wrote:
> > On Tue, Oct 14, 2025 at 9:45 PM D. Ben Knoble
> > <[email protected]> wrote:
> > > I've tried to describe the symptoms on my blog as best I can [1].
> > > Here's a repaste of the relevant bits for posterity.
> > > [1]:
> > > https://benknoble.github.io/blog/2025/10/13/framework-gentoo-network-hang/
> > I realized this morning, with a clearer head, that it might not be
> > clear if I'm still having trouble.
> > I am—the system was on overnight, and I can clearly see "zombie"
> > processes for NetworkManager and wpa_supplicant (states Dsl and D
> > respectively; when the system comes up and is working, that's Ssl and
> > S).
> > What other information might help?
> > /etc/wpa_supplicant/wpa_supplicant.conf:
> > ctrl_interface=/var/run/wpa_supplicant
> > eapol_version=1
> > ap_scan=1
> > fast_reauth=1
> > (Curiously, that /var/run/wpa_supplicant path does not exist.)
> > equery shows netifrc is installed (and /etc/init.d/net.lo exists), but
> > "pgrep net." doesn't find any related programs running?
> > Restarting wpa_supplicant via rc-service _also_ exhibits the "hang"
> > behavior after
> > * Starting WPA Supplicant Daemon ...
> > Successfully initialized wpa_supplicant
> > I'm tempted to try "iwd" instead at this point, but IDK if that will
> > make a difference here.
> > I do have dhcpcd installed because openrc's USE flags include netifrc,
> > but neither is running.
>
> Since this is a desktop and you will not be switching between APs all day,
> you could do away with nm, keep wpa_supplicant and configure the BSSID (MAC
> address) of your AP with the highest signal strength.

Oh, that's also an option. I've considered just running the one ethernet cable, too…

> Or, since you have nm installed and running, remove wpa_supplicant and install
> and configure iwd as recommended.
>
> Have a look here to see the different wireless software options available to
> you:
>
> https://wiki.gentoo.org/wiki/Network_management

iwd is now installed and running. Let's see if I hit the same issue as before…
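
If the roaming continues, the BSSID pinning suggested above for wpa_supplicant also looks possible while keeping NetworkManager, via the profile's `802-11-wireless.bssid` property. Below is a rough sketch of what I would try. `pin_bssid` and the `NMCLI` override are names made up for this example, and I have not checked whether the iwd backend honors the property.

```shell
# Pin a saved NetworkManager Wi-Fi profile to one BSSID so it stops hopping
# between the router and the extender.
# Usage: pin_bssid <connection-name> <aa:bb:cc:dd:ee:ff>
# NMCLI is a hypothetical override: NMCLI=echo previews the command only.
pin_bssid() {
    name=$1
    bssid=$2
    # Refuse anything that is not six colon-separated hex octets.
    if ! printf '%s\n' "$bssid" | grep -Eqx '([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}'; then
        echo "pin_bssid: not a MAC address: $bssid" >&2
        return 1
    fi
    "${NMCLI:-nmcli}" connection modify "$name" 802-11-wireless.bssid "$bssid"
}
```

`nmcli -f IN-USE,BSSID,SSID,CHAN,SIGNAL device wifi list` should show which BSSID has the strongest signal, and setting the property back to an empty string should undo the pin.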

