Strange network issues with -current

Alexander Leidinger Tue, 15 Aug 2023 04:49:16 -0700

Hi,

for a while now I have had some strange network issues in some parts of a particular system.

A build with src from 2023-07-26 was still working OK. An update to 2023-08-07 broke some parts in a strange way. Trying again with src from 2023-08-11 didn't fix things.


What I see is... strange and complex.

I have a jail host with about 23 jails. All the jails are sitting on a bridge, and have IPv6 and IPv4 addresses. One jail is a DNS server for a domain which contains all the DNS entries for all the jails on the system (and more). Other jails have mysql (FS socket for mysql nullfs-mounted into other jails for connecting to mysql via the FS socket instead of the network), a dovecot IMAP server, a postfix SMTP server, an nginx based reverse proxy and 2 different kinds of webmail solutions (an old php74 based one on the way out in favour of a php81 based one), a wiki and other things.

With the old working basesystem I can log in to the old webmail system and read mails. With the newer non-working basesystem I can still log in, but the auth credentials are not stored in the backend session and as such no mail is listed at all, as this requires subsequent connections from php to dovecot. This webmail system goes via the reverse proxy to the webmail jail, which has another nginx configured to connect to the php-fpm backend.

With the new webmail system I can log in, read mails, and I am even writing this email from it. The first login to it fails. The second succeeds. It is not behind the reverse proxy (as it is not fully ready yet for access from the outside (DSL with NAT on the DSL-box to the reverse proxy)), but uses a single nginx with php-fpm backend (instead of 2 nginx + php-fpm as in the old webmail).

The wiki behind the reverse proxy is sometimes working, and sometimes not. Sometimes it is providing everything, sometimes parts of the site are missing (e.g. pictures / icons). Sometimes there is simply a blank page, sometimes it gives an error message from the wiki about an unforeseen bug...

The error message in the nginx reverse proxy log for all the strange failure cases is "accept4() failed (53: Software caused connection abort)". Sometimes I get "upstream timed out". When it times out in the reverse proxy instead of getting the accept4 errors, I see the same accept4 error message in the nginx inside the wiki or webmail jail instead.

I tried to recompile all the components of the wiki and reverse proxy and php81 based webmail, to no avail. The issue persists.

Does this ring a bell for anyone? Maybe some network or socket or VM based changes in this timeframe which smell like they could be related and may be good candidates for a backup-test? Any ideas how to drill down with debugging to get a simpler test case than the complex setup of if_bridge, epair, jails, wiki, php, nginx, ...?
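
One possible starting point for a reduced test case might be a bare accept() loop with no nginx or php involved. The following is only a minimal sketch, assuming Python 3 is available in the jail. The function name run_accept_test and the counter names are made up for illustration. As written it only exercises loopback inside a single jail, so it does not cover the if_bridge/epair path; it merely counts how often accept() reports ECONNABORTED (errno 53 on FreeBSD) for short-lived connections.

```python
import errno
import socket
import threading


def run_accept_test(connections=200, host="127.0.0.1"):
    """Open short-lived TCP connections to a local listener and count
    how accept() and a one-message echo round trip behave."""
    stats = {"accepted": 0, "aborted": 0, "server_errors": 0,
             "client_errors": 0, "echo_ok": 0}

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, 0))          # port 0: let the kernel pick a free port
    listener.listen(128)
    listener.settimeout(5.0)          # do not hang forever if a client fails
    port = listener.getsockname()[1]

    def serve():
        for _ in range(connections):
            try:
                conn, _addr = listener.accept()
            except socket.timeout:
                break                 # no more clients are coming
            except OSError as exc:
                if exc.errno == errno.ECONNABORTED:
                    stats["aborted"] += 1   # the error nginx logs for accept4()
                else:
                    stats["server_errors"] += 1
                continue
            with conn:
                stats["accepted"] += 1
                conn.settimeout(5.0)
                try:
                    conn.sendall(conn.recv(64))   # echo the payload back
                except OSError:
                    stats["server_errors"] += 1

    server = threading.Thread(target=serve, daemon=True)
    server.start()

    for i in range(connections):
        payload = b"ping %d" % i
        try:
            with socket.create_connection((host, port), timeout=5.0) as client:
                client.sendall(payload)
                if client.recv(64) == payload:
                    stats["echo_ok"] += 1
        except OSError:
            stats["client_errors"] += 1

    server.join(timeout=10.0)
    listener.close()
    return stats


if __name__ == "__main__":
    print(run_accept_test())
```

On a healthy system every connection should be accepted and echoed, with "aborted" staying at 0. Non-zero "aborted" counts on the broken basesystem would suggest the problem is reproducible without the rest of the stack.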


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF
