Hi,

For a while now I have been seeing strange network issues in some parts of a particular system.

A build with src from 2023-07-26 was still working OK. An update to 2023-08-07 broke some parts in a strange way. Trying again with src from 2023-08-11 didn't fix things.

What I see is... strange and complex.

I have a jail host with about 23 jails. All the jails sit on a bridge and have IPv6 and IPv4 addresses. One jail is a DNS server for a domain which contains the DNS entries for all the jails on the system (and more). Other jails run mysql (the FS socket for mysql is nullfs-mounted into other jails, so they connect to mysql via the FS socket instead of the network), a dovecot IMAP server, a postfix SMTP server, an nginx-based reverse proxy, 2 different kinds of webmail solutions (an old php74-based one on the way out in favour of a php81-based one), a wiki and other things.

With the old, working base system I can log in to the old webmail system and read mail. With the newer, non-working base system I can still log in, but the auth credentials are not stored in the backend session, so no mail is listed at all, since listing requires subsequent connections from php to dovecot. This webmail system goes via the reverse proxy to the webmail jail, which has another nginx configured to connect to the php-fpm backend. With the new webmail system I can log in, read mail, and am even writing this email from it. The first login to it fails; the second succeeds. It is not behind the reverse proxy (it is not yet fully ready for access from the outside, which is DSL with NAT on the DSL box to the reverse proxy), but uses a single nginx with a php-fpm backend (instead of 2 nginx + php-fpm as in the old webmail).

The wiki behind the reverse proxy sometimes works and sometimes doesn't. Sometimes it delivers everything, sometimes parts of the site are missing (e.g. pictures / icons). Sometimes there is simply a blank page, sometimes the wiki gives an error message about an unforeseen bug...

The error message in the nginx reverse proxy log for all the strange failure cases is "accept4() failed (53: Software caused connection abort)". Sometimes I get "upstream timed out" instead. When the request times out in the reverse proxy rather than producing the accept4 error there, I see the same accept4 error message in the nginx inside the wiki or webmail jail.
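To compare how often each of the two messages shows up per jail, something like the following could be run over each nginx error log. This is only a rough sketch: the helper name count_failures and the PATTERNS table are made up for illustration, and it matches nothing more than the two message strings quoted above.

```python
from collections import Counter

# The two message fragments quoted above; the keys are hypothetical labels.
PATTERNS = {
    "accept4_aborted": "accept4() failed (53: Software caused connection abort)",
    "upstream_timeout": "upstream timed out",
}


def count_failures(lines):
    """Count how many log lines contain each known failure message."""
    counts = Counter()
    for line in lines:
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

Feeding it an open file object (for example the error log of the reverse proxy, then the one of the wiki jail) would show whether the accept4 errors move from one nginx to the other when the proxy reports timeouts.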

I tried to recompile all the components of the wiki and reverse proxy and php81 based webmail, to no avail. The issue persists.

Does this ring a bell for anyone? Maybe some network, socket or VM changes in this timeframe that smell like they could be related and would be good candidates for a back-out test? Any ideas how to drill down with debugging to get a simpler test case than the complex setup of if_bridge, epair, jails, wiki, php, nginx, ...?
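One candidate for a simpler test case would be a bare TCP accept loop with no nginx or php involved. The following is a minimal sketch, assuming Python 3 is available; the function name run_accept_test, its parameters and the tiny ping/ok exchange are all made up for illustration, and I do not know yet whether it triggers the problem. It opens many short-lived connections to a local listener and counts how many accept() calls fail with ECONNABORTED (the errno behind the "Software caused connection abort" message).

```python
import errno
import socket
import threading


def run_accept_test(host="127.0.0.1", connections=200):
    """Make `connections` short TCP connections to a local listener.

    Returns (accepted, aborted, client_errors).
    """
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    listener = socket.socket(family, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, 0))  # port 0: let the kernel pick a free port
    listener.listen(128)
    listener.settimeout(0.5)  # so the server loop can notice `stop`
    port = listener.getsockname()[1]

    counts = {"accepted": 0, "aborted": 0, "client_errors": 0}
    stop = threading.Event()

    def server():
        while not stop.is_set():
            try:
                conn, _ = listener.accept()
            except socket.timeout:
                continue
            except OSError as exc:
                if exc.errno == errno.ECONNABORTED:
                    counts["aborted"] += 1
                    continue
                raise
            counts["accepted"] += 1
            with conn:
                conn.settimeout(5)
                try:
                    conn.recv(16)
                    conn.sendall(b"ok")
                except OSError:
                    pass  # the client side counts its own failures

    thread = threading.Thread(target=server)
    thread.start()
    try:
        for _ in range(connections):
            try:
                with socket.create_connection((host, port), timeout=5) as c:
                    c.sendall(b"ping")
                    if c.recv(16) != b"ok":
                        counts["client_errors"] += 1
            except OSError:
                counts["client_errors"] += 1
    finally:
        stop.set()
        thread.join()
        listener.close()
    return counts["accepted"], counts["aborted"], counts["client_errors"]
```

Running this on loopback only checks the socket layer inside one jail. To put if_bridge and epair in the path, the listener and the client loop would have to be split into two scripts and run in different jails, with the listener bound to that jail's address.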

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF
