Hello everyone,

we are running four Dell R430s for firewalling, NAT, accounting etc. for a student network (approx. 5,200 users). We use pf and authpf. Servers 1 and 2 form one CARP cluster (cluster A), and servers 3 and 4 form another (cluster B). All boxes have identical hardware and software configurations; the only difference is that cluster A runs OpenBSD 6.7 and cluster B runs OpenBSD 7.0.
Every user (i.e. student) on the network has their own individual login (they ssh directly to one of the boxes) to open a connection to the internet. The user database on servers 1 and 2 holds approx. 2,600 users; the user database on cluster B holds the other half. Creating and updating user information is scripted. Most of the time we just need to update authpf.message to show the students their traffic consumption on login:

    echo "* UPD (183883)"
    echo "---\n\nWelcome to studNET!\n\nYou have a maximum of 600 GB traffic available per month.\nYou have already used 9.231 GB in the current month (calculated at 2022-08-08 21:02:07) [.....] .\n\n---" >/etc/authpf/users/183883/authpf.message || error_handler
    echo "... authpf-file /etc/authpf/users/183883/authpf.message generated"
    if [ $USER_ERROR -eq 0 ]
    then
        echo "* UPD (183883|dummyuser, dummyuser) ... success"
    else
        echo "* UPD (183883| dummyuser, dummyuser) ... failed"
    fi

This chunk of code is repeated roughly 2,000 times in a script file that is generated twice a day and run by cron.

*Problem*

About once a month, server 3 or 4 crashes: the machine simply freezes. Sometimes a reboot is enough, but often the freeze also leaves the user database corrupted (the system won't start; user root is not found). When this happens we have to manually restore a working master.passwd and run pwd_mkdb. Because the systems freeze, there are no useful log entries or anything similar. The only thing we know for sure is that *when* it happens, it is always *after* the script has run, and so far it has never happened on server 1 or 2 (OpenBSD 6.7).

*Question*

Since the problem clearly seems to be triggered by running the script, why does this happen? Heavy I/O, or some bug in the disk driver? Does anyone have a clue why the system crashes, and why the user database even gets corrupted, in our setup?

Best regards,
Martin Miethe
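For comparison, here is a minimal sketch of the repeated per-user chunk rewritten as one reusable shell function. The function name `update_message`, its arguments, and the optional base-directory parameter are hypothetical and not taken from the production script; the message text is abbreviated. It writes to a temporary file in the same directory and then renames it into place, so a box that freezes mid-run cannot leave a half-written authpf.message behind. It does not force data to disk and does not touch master.passwd, so it is not a claim about the cause of the freeze.

```shell
#!/bin/sh
# Hypothetical helper (not the production script): regenerate one user's
# authpf.message. Arguments: user id, traffic used (GB), calculation time,
# optional base directory (defaults to /etc/authpf/users).
update_message() {
    uid=$1
    used=$2
    stamp=$3
    base=${4:-/etc/authpf/users}
    dir="$base/$uid"

    echo "* UPD ($uid)"

    # Create the temporary file in the target directory so the final mv is a
    # rename(2) within one file system: readers see either the old or the new
    # file, never a partial one. This does not by itself flush data to disk.
    tmp=$(mktemp "$dir/.authpf.message.XXXXXXXXXX") || {
        echo "* UPD ($uid) ... failed"
        return 1
    }

    # Abbreviated message text; the heredoc expands $used and $stamp.
    cat >"$tmp" <<EOF
---

Welcome to studNET!

You have a maximum of 600 GB traffic available per month.
You have already used $used GB in the current month (calculated at $stamp).

---
EOF
    status=$?

    # mktemp creates mode 0600 files. Assumption: the message should stay
    # world-readable, as a plain > redirection with umask 022 would leave it.
    if [ "$status" -eq 0 ] && chmod 644 "$tmp" && mv -f "$tmp" "$dir/authpf.message"
    then
        echo "* UPD ($uid) ... success"
    else
        rm -f "$tmp"
        echo "* UPD ($uid) ... failed"
        return 1
    fi
}
```

Usage would be one call per user, e.g. `update_message 183883 9.231 "2022-08-08 21:02:07"`, instead of one copied block per user.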

