Re: accf_http and incqlen
Scott Oertel wrote:
> (I sent this to freebsd-questions, but I didn't receive any replies, so I thought I would try my luck here.)
>
> I set up the http accept filter with Apache and I am having a hard time understanding this; maybe you guys can help out. I've tested this on various versions of FreeBSD, primarily FreeBSD 6.3-RELEASE, and with various Apache configs, and it appears to behave the same across the board.
>
> Why does it appear that the TCP connections never terminate and just stay in the ESTABLISHED state, and why doesn't this queue ever flush itself? Is this normal? If it is, what exactly happens when the queue fills up to maxqlen? From the netstat output below, you can see that incqlen is maxed out. I've done quite a bit of searching regarding this queue but haven't found any solid information that describes what happens when it fills up. While this is going on, I have 517 established connections to port 80.
>
> ]# netstat -an|grep \.80|grep ESTAB|wc -l
> 519
> [...]

Last time I looked (in FreeBSD 4.x) these were connections that got stuck in an early stage, that is, before the HTTP request had been received. The 'accf_http' filter, which wants to parse said request, waits forever in this situation because there is no timeout implemented, as far as I recall. So these would-be HTTP connections pile up over time. The actual cause is quite likely port scans and such from the Internet.

I don't know whether one would eventually run out of resources, but so many stuck connections certainly look sick, and you can't see the wood for the trees if you need to debug something under these circumstances.

What I did instead was compile Apache 1.3 with the flag

  -DACCEPT_FILTER_NAME=\"dataready\"

added to CFLAGS in the ports repository's Makefile. This way Apache uses the 'dataready' filter instead of 'httpready'.

This doesn't cause any stuck connections, and it improves performance as well, because most modern browsers and proxies send the HTTP request plus the whole set of headers in a single data packet anyway. Returning from accept(2) unconditionally on the first data packet received is therefore sufficient. Under these circumstances the overhead of parsing the HTTP request in the kernel, as the 'httpready' filter does, no longer makes much sense.

I haven't looked at Apache 2.x so far in this regard. Perhaps there is a similar compile time option. In any case, maybe this tweak helps in your case, too.

Regards,

Uwe
--
Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers
[EMAIL PROTECTED] | http://www.escapebox.net
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
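A side note on the measurement itself: the quoted `grep \.80` pipeline also matches ports such as 8080, foreign ports, and address octets ending in .80, so its count is only approximate. The following is a minimal sketch of a stricter count. It assumes the FreeBSD `netstat -an` column layout (local address in field 4, state in field 6), and the function name `count_established` is made up for illustration.

```shell
#!/bin/sh
# Count ESTABLISHED TCP connections whose *local* port equals $1.
# Reads `netstat -an` style output on stdin. Assumed column layout:
#   proto recv-q send-q local-address foreign-address state
count_established() {
    awk -v port="$1" '
        # FreeBSD prints addresses as a.b.c.d.port, so anchor on ".port" at the end.
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" && $4 ~ ("[.]" port "$") { n++ }
        END { print n + 0 }
    '
}

# Typical use on a live system (not executed here):
#   netstat -an | count_established 80
```

Comparing this number with the incqlen value over time may help show whether the stuck connections really are the ones held back by the accept filter.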
Re: SMP on FreeBSD 6.x and 7.0: Worth doing?
Mike Tancsa wrote:
> At 08:11 AM 12/26/2007, Scott Long wrote:
>>> How does one know if vfs.ufs.dirhash_maxmem is set too high and is exhausting KVA?
>>
>> Panics, freezes, failure to exec new programs, failure to create pipes, etc.
>
> Is there any way to know ahead of time that one is getting close to the stage where all those bad things start to happen?

At least on FreeBSD 4.11 you can do

  sysctl -a|grep kvm

and get something like this:

  vm.kvm_size: 1065353216
  vm.kvm_free: 348127232

Perhaps this works on later versions of FreeBSD, too.

Now, if vm.kvm_free drops to 10% or so of vm.kvm_size and continues to fall, and vfs.ufs.dirhash_mem still hasn't hit the vfs.ufs.dirhash_maxmem limit, it's time to get concerned. Of course, you can also use the vm.kvm_* values to dimension vfs.ufs.dirhash_maxmem properly in the first place.

Regards,

Uwe
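The 10% rule of thumb above can be turned into a small check. This is only a sketch: the function name `kvm_free_percent` is invented here, and whether the vm.kvm_* sysctls exist on a given FreeBSD version has to be verified first, as the message itself notes.

```shell
#!/bin/sh
# Print the free share of kernel virtual address space as an integer percentage.
# $1 = vm.kvm_size, $2 = vm.kvm_free (both in bytes).
# awk is used so the multiplication does not depend on the shell's integer width.
kvm_free_percent() {
    awk -v size="$1" -v free="$2" 'BEGIN { printf "%d\n", free * 100 / size }'
}

# With the sample values from the message this prints 32:
kvm_free_percent 1065353216 348127232

# On a system that provides these sysctls, one might run (not executed here):
#   pct=$(kvm_free_percent "$(sysctl -n vm.kvm_size)" "$(sysctl -n vm.kvm_free)")
#   [ "$pct" -le 10 ] && echo "warning: only ${pct}% of KVA free"
```

Run periodically, for instance from cron, such a check would show the trend rather than a single snapshot, which is what matters for the "continues to fall" condition.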
Re: Also seeing 2 x quad-core system slower that 2 x dual core
Andreas Pettersson wrote:
> Claus Guttesen wrote:
>> could either replace my 10K rpm drives (in raid 1+0) with 15K ditto which would require a downtime which we could not afford at this time
>
> I have several times successfully upgraded mirrored volumes with new disks without any downtime at all. Just change one disk, let the mirror rebuild, change the other disk, wait for the rebuild again, tell the logical drive to present all the new space and then extend the filesystem. No downtime.

Just an additional hint: before you start this procedure, in order to minimize risk you may want to do a verification/repair run over the original mirror (if your controller supports this). This makes sure that both disks are in sync and that there are no defective sectors on the disk you are subsequently copying the data from. Otherwise there could be some rude awakening ...

Regards,

Uwe
Re: 4.x Collecting pv entries Suggest increasing PMAP_SHPGPERPROC,
Stephen Clark wrote:
> Claus Guttesen wrote:
>> Stephen Clark wrote:
>>> I know FreeBSD 4.x is old..., but we are using it on a production system with postgres and apache. The above message is appearing periodically. I googled for the message but found no recommendation for adjusting it.
>>
>> Is the sysctl kern.vm.pmap.shpgperproc available on 4.x? This can be configured in /boot/loader.conf.
>
> It does not appear to be available via a sysctl in 4.x.

But you can put that option into the kernel config file:

  options PMAP_SHPGPERPROC=...

and build a new kernel with it.

Regards,

Uwe
Re: 4.x Collecting pv entries Suggest increasing PMAP_SHPGPERPROC,
Hi Steve,

Stephen Clark wrote:
> Uwe Doering wrote:
>> [...]
>> But you can put that option into the kernel config file:
>>
>>   options PMAP_SHPGPERPROC=...
>>
>> and build a new kernel with it.
>
> You are correct. My question is more how much should I increase it. The current default in the 4.x LINT file is
>
>   options PMAP_SHPGPERPROC=201
>
> Should I double it? Increase it by 50 to 251? Should it be a prime number? I am just asking for information.

The settings in the LINT file are not necessarily the defaults; they are mere examples. The default value for PMAP_SHPGPERPROC is 200 (defined in the source file /usr/src/sys/i386/i386/pmap.c). I would recommend increasing it in steps of 100 until the messages stop. We use 300 on our servers, for instance.

Regards,

Uwe
Re: buildworld fails after patch (FreeBSD-SA-06:23.openssl)
Ruslan Ermilov wrote:
> On Fri, Sep 29, 2006 at 09:21:56PM +0200, Uwe Doering wrote:
>> Ruslan Ermilov wrote:
>>> It doesn't matter. What you suggest is not the correct way. Perhaps the buildworld is broken, but that's a separate issue.
>>>
>>>> My understanding so far is that the files under '/usr/include' don't get touched until I run 'installworld'. So the 'buildworld' universe has to be self-contained. That's what I was trying to point out.
>>>
>>> Yes, they are not touched. During buildworld, a special version of the compiler is built that looks headers up in the temporary location, normally /usr/obj/usr/src/tmp/usr/include. Then all (new) headers are installed there, then new libraries are built, then all the rest. If buildworld touched /usr/include, you could easily end up with a partially upgraded system, e.g. if the build failed in the middle.
>>>
>>> If it still fails for you (the buildworld), please collect the full combined stdout + stderr output from running make buildworld and put it somewhere for download and analysis. Colin said he did build all worlds, on all patched branches.
>>
>> Unfortunately I can no longer reproduce the error because I fixed the problem by hand, as pointed out above. Sorry.
>>
>>> OK, you had 4.11 and what were you trying to build? RELENG_4? So I can try to reproduce the problem here.
>>
>> Yes, I use RELENG_4. Thanks for your help.
>
> Worked for me building fresh RELENG_4:
>
> : uname -srm
> : FreeBSD 4.10-RELEASE i386
> : tail -3 build.log
> : rm -f freebsd.submit.cf
> : m4 -D_CF_DIR_=/spool/ru_tmp/src/etc/sendmail/../../contrib/sendmail/cf/ /spool/ru_tmp/src/etc/sendmail/../../contrib/sendmail/cf/m4/cf.m4 /spool/ru_tmp/src/etc/sendmail/freebsd.submit.mc freebsd.submit.cf
> : chmod 444 freebsd.submit.cf
> :

Thanks for testing it. So this problem seems to be specific to my workstation. If it happens again I'll investigate it more thoroughly.
Regards,

Uwe
Re: buildworld fails after patch (FreeBSD-SA-06:23.openssl)
Christer Solskogen wrote:
> FreeBSD 6.1-RELEASE-p3 amd64
>
> /usr/bin/gcc -O1 -pipe -march=nocona -DTERMIOS -DANSI_SOURCE -I/usr/src/secure/lib/libcrypto/../../../crypto/openssl -I/usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto -I/files3/build/obj/usr/src/secure/lib/libcrypto -DOPENSSL_THREADS -DOPENSSL_NO_IDEA -DNO_IDEA -c /usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/dh/dh_err.c
> /usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/dh/dh_err.c:81: error: `DH_R_MODULUS_TOO_LARGE' undeclared here (not in a function)
> /usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/dh/dh_err.c:81: error: initializer element is not constant
> /usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/dh/dh_err.c:81: error: (near initialization for `DH_str_reasons[1].error')
> /usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/dh/dh_err.c:81: error: initializer element is not constant
> /usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/dh/dh_err.c:81: error: (near initialization for `DH_str_reasons[1]')
> /usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/dh/dh_err.c:82: error: initializer element is not constant
> /usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/dh/dh_err.c:82: error: (near initialization for `DH_str_reasons[2]')
> /usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/dh/dh_err.c:83: error: initializer element is not constant
> /usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/dh/dh_err.c:83: error: (near initialization for `DH_str_reasons[3]')
> *** Error code 1
> Stop in /usr/src/secure/lib/libcrypto.
> *** Error code 1
> Stop in /usr/src.
> *** Error code 1
> Stop in /usr/src.
> *** Error code 1
> Stop in /usr/src.
> *** Error code 1
> Stop in /usr/src.
>
> The patch was applied using cvsup.

The same happened on my workstation, which runs 4.11. The cause of this problem is that the openssl sources under '/usr/src' apparently use some include files installed under '/usr/include/openssl' instead of those in the '/usr/src' tree.

The fix for me was to copy the '*.h' files that changed into '/usr/include/openssl' by hand. Afterwards things worked as expected. This is of course just a workaround. The proper fix would be to modify the respective makefiles so that every directory containing header files is added to the list of include directories given to 'cc' with '-I' options. This apparently hasn't been done so far. At least not completely.

Regards,

Uwe
Re: buildworld fails after patch (FreeBSD-SA-06:23.openssl)
Ruslan Ermilov wrote:
> On Fri, Sep 29, 2006 at 05:40:36PM +0200, Uwe Doering wrote:
>> [...]
>> The same happened on my workstation, which runs 4.11. The cause of this problem is that the openssl sources under '/usr/src' apparently use some include files installed under '/usr/include/openssl' instead of those in the '/usr/src' tree.
>>
>> The fix for me was to copy the '*.h' files that changed into '/usr/include/openssl' by hand. Afterwards things worked as expected. This is of course just a workaround. The proper fix would be to modify the respective makefiles to add all the directories where there are header files to the list of include directories given to 'cc' with '-I' options. This apparently hasn't been done so far. At least not completely.
>
> No. The correct way is to either do a full build (aka buildworld), or a partial build by first installing headers, and then doing the library build.

?? Did you notice the subject of this thread? The problem occurred while running 'buildworld'.

My understanding so far is that the files under '/usr/include' don't get touched until I run 'installworld'. So the 'buildworld' universe has to be self-contained. That's what I was trying to point out.

Regards,

Uwe
Re: buildworld fails after patch (FreeBSD-SA-06:23.openssl)
Ruslan Ermilov wrote:
> On Fri, Sep 29, 2006 at 08:34:29PM +0200, Uwe Doering wrote:
>> Ruslan Ermilov wrote:
>>> On Fri, Sep 29, 2006 at 05:40:36PM +0200, Uwe Doering wrote:
>>>> [...]
>>>> The same happened on my workstation, which runs 4.11. The cause of this problem is that the openssl sources under '/usr/src' apparently use some include files installed under '/usr/include/openssl' instead of those in the '/usr/src' tree.
>>>>
>>>> The fix for me was to copy the '*.h' files that changed into '/usr/include/openssl' by hand. Afterwards things worked as expected. This is of course just a workaround. The proper fix would be to modify the respective makefiles to add all the directories where there are header files to the list of include directories given to 'cc' with '-I' options. This apparently hasn't been done so far. At least not completely.
>>>
>>> No. The correct way is to either do a full build (aka buildworld), or a partial build by first installing headers, and then doing the library build.
>>
>> ?? Did you notice the subject of this thread? The problem occurred while running 'buildworld'.
>
> It doesn't matter. What you suggest is not the correct way. Perhaps the buildworld is broken, but that's a separate issue.
>
>> My understanding so far is that the files under '/usr/include' don't get touched until I run 'installworld'. So the 'buildworld' universe has to be self-contained. That's what I was trying to point out.
>
> Yes, they are not touched. During buildworld, a special version of the compiler is built that looks headers up in the temporary location, normally /usr/obj/usr/src/tmp/usr/include. Then all (new) headers are installed there, then new libraries are built, then all the rest. If buildworld touched /usr/include, you could easily end up with a partially upgraded system, e.g. if the build failed in the middle.
>
> If it still fails for you (the buildworld), please collect the full combined stdout + stderr output from running make buildworld and put it somewhere for download and analysis. Colin said he did build all worlds, on all patched branches.

Unfortunately I can no longer reproduce the error because I fixed the problem by hand, as pointed out above. Sorry.

> OK, you had 4.11 and what were you trying to build? RELENG_4? So I can try to reproduce the problem here.

Yes, I use RELENG_4. Thanks for your help.

Regards,

Uwe
Re: panic: ffs_valloc: dup alloc
Eric Anderson wrote:
> I get the above panic after nfs clients attach to this nfs server and begin read/write ops on it after an unclean shutdown. I've fsck'ed the fs, and it marks it as clean, but I get this every time. It's an NFS share of a GEOM stripe (about 2TB).
>
> mode = 0100600, inum = 58456203, fs = /mnt
> panic: ffs_valloc: dup alloc

Do you happen to have disk mirroring on this server (RAID 1)? At work, on a workstation with RAID 1, we once had a case where after a power failure fsck would succeed, but subsequently, when mounting and using the partitions, the kernel still panicked because of a corrupt filesystem. Repeatedly.

This caused some major head scratching on our part until we figured out what was happening. The mirrored disks had gone out of sync. For performance reasons, a RAID 1 controller reads data from one disk drive or the other, depending on which drive is less busy at that particular moment. So while fsck was able to find and fix some filesystem inconsistencies, there were still more left in disk sectors it didn't access.

The RAID controller we used turned out to have a verification mode where it would scan the disks and re-synchronize them. Afterwards we did another fsck run, and this fixed the remaining filesystem inconsistencies. The kernel panics were gone.

Now, with the information you've provided I can't tell whether these findings apply to your case, but perhaps this story helps at least others in a similar situation.

Uwe
Re: OpenVPN within a Jail under 6.x ...
Oliver Fromme wrote:
> Uwe Doering [EMAIL PROTECTED] wrote:
>> [...]
>> Now, since routes are a global resource in FreeBSD, is there a way to prevent users from other jails on that machine from accessing that VPN, too? If it weren't possible to restrict access to a VPN to the jail it is associated with the VPN would no longer be private I'd think.
>
> Every jail has its own IP address. Connections originating from a jail are forced to use the jail's IP address as their source address. Therefore you can use a packet filter (IPFW or PF) to control where those packets are allowed to go.
> [...]

Thanks for pointing that out. I must admit that I hadn't thought this through very thoroughly. Now that you mention the fixed nature of a jail's IP address it is kind of obvious that you can filter on the source address.

However, I believe there is still a snag. People tend to pick the same IP networks from the range of official private IP addresses for their internal LANs. If you wanted to set up VPN tunnels to these LANs for a larger number of jails belonging to individual owners, there is some likelihood that the routes to these LANs would overlap.

That is, since you cannot _route_ based on the source address of IP packets, at some point you would have a clash of interests between two or more owners of said jails. As the administrator of the machine that carries these jails you would ultimately have to decide who can have a VPN tunnel and who cannot.

Provided my analysis is correct, this would mean that the approach of using just a packet filter for access control doesn't scale very well.

Uwe
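The source-address filtering discussed above could look roughly like the following sketch. It only prints IPFW commands instead of running them, the addresses are placeholders that do not come from the thread, and `vpn_jail_rules` is a made-up helper name. Real rule sets would also need rule numbers that fit the existing configuration and a rule for the return traffic.

```shell
#!/bin/sh
# Print (not execute) IPFW rules that let only one jail reach a VPN network.
# $1 = jail IP address, $2 = VPN network in CIDR notation.
# Note: the second rule also blocks the host itself from the VPN network.
vpn_jail_rules() {
    echo "ipfw add allow ip from $1 to $2"
    echo "ipfw add deny ip from any to $2"
}

# Placeholder values for illustration only:
vpn_jail_rules 192.0.2.10 10.8.0.0/24
```

The scaling problem described in the message is visible here as well: if a second jail owner also wants a tunnel to a LAN numbered 10.8.0.0/24, the filter rules can be written, but there is still only one route to that network on the host.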
Re: OpenVPN within a Jail under 6.x ...
Oliver Fromme wrote:
> Marc G. Fournier wrote:
>> Oliver Fromme wrote:
>>> The problem is that you need to configure interfaces (tun(4) or tap(4)) to set up the VPN, but ifconfig(8) does not work inside a jail. That means you cannot set up a VPN inside a jail. However, you can _use_ it within a jail, of course, if you assign the IP of the VPN connection to the jail
>>
>> 'k, how would you do that? I thought you could only assign one IP to a jail, both in 4.x and 6.x?
>
> True. I meant that the IP of the VPN connection is the only IP of the jail.
>
> Or, if you can't do that, forward the packets into the jail using IPFW FWD rules and NAT. In that case, the jail doesn't need to have the VPN connection's IP. In fact, you can set the IP of the jail to a localnet IP (such as 127.0.1.1), which isn't routable and isn't accessible from the outside at all. That's often done to improve security.

Talking about security, while I haven't worked with VPNs so far I believe that there needs to be a route installed in order to forward packets to the remote end of the VPN connection.

Now, since routes are a global resource in FreeBSD, is there a way to prevent users from other jails on that machine from accessing that VPN, too? If it weren't possible to restrict access to a VPN to the jail it is associated with, the VPN would no longer be private, I'd think.

Uwe
Re: 4.8 alternate system clock has died error
Charles Sprickman wrote:
> Hello all,
>
> I've been digging through Google for more information on this. I have a 4.8 box that's been up for about 430 days. In the last week or so, top and ps have started reporting all CPU usage numbers as zero, and running systat -vmstat results in the message "The alternate system clock has died! Reverting to ``pigs'' display."
>
> I've found instances of this message in the archives for some 3.x users, some pre 4.8 users and some 5.3 users. There were a number of suggestions including a patch if pre-4.8, sending init a HUP, and setting the following sysctl mib: kern.timecounter.method: 1.
>
> I'm already at 4.8-p24, so I did not look into patching anything, and HUP'ing init and setting the sysctl mib does not seem to have any effect.
>
> I'm not quite ready to believe that some hardware has actually failed. Perhaps due to the long uptime something has rolled over?

We had this once at work, quite a while ago. The alternate system clock is in fact the Real Time Clock (RTC) on the mainboard. In our case we were lucky in that it was just the quartz device that failed due to an improperly soldered lead which finally came off. We fixed the soldering and the problem was gone.

Now, there are of course plenty of other hardware reasons why the RTC can fail, even temporarily like in your case. Perhaps it is really time for a new mainboard.

Uwe
Re: 4.8 alternate system clock has died error
Charles Sprickman wrote:
> On Fri, 18 Nov 2005, Uwe Doering wrote:
>> Charles Sprickman wrote:
>>> I've been digging through Google for more information on this. I have a 4.8 box that's been up for about 430 days. In the last week or so, top and ps have started reporting all CPU usage numbers as zero, and running systat -vmstat results in the message "The alternate system clock has died! Reverting to ``pigs'' display."
>>> [...]
>>
>> We had this once at work, quite a while ago. The alternate system clock is in fact the Real Time Clock (RTC) on the mainboard. In our case we were lucky in that it was just the quartz device that failed due to an improperly soldered lead which finally came off. We fixed the soldering and the problem was gone.
>
> Are there any tools to verify that the RTC is working?

systat -vmstat will show you the interrupt that it drives. In our case it's irq8, which is in fact labeled rtc. It is supposed to run at 128 Hz. Under load it can drop to some lower value. This is normal.

> I don't exactly understand what the RTC is, but would the machine not be suffering some other problems if there was an actual hardware failure? Doesn't the system rely on this to time everything from the processors to memory to PCI slots and interrupts?

No, the RTC drives only the interrupt that is responsible for collecting the CPU usage data. When it fails the CPU usage in top, ps etc. just drops to zero, as you've observed, but the server continues to run.

If the failure is permanent the machine refuses to boot, though. At least that's what happened in our case. Apparently the RTC chip is essential to the mainboard's boot sequence. For instance, the initial date and time information comes from this chip.

On the other hand, if a reset corrects the problem then the RTC chip probably got hung, or there is a problem with the interrupt controller it is connected to. On a properly working mainboard this shouldn't happen, of course.

> Is there any simple way to figure out if this is hardware or software?

I don't know of any. However, we have been running FreeBSD almost since 4.0, on various mainboards, UP and SMP, and we've never seen these symptoms except in the one case mentioned above. So I suppose it's not a kernel bug. I haven't looked at the PR database, though.

Uwe
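Besides the interactive systat display mentioned above, the interrupt rate can also be read non-interactively. The sketch below assumes that `vmstat -i` lists the clock on a line containing "rtc" with the rate in the last column; the exact layout differs between FreeBSD versions, and the function name `rtc_rate` is invented for this example.

```shell
#!/bin/sh
# Extract the reported rate of the RTC interrupt from `vmstat -i` style
# output on stdin. Prints "none" if no line mentions rtc.
rtc_rate() {
    awk '
        /rtc/ { print $NF; found = 1; exit }
        END   { if (!found) print "none" }
    '
}

# Typical use on a live system (not executed here):
#   vmstat -i | rtc_rate
```

Since the rate shown by vmstat is an average over the uptime, a clock that stopped recently may still show a plausible value; comparing the total count of two samples taken a few seconds apart is probably the more reliable check.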
Re: Jail to jail network performance?
Brandon Fosdick wrote:
> Robert Watson wrote:
>> (1) Modifying the name space exclusion assumption for jails, so that the file system name spaces overlap. One way to do this is with nullfs.
>
> nullfs looks interesting. I was thinking about sharing files between jails using NFS, but it looks like nullfs would do the trick with better performance. Although the bugs section of the man page for mount_nullfs is rather scary. Does anyone have any experience with it? Does it actually work?
>
> If the point here is to make /tmp/mysql.sock show up in another jail's file space, can I use a symlink instead? Can a jailed process see the target of the symlink?

Symlinks are just a path mapping mechanism performed by the kernel at lookup time, that is, before the actual access. In a jail only those parts of a filesystem are visible that are at or below the jail's root directory. The same goes for normal chroots. So if the symlink points to a location outside this scope you cannot access the object.

Hardlinks would work, but only if the jails concerned live in the same filesystem. They can still be confined to separate, non-overlapping parts of that filesystem, of course.

Uwe
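The hard link behaviour described above can be tried out without any jail at all. In this sketch two plain directories stand in for two jail roots on the same filesystem; the directory names and the function name `demo_hardlink` are illustrative only.

```shell
#!/bin/sh
# Two directories stand in for two jail roots on the same filesystem.
demo_hardlink() {
    base=$(mktemp -d "${TMPDIR:-/tmp}/jaildemo.XXXXXX") || return 1
    mkdir "$base/jail_a" "$base/jail_b"
    echo "written via jail_a" > "$base/jail_a/shared"

    # A hard link is a second name for the same inode, so reaching the data
    # from jail_b does not require resolving any path outside jail_b.
    ln "$base/jail_a/shared" "$base/jail_b/shared"

    echo "written via jail_b" >> "$base/jail_b/shared"
    cat "$base/jail_a/shared"   # shows both lines
    rm -rf "$base"
}

demo_hardlink
```

A symlink in the same place would store a path instead, and that path would be looked up relative to the jail's own root, which is why it cannot reach outside.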
Re: Jail to jail network performance?
Brandon Fosdick wrote:
> I have a 5.4-S box running apache2 that's serving data from mysql running on the same box. I'm thinking about putting both in separate jails, partly for security and partly for practice. Would this impact network performance between the two? Currently the mysql connection is using localhost which I understand to be faster than a network socket. Does jail-to-jail traffic use the same mechanism? or something else?

In MySQL 'localhost' is a hard-wired shortcut that uses domain sockets instead of TCP sockets. Since domain sockets live in the namespace of a filesystem this requires that both server and client have access to the same filesystem.

Now, for security reasons jails normally are confined in separate filesystems, or at least in separate parts of a common one. So in the case of MySQL you would have to use TCP sockets to communicate between jails. This socket type typically consumes more CPU because of TCP's protocol overhead. However, whether you would actually notice any difference in speed depends on how much spare CPU power is available on that server.

Uwe
Re: Jail to jail network performance?
Robert Watson wrote:
> On Wed, 14 Sep 2005, Lyndon Nerenberg wrote:
>> On Sep 13, 2005, at 11:59 PM, Uwe Doering wrote:
>>> Now, for security reasons jails normally are confined in separate filesystems, or at least in separate parts of a common one. So in case of MySQL you would have to use TCP sockets to communicate between jails. This socket type typically consumes more CPU because of TCP's protocol overhead. However, whether you would actually notice any difference in speed basically depends on how much excess CPU power there is available on that server.
>>
>> Ignoring security (or filesystem namespace issues) I will just note that using named sockets for local IPC is a Good Thing. When I worked at Messaging Direct I taught sendmail to speak LMTP over named sockets, and our local delivery rate (to our IMAP server) went up by a factor of 10.
>>
>> It would be really cool if we could figure out a way to do AF_UNIX between jails, but I confess to not having thought about any of the implications ... (Maybe netgraph can help here?)
>
> There are several ways you can do it, but they generally fall into two classes of activities:
>
> (1) Modifying the name space exclusion assumption for jails, so that the file system name spaces overlap. One way to do this is with nullfs.
>
> (2) Having a daemon or tool that runs outside of the jail and brokers communication between the jails. One example might be a daemon that inserts a UNIX domain socket into both jails and then provides references to shared IPC objects between the two by request. Another example might be a daemon or tool that responds to a request and creates a hard link from a socket/fifo endpoint visible in one jail to a name visible in another jail, perhaps when setting up the jail. The former requires more infrastructure, but the latter is less flexible.

Just a kind reminder to those interested in implementing the daemon approach: Never ever create or write to an object from outside a jail that is located in a part of the filesystem that a live jail can access and modify. Otherwise you may easily fall victim to a symlink attack or similar. Remember that jails set up for security reasons generally are to be considered enemy territory. The correct approach would be to create or open such objects from a chrooted child process.

There is only one exception: In the pre-boot phase of a jail you can get away with checking the file path component by component before you touch the object. But as soon as the jail runs, the window between checking the path and accessing the object can be exploited from inside the jail.

Hope to have helped prevent some rude awakening for some. ;-)

Uwe
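The symlink attack warned about above can be illustrated harmlessly in a scratch directory. Everything in this sketch is a stand-in: `jail` plays the role of a live jail's root, `host_file` a file only the host should modify, and `demo_symlink_attack` is a made-up name. No real jail or privileged process is involved.

```shell
#!/bin/sh
demo_symlink_attack() {
    base=$(mktemp -d "${TMPDIR:-/tmp}/symdemo.XXXXXX") || return 1
    mkdir -p "$base/jail/var/run"
    echo "host data" > "$base/host_file"

    # "Inside the jail": the path the host is expected to write to is
    # replaced by a symlink whose target, as seen from the host, lies
    # outside the jail.
    ln -s "$base/host_file" "$base/jail/var/run/status"

    # "Outside the jail": a naive writer follows the symlink and clobbers
    # the host's file.
    echo "broker output" > "$base/jail/var/run/status"

    cat "$base/host_file"   # prints "broker output", not "host data"
    rm -rf "$base"
}

demo_symlink_attack
```

Had the write been done by a child process that first chroots into the jail's root, as the message recommends, an absolute symlink target would have been resolved inside the jail instead of on the host.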
Re: kernel: swap_pager: indefinite wait buffer - on 5.3-RELEASE-p5
Oliver Fromme wrote: Uwe Doering [EMAIL PROTECTED] wrote: Oliver Fromme wrote: If they're really identical (i.e. the same size and same geometry), then you can use dd(1) for duplication, like this: # dd if=/dev/ad0 of=/dev/ad1 bs=64k conv=noerror,sync The noerror,sync part is important so the dd command will not stop when it hits any bad spots on the source drive and instead will fill the blocks with zeroes on the destination drive. Since it's only the swap partition, you shouldn't lose any data. I would like to point out that the conclusion you're drawing in the last sentence is invalid IMHO. I'm afraid I don't agree. indefinite wait buffer messages at apparently random block numbers just indicate that the pager was unable to access the swap area (in its entirety!) when it wanted to. It means that the disk drive was either dead at that point in time or busy trying to deal with a bad sector. This sector could have been anywhere on the disk. It just kept the disk drive busy for long enough that the pager started to complain. The OP specifically said that the swap_pager messages were the only kernel messages that he got. That indicates that only the swap partition is affected, because otherwise there would have been other kernel messages indicating I/O errors from one of the filesystems on that disk. Your assumption here is that the filesystem code would become impatient, too. This is not the case. The swap pager has a timeout built in (20 seconds IIRC) after which it prints a warning message and continues waiting, but there is nothing like this in the filesystem code. If the disk drive is dead or busy trying to deal with a bad sector in a filesystem the kernel will wait silently and indefinitely until either the disk drive succeeds in recovering the sector, or it fails to do so. In the latter case the kernel would log an I/O error. But only when it hears back from the disk drive and not any earlier, in contrast to the swap pager. That's why you often see only swap pager messages in case of a dying disk drive. I checked the kernel sources, but of course I could have missed the relevant lines. In this case I would appreciate a pointer to the place at which the filesystem code generates a warning message comparable to that from the swap pager. Uwe
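The difference between the two waiting styles described in this message can be illustrated with a small user-space model. This is only a sketch, not kernel code: the function name, the simulated clock and the printed text are made up for illustration, and the 20-second interval is the value the message recalls from memory.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical warning interval; the message recalls roughly 20 seconds. */
#define WARN_INTERVAL 20

/*
 * Simulate waiting for an I/O request that completes after `completes_at`
 * seconds. With `warn` set (swap pager style) a warning is logged every
 * WARN_INTERVAL seconds while the request is still outstanding, and the
 * wait simply continues. With `warn` clear (the file system behaviour
 * described above) the wait is silent. Returns the number of warnings.
 */
static int
wait_for_io(int completes_at, int warn)
{
	int elapsed;
	int warnings = 0;

	assert(completes_at >= 0);
	for (elapsed = 1; elapsed < completes_at; elapsed++) {
		/* The request is still outstanding after `elapsed` seconds. */
		if (warn && elapsed % WARN_INTERVAL == 0) {
			printf("still waiting after %d s\n", elapsed);
			warnings++;
		}
	}
	return (warnings);
}
```

In this model a request that takes 45 seconds produces two warnings in the swap pager style and none in the silent style, although the underlying disk problem is the same in both cases.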
Re: kernel: swap_pager: indefinite wait buffer - on 5.3-RELEASE-p5
Oliver Fromme wrote: Zoltan Frombach [EMAIL PROTECTED] wrote: Apr 29 02:10:14 www kernel: swap_pager: indefinite wait buffer: device: ad0s1a, blkno: 328636, size: 8192 Apr 29 02:10:24 www kernel: swap_pager: indefinite wait buffer: device: ad0s1e, blkno: 329842, size: 4096 [...] The error message indicates that there was an I/O error accessing the swap area on your disk. Usually that's an indication for a hardware failure, e.g. a dying disk. I happen to have an identical hard drive around here, unused. If I hook it up as a slave (IDE) drive, is there a way I can mirror the dying drive to the spare one (with all partitions, etc, intact)? If they're really identical (i.e. the same size and same geometry), then you can use dd(1) for duplication, like this: # dd if=/dev/ad0 of=/dev/ad1 bs=64k conv=noerror,sync The noerror,sync part is important so the dd command will not stop when it hits any bad spots on the source drive and instead will fill the blocks with zeroes on the destination drive. Since it's only the swap partition, you shouldn't lose any data. I would like to point out that the conclusion you're drawing in the last sentence is invalid IMHO. indefinite wait buffer messages at apparently random block numbers just indicate that the pager was unable to access the swap area (in its entirety!) when it wanted to. It means that the disk drive was either dead at that point in time or busy trying to deal with a bad sector. This sector could have been anywhere on the disk. It just kept the disk drive busy for long enough that the pager started to complain. Since the swap area is usually just a minor portion of the disk it is therefore much more likely that the bad sector is located in a filesystem. So if you copy the disk and ignore i/o errors in this situation you _do_ run a very real risk of losing data! Unfortunately you can't do much about it but you should at least be aware of it. 
Uwe
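To make the data-loss risk concrete, here is a rough model of a copy that, like the dd invocation quoted above, carries on after a read error and writes zeroes in place of the unreadable block. It is a sketch under assumptions: the reader callback, the tiny block size and all names are hypothetical, and it does not reproduce dd's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8	/* tiny block size, for illustration only */

/* Hypothetical block reader: returns 0 on success, -1 on a read error. */
typedef int (*read_block_fn)(size_t blkno, unsigned char *buf);

/*
 * Copy `nblocks` blocks into `dst`. A block that cannot be read is replaced
 * by zeroes and the copy carries on, which is the effect the thread
 * attributes to conv=noerror,sync. Returns the number of zero-filled blocks.
 */
static size_t
copy_zero_fill(read_block_fn read_block, size_t nblocks, unsigned char *dst)
{
	size_t blkno, bad = 0;

	assert(read_block != NULL && dst != NULL);
	for (blkno = 0; blkno < nblocks; blkno++) {
		unsigned char *out = dst + blkno * BLOCK_SIZE;

		if (read_block(blkno, out) != 0) {
			memset(out, 0, BLOCK_SIZE);	/* contents of this block are lost */
			bad++;
		}
	}
	return (bad);
}

/* Example source for the sketch: block 2 is unreadable, the rest hold 'A'. */
static int
demo_reader(size_t blkno, unsigned char *buf)
{
	if (blkno == 2)
		return (-1);
	memset(buf, 'A', BLOCK_SIZE);
	return (0);
}
```

The copy itself has no idea what the zero-filled block used to hold. If the bad sector sits inside a file system rather than in swap, that block is silently lost on the destination drive, which is the point made above.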
Re: Adaptec 3210S, 4.9-STABLE, corruption when disk fails
Don Bowman wrote: From: Uwe Doering [mailto:[EMAIL PROTECTED] ... As far as I understand this family of controllers the OS drivers aren't involved at all in case of a disk drive failure. It's strictly the controller's business to deal with it internally. The OS just sits there and waits until the controller is done with the retries and either drops into degraded mode or recovers from the disk error. That's why I initially speculated that there might be a timeout somewhere in PostgreSQL or FreeBSD that leads to data loss if the controller is busy for too long. A somewhat radical way to at least make these failures as rare an event as possible would be to deliberately fail all remaining old disk drives, one after the other of course, in order to get rid of them. And if you are lucky the problem won't happen with newer drives anyway, in case the root cause is an incompatibility between the controller and the old drives. Started that yesterday. I've got one 'old' one left. Sadly, the one that failed night before last was not one of the 'old' ones, so this is no guarantee :) From the raidutil -e log, I see this type of info. I'm not sure what the 'unknown' events are. The 'CRC Failure' is probably the problem? There's also Bad SCSI Status, unit attention, etc. Perhaps the driver doesn't deal with these properly? In my opinion what the log shows in this case is internal communication between the controller and the disk drives. The OS driver is not involved. In the past I've seen CRC errors like these as a result of bad cabling or contact problems. You may want to check the SCSI cables. They have to be properly terminated and there must not be any sharp kinks given the signal frequencies involved these days. Also, pluggable drive bays can cause this. Every electrical contact is a potential source of trouble. Finally, faulty or overloaded power supplies can cause glitches like these. This can be especially hard to debug. 
When these hardware issues have been taken care of you may want to start a RAID verification/correction run. If it shows any inconsistencies this may be an indication of former hardware glitches. I'm not sure whether you can trigger that process through 'raidutil'. I've always used the X11 'dptmgr' program. You can terminate it after having started the verification. It continues to run in the background (inside the controller). Uwe
Re: Adaptec 3210S, 4.9-STABLE, corruption when disk fails
Don Bowman wrote: From: [EMAIL PROTECTED] From: Uwe Doering [mailto:[EMAIL PROTECTED] ... Did you merge 1.3.2.3 as well? This actually should have been one MFC Yes, merged from RELENG_4. I will post later if this happens again, but it will be quite a long time. The machine has 7 drives in it, there are only 3 left that are old enough that they might fail before I take it out of service (it originally had 7 1999-era IBM drives, now it has 4 2004-era Seagate drives and 3 of the old IBMs. The drives have been in continuous service, so they've led a pretty good life!) Thanks for the suggestion on the cam timeout, I've set that value. Another drive failed and the same thing happened. After the failure, the raid worked in degrade mode just fine, but many files had been corrupted during the failure. So I would suggest that this merge did not help, and the cam timeout did not help either. This is very frustrating, again I rebuild my postgresql install from backup :( This is indeed unfortunate. Maybe the problem is in fact located neither in PostgreSQL nor in FreeBSD but in the controller itself. Does it have the latest firmware? The necessary files should be available on Adaptec's website, and you can use the 'raidutil' program under FreeBSD to upload the firmware to the controller. I have to concede, however, that I never did this under FreeBSD myself. If I recall correctly I did the upload via a DOS diskette the last time. If this doesn't help either you could ask Adaptec's support for help. You need to register the controller first, if memory serves. Uwe
Re: Adaptec 3210S, 4.9-STABLE, corruption when disk fails
Don Bowman wrote: From: Uwe Doering [mailto:[EMAIL PROTECTED] Don Bowman wrote: [...] Another drive failed and the same thing happened. After the failure, the raid worked in degrade mode just fine, but many files had been corrupted during the failure. So I would suggest that this merge did not help, and the cam timeout did not help either. This is very frustrating, again I rebuild my postgresql install from backup :( This is indeed unfortunate. Maybe the problem is in fact located neither in PostgreSQL nor in FreeBSD but in the controller itself. Does it have the latest firmware? The necessary files should be available on Adaptec's website, and you can use the 'raidutil' program under FreeBSD to upload the firmware to the controller. I have to concede, however, that I never did this under FreeBSD myself. If I recall correctly I did the upload via a DOS diskette the last time. If this doesn't help either you could ask Adaptec's support for help. You need to register the controller first, if memory serves. The latest firmware bios is in the controller (upgraded the last time I had problems). Tried adaptec support, controller is registered. The problem is definitely not in postgresql. Files go missing in directories that are having new entries added (e.g. I lost a 'PG_VERSION' file). Data within the postgresql files becomes corrupt. Since the only application running is postgresql, and it reads/writes/fsyncs the data, its not unexpected that it's the one that reaps the 'rewards' of the failure. I have to believe this is either a bug in the controller, or a problem in cam or asr. As far as I understand this family of controllers the OS drivers aren't involved at all in case of a disk drive failure. It's strictly the controller's business to deal with it internally. The OS just sits there and waits until the controller is done with the retries and either drops into degraded mode or recovers from the disk error. 
That's why I initially speculated that there might be a timeout somewhere in PostgreSQL or FreeBSD that leads to data loss if the controller is busy for too long. A somewhat radical way to at least make these failures as rare an event as possible would be to deliberately fail all remaining old disk drives, one after the other of course, in order to get rid of them. And if you are lucky the problem won't happen with newer drives anyway, in case the root cause is an incompatibility between the controller and the old drives. Uwe
Re: Adaptec 3210S, 4.9-STABLE, corruption when disk fails
Don Bowman wrote: From: Uwe Doering [mailto:[EMAIL PROTECTED] Don Bowman wrote: I have a machine running: $ uname -a FreeBSD machine.phaedrus.sandvine.com 4.9-STABLE FreeBSD 4.9-STABLE #0: Fri Mar 19 10:39:07 EST 2004 [EMAIL PROTECTED]:/usr/src/sys/compile/LABDB i386 ... I have merged asr.c from RELENG_4 to get this fix: Fix a mis-merge in the MFC of rev. 1.64 in rev. 1.3.2.3; the following change wasn't included: - Set the CAM status to CAM_SCSI_STATUS_ERROR rather than CAM_REQ_CMP in case of a CHECK CONDITION. since I guess it's conceivable this could cause my problem. I have to admit that I didn't think of this right away, even though I was kind of involved. Did you merge 1.3.2.3 as well? This actually should have been one MFC but it was done in two steps due to an oversight. Please let us know whether the fix makes any difference in your case. Its author made it for CD burners and wasn't sure whether it has any effect on other devices, like da(4). Uwe
Re: Adaptec 3210S, 4.9-STABLE, corruption when disk fails
Uwe Doering wrote: Don Bowman wrote: I have merged asr.c from RELENG_4 to get this fix: Fix a mis-merge in the MFC of rev. 1.64 in rev. 1.3.2.3; the following change wasn't included: - Set the CAM status to CAM_SCSI_STATUS_ERROR rather than CAM_REQ_CMP in case of a CHECK CONDITION. since I guess it's conceivable this could cause my problem. I have to admit that I didn't think of this right away, even though I was kind of involved. Did you merge 1.3.2.3 as well? This actually should have been one MFC but it was done in two steps due to an oversight. Please let us know whether the fix makes any difference in your case. Its author made it for CD burners and wasn't sure whether it has any effect on other devices, like da(4). Memory's coming back piecemeal. ;-) There's another thing you could try. The 'asr' driver's original timeout is 360 seconds, because its author knew that this type of controller can be busy for quite some time. FreeBSD's SCSI driver, however, sets it to its default of 60 seconds, which can be way too short. What happens when the controller is busy trying to deal with a failed disk is that the 'asr' driver sends a bus reset to the controller as a whole, due to the short timeout. You should be able to see this clash in the controller's event log. My feeling is that this collision of events may have ill effects, like the data corruption you've observed. On our machines we've set the SCSI timeout and thereby also the 'asr' driver's timeout back to the original 360 seconds, in order to leave the controller alone while it is busy. There is a 'sysctl' variable for this: kern.cam.da.default_timeout=360 Maybe that's the actual fix for your problem. Uwe
Re: Adaptec 3210S, 4.9-STABLE, corruption when disk fails
Don Bowman wrote: I have a machine running: $ uname -a FreeBSD machine.phaedrus.sandvine.com 4.9-STABLE FreeBSD 4.9-STABLE #0: Fri Mar 19 10:39:07 EST 2004 [EMAIL PROTECTED]:/usr/src/sys/compile/LABDB i386 It has an adaptec 3210S raid controller running a single raid-5, and runs postgresql 7.4.6 as its primary application. 3 times now I have had a drive fail, and have had corrupted files in the postgresql cluster @ the same time. The time is too closely correlated to be a coincidence. It passes fsck @ the time that I got to it a couple of hours later, and the filesystem seems to be ok (with a failed drive, the raid in 'degrade' mode). It appears that the drive failure and the postgresql failure occur @ exactly the same time (monitoring with nagios, within 1hr accuracy). It would appear that for some file(s) bad data was returned. Does anyone have any suggestions? In my experience, in a situation like this RAID controllers can block the system for up to a couple of minutes, trying to revive a failed disk drive by sending it bus reset commands and the like, until they eventually give up and drop into degraded mode. With sufficiently patient applications this is no problem, but if a program runs into internal timeouts during this period of time bad things can happen. My point is that while the disk controller may trigger the problem the instance that actually corrupts the database might be PostgreSQL itself. Of course, I'm aware that it's going to be quite hard to tell for sure who the culprit is. Uwe
Re: swapfile being eaten by unknown process
Kris Kennaway wrote: On Tue, Feb 15, 2005 at 04:11:31PM +, John wrote: Another data point - I see this in my nightly security logs: swap_pager: indefinite wait buffer: device: ad0s1f, blkno: 28190, size: 4096 maybe there's a bad block on the swap partition?? That's what this usually means, yes. Or the whole disk drive is about to die. That's the situation where I've seen this message most of the time. An indicator of this would be block numbers that appear to be at random. Uwe
Re: vnode_pager_putpages errors and DOS? [5-STABLE, too]
Oliver Brandmueller wrote: On Fri, Nov 05, 2004 at 10:08:18PM +0100, Uwe Doering wrote: I've attached an updated version of the patch for 'vnode_pager.c'. On my test system it resolved the issue. Please let us know whether it works for you as well. Is there any known way to trigger the problem? I did not yet run into any trouble, but actually I don't want to :-) After a brief search I found my test program again. Please find it attached to this email. As far as I can tell it should still work. Nothing complicated, in fact. The program just creates and mmaps a file that is slightly larger than the available disk space, and then modifies all pages within the mmapped area so that the syncer has to flush them out to disk. In case you would like to try it, please adjust FILENAME and FILELEN. FILELEN is supposed to be slightly larger than the available disk space. I ran my tests on a 100 MB file system mounted under '/', since that was the smallest FS available on that computer. After 30 seconds at max an unpatched system ought to become unresponsive due to the indefinite loop I described earlier. The patch applies cleanly to 5-STABLE as of today and as far as I could see there were no changes to the code which obsolete this patch. With the attached test program you'll probably find out soon whether PR and fix apply to RELENG_5 (and possibly above) as well. Please keep us posted.

Uwe

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <err.h>
#include <sys/types.h>
#include <sys/mman.h>

#define FILENAME	"/mnt/test"	/* where to put the test file */
#define FILELEN		110		/* test file length in MB */

int
main(void)
{
	int fd;
	size_t len;
	char *buf, *p, *lim;

	len = FILELEN * 1024 * 1024;

	/* Create a sparse file that is larger than the free disk space. */
	if ((fd = open(FILENAME, O_RDWR|O_CREAT|O_TRUNC, 0666)) == -1)
		err(2, "open() failed");
	if (ftruncate(fd, len) == -1)
		err(2, "ftruncate() failed");

	buf = mmap(NULL, len, PROT_WRITE, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED)
		err(2, "mmap() failed");
	(void)close(fd);

	/* Dirty every page so that the syncer has to flush it to disk. */
	for (p = buf, lim = p + len; p < lim; p += 4096)
		*p = '0';

	if (munmap(buf, len) == -1)
		err(2, "munmap() failed");
	exit(0);
}
Re: vnode_pager_putpages errors and DOS?
Igor Sysoev wrote: On Thu, 11 Nov 2004, Igor Sysoev wrote: I've attached an updated version of the patch for 'vnode_pager.c'. On my test system it resolved the issue. Please let us know whether it works for you as well. Sorry for the late response: I was ill and had no access to the test machine. I applied the patch to the clean 4.10. The result is the same: the process could not be killed, the file system access is very limited and the system became unresponsive. Sorry, I applied the patch, but forgot to rebuild the kernel :). It seems that the patch resolves the problem - the program exits and the system is working. I ran it several times. I would also run buildworld on this system to ensure that the program did not affect VM. make -j 32 buildworld ran without problems. Good to hear that it works for you, too. So, one long-standing bug less. However, the real challenge could turn out to be finding a committer for RELENG_4, now that everybody is working on RELENG_5 and above. I have a number of PRs in GNATS already that are slowly rotting away since apparently nobody is willing to deal with them, even though they all have patches attached. So I stopped submitting PRs for now. Wasted time, unfortunately. Uwe
Re: vnode_pager_putpages errors and DOS?
Igor Sysoev wrote: [...] I've tried your patch from second email (it requires to include sys/conf.h for devsw and D_DISK): the system also became unresponsible. The main problem is that I could not kill the offending process - it stuck in biowr state. In the meantime I've investigated this further. The two patches I provided so far certainly have their merits, since they deal with some unwanted side effects. However, I found that the root cause for the eventual system lock-up lies elsewhere. In an earlier email I already pointed out that function vnode_pager_generic_putpages() actually doesn't care whether the write operation failed or not. It always returns VM_PAGER_OK. Now, in case the write operation succeeds the file system code takes care that the formerly dirty pages associated with the i/o buffer get marked clean. On the other hand, if the write attempt fails, for instance in an out-of-disk-space situation, the pages are left dirty. At this point the syncer enters an infinite loop, trying to flush the same dirty pages to disk over and over again. The fix is actually quite simple. In case of a write error we have to make sure ourselves that the associated pages get marked clean. We do this by returning VM_PAGER_BAD instead of VM_PAGER_OK. These two result codes are functionally identical, with the exception that VM_PAGER_BAD additionally marks the respective page clean. For the details, please have a look at the caller function vm_pageout_flush() in 'vm_pageout.c'. What this modification means is that in case of a write error the affected pages remain intact in memory until they get recycled, but we lose their contents as far as the copy on disk is concerned. I believe this is acceptable (and possibly even originally intended) because giving up on syncing is about the best thing we can do in this situation, anyway. And it is certainly a much better choice than halting the whole system due to an infinite loop. 
I've attached an updated version of the patch for 'vnode_pager.c'. On my test system it resolved the issue. Please let us know whether it works for you as well.

Uwe

--- src/sys/vm/vnode_pager.c.orig	Tue Dec 31 10:34:51 2002
+++ src/sys/vm/vnode_pager.c	Fri Nov 5 20:41:15 2004
@@ -954,7 +954,9 @@
 	struct uio auio;
 	struct iovec aiov;
 	int error;
+	int status;
 	int ioflags;
+	static int last_elog, last_rlog;
 
 	object = vp->v_object;
 	count = bytecount / PAGE_SIZE;
@@ -1035,15 +1037,18 @@
 	cnt.v_vnodeout++;
 	cnt.v_vnodepgsout += ncount;
 
-	if (error) {
+	if (error && last_elog != time_second) {
+		last_elog = time_second;
 		printf("vnode_pager_putpages: I/O error %d\n", error);
 	}
-	if (auio.uio_resid) {
+	if (auio.uio_resid && last_rlog != time_second) {
+		last_rlog = time_second;
 		printf("vnode_pager_putpages: residual I/O %d at %lu\n",
 		    auio.uio_resid, (u_long)m[0]->pindex);
 	}
+	status = error ? VM_PAGER_BAD : VM_PAGER_OK;
 	for (i = 0; i < ncount; i++) {
-		rtvals[i] = VM_PAGER_OK;
+		rtvals[i] = status;
 	}
 	return rtvals[0];
 }
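The infinite loop described in this message, and the way the changed return value breaks it, can be modelled in a few lines of user-space C. This is a simplified sketch rather than the kernel's logic: the enum values, the page structure and the function names are stand-ins invented for the example, and the pass limit exists only so that the model terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's result codes; the values are arbitrary. */
enum pager_status { PAGER_OK, PAGER_BAD };

struct page {
	int dirty;
};

/*
 * One flush attempt for one page. A successful write cleans the page.
 * After a failed write the page stays dirty if PAGER_OK is reported
 * (unpatched behaviour) and is marked clean if PAGER_BAD is reported.
 */
static void
flush_page(struct page *pg, int write_fails, int patched)
{
	enum pager_status st;

	if (!write_fails) {
		pg->dirty = 0;	/* cleaned as a side effect of the write */
		return;
	}
	st = patched ? PAGER_BAD : PAGER_OK;
	if (st == PAGER_BAD)
		pg->dirty = 0;	/* give up on syncing this page */
}

/*
 * Run the model "syncer" until no dirty pages remain. Returns the number
 * of passes needed, or -1 if pages are still dirty after max_passes.
 */
static int
sync_passes(struct page *pages, size_t n, int write_fails, int patched,
    int max_passes)
{
	int pass;
	size_t i, dirty;

	assert(pages != NULL && max_passes > 0);
	for (pass = 1; pass <= max_passes; pass++) {
		dirty = 0;
		for (i = 0; i < n; i++) {
			if (pages[i].dirty)
				flush_page(&pages[i], write_fails, patched);
			if (pages[i].dirty)
				dirty++;
		}
		if (dirty == 0)
			return (pass);
	}
	return (-1);
}
```

With failing writes the unpatched variant never runs out of dirty pages, however many passes are allowed, while the patched variant is done after a single pass at the price of losing the on-disk copy of those pages.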
Re: vnode_pager_putpages errors and DOS?
Igor Sysoev wrote: On Sat, 9 Oct 2004, Uwe Doering wrote: [...] I wonder whether the unresponsiveness is actually just the result of the kernel spending most of the time in printf(), generating warning messages. vnode_pager_generic_putpages() doesn't return any error in case of a write failure, so the caller (syncer in this case) isn't aware that the paging out failed, that is, it is supposed to carry on as if nothing happened. So how about limiting the number of warnings to one per second? UFS has similar code in order to curb file system full and the like. Please consider trying the attached patch, which applies cleanly to 4-STABLE. It won't make the actual application causing these errors any happier, but it may eliminate the DoS aspect of the issue. I have just tried your patch. To test I ran the program from http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/67919 The patch allows me to login on machine while the system reports about vnode_pager_putpages: I/O error 28. However, the file system access is very limited and after some time the system became unresponsible. Limited file system access is to be expected, since vnode_pager_putpages() keeps the number of dirty buffers ('numdirtybuffers') near its upper limit ('hidirtybuffers'). However, the unresponsiveness may be caused by another shortcoming I found in the meantime. When 'numdirtybuffers' is greater or equal 'hidirtybuffers', function bwillwrite() will block until 'numdirtybuffers' drops below some threshold value. bwillwrite() gets called in a number of places that deal with writing data to disk. Two of these places are dofilewrite() (which is in turn called by write() and pwrite()) and writev(). There, bwillwrite() gets called if the file descriptor is of type DTYPE_VNODE. Now, this unfortunately doesn't take into account that ttys, including pseudo ttys, and even /dev/null and friends, are character device nodes and therefore vnodes as well, but have nothing to do with writing data to disk. 
That is, in case of heavy disk write activity, write attempts to these device nodes get blocked, too! With the consequence that the system appears to become unresponsive at the shell prompt, or reacts only sporadically. Even daemonized processes that happen to log data to /dev/null (on stdout and stderr, for example) will block. What we need here is an additional test that makes sure that in case of a character device bwillwrite() gets called only if the device is in fact a disk. Please consider trying out the attached patch. It will not reduce the heavy disk activity (which is, after all, legitimate), but it is supposed to enable you to operate the system at shell level and kill the offending process, or do whatever is necessary to resolve the problem.

Uwe

--- src/sys/kern/sys_generic.c.orig	Tue Sep 14 19:56:53 2004
+++ src/sys/kern/sys_generic.c	Sun Sep 26 13:13:46 2004
@@ -48,6 +48,7 @@
 #include <sys/filio.h>
 #include <sys/fcntl.h>
 #include <sys/file.h>
+#include <sys/vnode.h>
 #include <sys/proc.h>
 #include <sys/signalvar.h>
 #include <sys/socketvar.h>
@@ -78,6 +79,23 @@
 static int	dofilewrite __P((struct proc *, struct file *, int,
 		    const void *, size_t, off_t, int));
 
+static __inline int
+isndchr(vp)
+	struct vnode *vp;
+{
+	struct cdevsw *dp;
+
+	if (vp->v_type != VCHR)
+		return (0);
+	if (vp->v_rdev == NULL)
+		return (0);
+	if ((dp = devsw(vp->v_rdev)) == NULL)
+		return (0);
+	if (dp->d_flags & D_DISK)
+		return (0);
+	return (1);
+}
+
 struct file*
 holdfp(fdp, fd, flag)
 	struct filedesc* fdp;
@@ -403,7 +420,7 @@
 	}
 #endif
 	cnt = nbyte;
-	if (fp->f_type == DTYPE_VNODE)
+	if (fp->f_type == DTYPE_VNODE && !isndchr((struct vnode *)(fp->f_data)))
 		bwillwrite();
 	if ((error = fo_write(fp, &auio, fp->f_cred, flags, p))) {
 		if (auio.uio_resid != cnt && (error == ERESTART ||
@@ -496,7 +513,7 @@
 	}
 #endif
 	cnt = auio.uio_resid;
-	if (fp->f_type == DTYPE_VNODE)
+	if (fp->f_type == DTYPE_VNODE && !isndchr((struct vnode *)(fp->f_data)))
 		bwillwrite();
 	if ((error = fo_write(fp, &auio, fp->f_cred, 0, p))) {
 		if (auio.uio_resid != cnt && (error == ERESTART ||
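The observation that ttys and /dev/null are character device nodes, and therefore look like any other vnode to a check that only tests the descriptor type, can be verified from user space with fstat(2). The sketch below is only an analogue of the in-kernel test: the helper names are made up, and user space has no portable equivalent of the D_DISK distinction that isndchr() makes.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Returns 1 if `fd` refers to a character device (tty, /dev/null, ...),
 * 0 if it refers to something else, and -1 if fstat(2) fails.
 */
static int
is_chardev(int fd)
{
	struct stat sb;

	if (fstat(fd, &sb) == -1)
		return (-1);
	return (S_ISCHR(sb.st_mode) ? 1 : 0);
}

/* Convenience wrapper for the sketch: open `path`, classify it, close it. */
static int
path_is_chardev(const char *path)
{
	int fd, rc;

	assert(path != NULL);
	if ((fd = open(path, O_RDONLY)) == -1)
		return (-1);
	rc = is_chardev(fd);
	(void)close(fd);
	return (rc);
}
```

Calling path_is_chardev("/dev/null") yields 1 while a directory such as "/" yields 0, which is the kind of distinction the patch introduces before deciding whether a write should be throttled by bwillwrite().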
Re: vnode_pager_putpages errors and DOS?
Uwe Doering wrote: [...] What we need here is an additional test that makes sure that in case of a character device bwillwrite() gets called only if the device is in fact a disk. Please consider trying out the attached patch. It will not reduce the heavy disk activity (which is, after all, legitimate), but it is supposed to enable you to operate the system at shell level and kill the offending process, or do whatever is necessary to resolve the problem. Slight correction: Please use the attached patch instead. The first version would cause the compiler to complain about a missing prototype for isndchr(). Sorry about the oversight. I extracted the patch by hand from a larger patch collection.

Uwe

--- src/sys/kern/sys_generic.c.orig	Tue Sep 14 19:56:53 2004
+++ src/sys/kern/sys_generic.c	Sun Sep 26 13:13:46 2004
@@ -48,6 +48,7 @@
 #include <sys/filio.h>
 #include <sys/fcntl.h>
 #include <sys/file.h>
+#include <sys/vnode.h>
 #include <sys/proc.h>
 #include <sys/signalvar.h>
 #include <sys/socketvar.h>
@@ -78,6 +79,22 @@
 static int	dofilewrite __P((struct proc *, struct file *, int,
 		    const void *, size_t, off_t, int));
 
+static __inline int
+isndchr(struct vnode *vp)
+{
+	struct cdevsw *dp;
+
+	if (vp->v_type != VCHR)
+		return (0);
+	if (vp->v_rdev == NULL)
+		return (0);
+	if ((dp = devsw(vp->v_rdev)) == NULL)
+		return (0);
+	if (dp->d_flags & D_DISK)
+		return (0);
+	return (1);
+}
+
 struct file*
 holdfp(fdp, fd, flag)
 	struct filedesc* fdp;
@@ -403,7 +419,7 @@
 	}
 #endif
 	cnt = nbyte;
-	if (fp->f_type == DTYPE_VNODE)
+	if (fp->f_type == DTYPE_VNODE && !isndchr((struct vnode *)(fp->f_data)))
 		bwillwrite();
 	if ((error = fo_write(fp, &auio, fp->f_cred, flags, p))) {
 		if (auio.uio_resid != cnt && (error == ERESTART ||
@@ -496,7 +512,7 @@
 	}
 #endif
 	cnt = auio.uio_resid;
-	if (fp->f_type == DTYPE_VNODE)
+	if (fp->f_type == DTYPE_VNODE && !isndchr((struct vnode *)(fp->f_data)))
 		bwillwrite();
 	if ((error = fo_write(fp, &auio, fp->f_cred, 0, p))) {
 		if (auio.uio_resid != cnt && (error == ERESTART ||
Re: panic caused by EVFILT_SIGNAL detaching in rfork()ed thread
Igor Sysoev wrote: Here is more correct patch to fix the panic in 4.x reported in http://freebsd.rambler.ru/bsdmail/freebsd-hackers_2004/msg02732.html

--- src/sys/kern/kern_event.c	Sun Oct 10 12:17:55 2004
+++ src/sys/kern/kern_event.c	Sun Oct 10 12:19:29 2004
@@ -794,7 +794,8 @@
 		while (kn != NULL) {
 			kn0 = SLIST_NEXT(kn, kn_link);
 			if (kq == kn->kn_kq) {
-				kn->kn_fop->f_detach(kn);
+				if (!(kn->kn_status & KN_DETACHED))
+					kn->kn_fop->f_detach(kn);
 		/* XXX non-fd release of kn->kn_ptr */
 				knote_free(kn);
 				*knp = kn0;

Your patch appears to be an excerpt from the fix to RELENG_5. May I suggest a different approach for RELENG_4? My reasoning is that the implementation of kevents differs between RELENG_4 and RELENG_5. In RELENG_5 the flag KN_DETACHED is used in a more general way. It gets set by knlist_add() and cleared by knlist_remove(), in sync with list insertion and removal. As far as I can tell these routines have originally been introduced in order to centralize the locking for kevent list manipulations, and they don't exist in RELENG_4. Now, the proper way to MFC the RELENG_5 fix to RELENG_4 more or less unchanged would be to MFC the whole knlist_add()/knlist_remove() business as well (w/o the locking stuff), which, however, would be overkill for RELENG_4's single threaded kernel. In RELENG_4's implementation of kevents, the only case in which KN_DETACHED gets set is when a process exits and posts a NOTE_EXIT event. That is, the meaning of KN_DETACHED is much narrower than in RELENG_5. For this reason I believe the most appropriate fix would be to check for KN_DETACHED in filt_sigdetach() in the same way it is already done in filt_procdetach(). In fact, if you compare the two routines it becomes pretty obvious that they should have been identical in the first place, and that the absence of said check from filt_sigdetach() is most likely just an oversight. Therefore, I suggest to adopt the attached patch and leave the rest of RELENG_4's kevent code alone. I checked the kernel sources and found that filt_procdetach() and filt_sigdetach() are in fact the only f_detach() routines that deal with a process' p_klist field, and therefore need this kind of safeguard. Also, it would probably be a good idea to fix RELENG_4 swiftly (and possibly release a security advisory) because this flaw is certainly a great DoS opportunity for maliciously minded shell users ...

Uwe

--- src/sys/kern/kern_sig.c.orig	Thu Feb 5 23:26:48 2004
+++ src/sys/kern/kern_sig.c	Sat Oct 23 11:15:30 2004
@@ -1739,6 +1739,10 @@
 {
 	struct proc *p = kn->kn_ptr.p_proc;
 
+	if (kn->kn_status & KN_DETACHED)
+		return;
+
+	/* XXX locking? this might modify another process. */
 	SLIST_REMOVE(&p->p_klist, kn, knote, kn_selnext);
 }
 
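The guard proposed for filt_sigdetach() follows a common pattern: once the owner of a list has gone away and flagged its entries as detached, a later detach call must not touch the list again. Below is a minimal user-space sketch of that pattern using the SLIST macros from <sys/queue.h>. The structure, the flag and the function names are stand-ins chosen for the example, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

#define NOTE_DETACHED 0x01	/* stand-in for the kernel's KN_DETACHED */

struct note {
	int status;
	SLIST_ENTRY(note) link;
};
SLIST_HEAD(notelist, note);

/*
 * Models the owner of the list going away (the process exit case in the
 * message): every note is unlinked and flagged as detached.
 */
static void
owner_exit(struct notelist *list)
{
	struct note *n;

	while ((n = SLIST_FIRST(list)) != NULL) {
		SLIST_REMOVE_HEAD(list, link);
		n->status |= NOTE_DETACHED;
	}
}

/*
 * Detach routine with the guard. Returns 1 if the note was unlinked here,
 * 0 if it had been detached already and the list was left alone.
 */
static int
note_detach(struct notelist *list, struct note *n)
{
	assert(list != NULL && n != NULL);
	if (n->status & NOTE_DETACHED)
		return (0);
	SLIST_REMOVE(list, n, note, link);
	return (1);
}
```

Without the flag test, the second call would walk a list that no longer contains the note. In this user-space model that means dereferencing a null pointer inside SLIST_REMOVE; in the kernel it corresponds to the panic this thread is about.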
Re: panic caused by EVFILT_SIGNAL detaching in rfork()ed thread
Uwe Doering wrote:

Igor Sysoev wrote:

Here is more correct patch to fix the panic in 4.x reported in http://freebsd.rambler.ru/bsdmail/freebsd-hackers_2004/msg02732.html

--- src/sys/kern/kern_event.c   Sun Oct 10 12:17:55 2004
+++ src/sys/kern/kern_event.c   Sun Oct 10 12:19:29 2004
@@ -794,7 +794,8 @@
                while (kn != NULL) {
                        kn0 = SLIST_NEXT(kn, kn_link);
                        if (kq == kn->kn_kq) {
-                               kn->kn_fop->f_detach(kn);
+                               if (!(kn->kn_status & KN_DETACHED))
+                                       kn->kn_fop->f_detach(kn);
                /* XXX non-fd release of kn->kn_ptr */
                                knote_free(kn);
                                *knp = kn0;

Your patch appears to be an excerpt from the fix to RELENG_5. May I suggest a different approach for RELENG_4? My reasoning is that the implementation of kevents differs between RELENG_4 and RELENG_5.

In RELENG_5 the flag KN_DETACHED is used in a more general way. It gets set by knlist_add() and cleared by knlist_remove(), in sync with list insertion and removal. [...]

Slight correction: The logic is of course the other way round. That is, the text is supposed to read:

It gets cleared by knlist_add() and set by knlist_remove() ...

Sorry about that. :-)

   Uwe
-- 
Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers
[EMAIL PROTECTED] | http://www.escapebox.net
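The corrected rule (flag cleared on insertion, set on removal) can be illustrated with a small user-space sketch. This is only a model under stated assumptions: struct node, struct nlist, N_DETACHED and the three functions are hypothetical names, not the RELENG_5 implementation, and the locking that the real routines centralize is left out entirely.

```c
#include <stddef.h>

#define N_DETACHED 0x01             /* hypothetical stand-in for KN_DETACHED */

struct node {
    int status;
    struct node *next;
};

struct nlist {
    struct node *head;
};

/* Insert at the head and clear the flag: the node is on a list now. */
void nlist_add(struct nlist *l, struct node *n)
{
    n->next = l->head;
    l->head = n;
    n->status &= ~N_DETACHED;
}

/* Unlink the node (if present) and set the flag: it is off the list. */
void nlist_remove(struct nlist *l, struct node *n)
{
    struct node **pp;

    for (pp = &l->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            break;
        }
    }
    n->status |= N_DETACHED;
}

/* Generic teardown in the spirit of the quoted kern_event.c hunk:
 * run the detach step only if the flag says the node is still listed.
 * Returns 1 if the detach step ran, 0 if it was skipped. */
int node_release(struct nlist *l, struct node *n)
{
    if (!(n->status & N_DETACHED)) {
        nlist_remove(l, n);
        return 1;
    }
    return 0;
}
```

Because the flag always mirrors list membership in this model, the caller can test it once in a central place instead of repeating the check in every detach routine.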
Re: vnode_pager_putpages errors and DOS?
Robert Watson wrote:

On Fri, 8 Oct 2004, Steve Shorter wrote:

I have some machines that run customers' CGI stuff. These machines have started to hang and become unresponsive. At first I thought it was a hardware issue, but I discovered in a cyclades log the following stuff that got logged to the console, which explains the cause of the system hangs/failures.

vnode_pager_putpages: residual I/O 65536 at 347
vnode_pager_putpages: I/O error 28]
vnode_pager_putpages: residual I/O 65536 at 285]

Aha! Also, at the same time I get in syslog

/kernel: pid 6 (syncer), uid 0 on /chroot/tmp: file system full

What's happening? Can a full filesystem bring the thing down? Ideas? Fixes?

Ideally not, but many UNIX programs respond poorly to being out of memory and disk space (No space, wot?). Are you using a swap file, and if so, how did you create the swapfile? Are you using sparse files much?

I wonder whether the unresponsiveness is actually just the result of the kernel spending most of the time in printf(), generating warning messages. vnode_pager_generic_putpages() doesn't return any error in case of a write failure, so the caller (syncer in this case) isn't aware that the paging out failed, that is, it is supposed to carry on as if nothing happened.

So how about limiting the number of warnings to one per second? UFS has similar code in order to curb "file system full" messages and the like. Please consider trying the attached patch, which applies cleanly to 4-STABLE. It won't make the actual application causing these errors any happier, but it may eliminate the DoS aspect of the issue.
   Uwe
-- 
Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers
[EMAIL PROTECTED] | http://www.escapebox.net

--- src/sys/vm/vnode_pager.c.orig       Fri Oct 31 11:39:38 2003
+++ src/sys/vm/vnode_pager.c    Sun Feb 15 02:38:21 2004
@@ -955,6 +955,7 @@
        struct iovec aiov;
        int error;
        int ioflags;
+       static int last_elog, last_rlog;
 
        object = vp->v_object;
        count = bytecount / PAGE_SIZE;
@@ -1035,10 +1036,12 @@
        cnt.v_vnodeout++;
        cnt.v_vnodepgsout += ncount;
 
-       if (error) {
+       if (error && last_elog != time_second) {
+               last_elog = time_second;
                printf("vnode_pager_putpages: I/O error %d\n", error);
        }
-       if (auio.uio_resid) {
+       if (auio.uio_resid && last_rlog != time_second) {
+               last_rlog = time_second;
                printf("vnode_pager_putpages: residual I/O %d at %lu\n",
                    auio.uio_resid, (u_long)m[0]->pindex);
        }
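The one-warning-per-second idea in the patch can be tried outside the kernel with a small sketch like the following. The names may_log() and report_io_error() are hypothetical, and the current time is passed in as a parameter (standing in for the kernel's time_second) so that the behaviour is easy to check.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: returns 1 if nothing has been logged yet during
 * the second `now`, and records `now` as the last logging time. */
int may_log(time_t *last, time_t now)
{
    if (*last == now)
        return 0;                   /* already logged in this second */
    *last = now;
    return 1;
}

/* Hypothetical caller: report a write error at most once per second.
 * Returns 1 if a message was printed, 0 if it was suppressed or there
 * was no error. */
int report_io_error(int error, time_t now)
{
    static time_t last_elog;        /* second of the last printed message */

    if (error && may_log(&last_elog, now)) {
        printf("putpages: I/O error %d\n", error);
        return 1;
    }
    return 0;
}
```

As in the patch, the timestamp is only updated when there is actually something to report, so a successful call does not consume the log slot for that second. A real caller would pass time(NULL) as the second argument.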
Re: Werid message in /var/log/messages
Matt Douhan wrote:

Hey, I googled for this but found no reference to it. What does this message mean?

Dec 12 11:10:08 mandarin /kernel: got bad cookie vp 0xdbb08800 bp 0xcc87a6cc

My machine is:

12:26am mdouhan @ [mandarin] ~ uname -a
FreeBSD mandarin.internal.hasta.se 4.8-STABLE FreeBSD 4.8-STABLE #0: Sat Apr 5 17:07:20 GMT 2003 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ MANDARIN i386

Any hints would be very helpful and appreciated.

This is a notification that a directory on the NFS server changed relative to what the kernel had in its cache. The kernel then tries to remedy the situation by re-reading the directory. If there are no further related messages in the log, that re-read succeeded.

   Uwe
-- 
Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers
[EMAIL PROTECTED] | http://www.escapebox.net
Re: kern/59719 Re: 4.9 Stable Crashes on SuperMicro with SMP
Jonathan Gilpin wrote:

I've run memtest (memtest86.com) kindly provided by Don and it passed all the tests. I've installed a kernel module to test for memory errors and found that again no memory errors are found... So this means it's either a problem with the CPUs or a genuine bug in the kernel. (bugger!)

No, that's unfortunately not what it means. If a memory test fails you can draw the conclusion that you have bad memory, but this doesn't work the other way round. If a memory test passes there is still a possibility that a memory chip is the culprit, since memory test software cannot find all errors.

Also, there is the chip set on the mainboard that coordinates bus access etc. for the two CPUs. Mainboard and chip set developers are known to make errors, too. In this case you would have to swap the entire mainboard, possibly with one from a different manufacturer.

I can tell you from my own experience that it is really hard to find reliable PC hardware these days, in light of ever shorter and faster product release cycles.

   Uwe
-- 
Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers
[EMAIL PROTECTED] | http://www.escapebox.net
Re: FreeBSD Virtual Server
Eric L Howard wrote:

At a certain time, now past [Jul.05.2003-04:31:19PM -0300], [EMAIL PROTECTED] spake thusly:

I have been browsing for web hosting and I found some firms (one of them is http://www.hub.org) offering 'virtual server hosting using FreeBSD'. They say that a virtual server is different from a virtual host, for the first is a completely separate environment, like a standalone server. [...]

See jail(8).

Right, but jail(8) is just a start. For users with higher expectations (read: business customers) you would need a couple of extra features, like the ability to inject processes into already running jails, real (per jail) SysV shared memory support etc.

I think FreeBSD 5.x has at least the process injection feature now, but the downside is that you probably don't want to use 5.x for production so far, at least not with paying customers. On the other hand, if it's just for personal use, jail(8) and FreeBSD 5.x should be okay.

My two cents.

   Uwe
-- 
Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers
[EMAIL PROTECTED] | http://www.escapebox.net