Quoting "Jeremy Chadwick" <free...@jdc.parodius.com>:
>On Wed, Jul 06, 2011 at 01:07:53PM +1000, Peter Ross wrote:
>>Quoting "Jeremy Chadwick" <free...@jdc.parodius.com>:
>>
>>>On Wed, Jul 06, 2011 at 12:23:39PM +1000, Peter Ross wrote:
>>>>Quoting "Jeremy Chadwick" <free...@jdc.parodius.com>:
>>>>
>>>>>On Tue, Jul 05, 2011 at 01:03:20PM -0400, Scott Sipe wrote:
>>>>>>I'm running virtualbox 3.2.12_1 if that has anything to do with it.
>>>>>>
>>>>>>sysctl vfs.zfs.arc_max: 6200000000
>>>>>>
>>>>>>While I'm trying to scp, kstat.zfs.misc.arcstats.size is
>>>>>>hovering right around that value, sometimes above, sometimes
>>>>>>below (that's as it should be, right?). I don't think that it
>>>>>>dies when crossing over arc_max. I can run the same scp 10 times
>>>>>>and it might fail 1-3 times, with no correlation to the
>>>>>>arcstats.size being above/below arc_max that I can see.
>>>>>>
>>>>>>Scott
>>>>>>
>>>>>>On Jul 5, 2011, at 3:00 AM, Peter Ross wrote:
>>>>>>
>>>>>>>Hi all,
>>>>>>>
>>>>>>>just as an addition: an upgrade to last Friday's
>>>>>>>FreeBSD-Stable and to VirtualBox 4.0.8 does not fix the
>>>>>>>problem.
>>>>>>>
>>>>>>>I will experiment a bit more tomorrow after hours and grab
>>>>>>>some statistics.
>>>>>>>
>>>>>>>Regards
>>>>>>>Peter
>>>>>>>
>>>>>>>Quoting "Peter Ross" <peter.r...@bogen.in-berlin.de>:
>>>>>>>
>>>>>>>>Hi all,
>>>>>>>>
>>>>>>>>I noticed a similar problem last week. It is also very
>>>>>>>>similar to one reported last year:
>>>>>>>>
>>>>>>>>http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html
>>>>>>>>
>>>>>>>>My server is a Dell T410 server with the same bge card (the
>>>>>>>>same pciconf -lvc output as described by Mahlon:
>>>>>>>>
>>>>>>>>http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058711.html
>>>>>>>>
>>>>>>>>Yours, Scott, is an em(4).
>>>>>>>>
>>>>>>>>Another similarity: In all cases we are using VirtualBox. I
>>>>>>>>just want to mention it, in case it matters. I am still
>>>>>>>>running VirtualBox 3.2.
>>>>>>>>
>>>>>>>>Most of the time kstat.zfs.misc.arcstats.size was reaching
>>>>>>>>vfs.zfs.arc_max then, but I could catch one or two cases
>>>>>>>>when the value was still below.
>>>>>>>>
>>>>>>>>I added vfs.zfs.prefetch_disable=1 to sysctl.conf but it
>>>>>>>>does not help.
>>>>>>>>
>>>>>>>>BTW: It looks as if ARC only gives back the memory when I
>>>>>>>>destroy the ZFS (a cloned snapshot containing virtual
>>>>>>>>machines). Even if nothing happens for hours, the buffer
>>>>>>>>isn't released.
>>>>>>>>
>>>>>>>>My machine was still running 8.2-PRERELEASE so I am upgrading.
>>>>>>>>
>>>>>>>>I am happy to give information gathered on the old/new kernel
>>>>>>>>if it helps.
>>>>>>>>
>>>>>>>>Regards
>>>>>>>>Peter
>>>>>>>>
>>>>>>>>Quoting "Scott Sipe" <csco...@gmail.com>:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>On Jul 2, 2011, at 12:54 AM, jhell wrote:
>>>>>>>>>
>>>>>>>>>>On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick wrote:
>>>>>>>>>>>On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe wrote:
>>>>>>>>>>>>I'm running 8.2-RELEASE and am having new problems
>>>>>>>>>>>>with scp. When scping
>>>>>>>>>>>>files to a ZFS directory on the FreeBSD server --
>>>>>>>>>>>>most notably large files
>>>>>>>>>>>>-- the transfer frequently dies after just a few
>>>>>>>>>>>>seconds. In my last test, I
>>>>>>>>>>>>tried to scp an 800mb file to the FreeBSD system and
>>>>>>>>>>>>the transfer died after
>>>>>>>>>>>>200mb. It completely copied the next 4 times I
>>>>>>>>>>>>tried, and then died again on
>>>>>>>>>>>>the next attempt.
>>>>>>>>>>>>
>>>>>>>>>>>>On the client side:
>>>>>>>>>>>>
>>>>>>>>>>>>"Connection to home closed by remote host.
>>>>>>>>>>>>lost connection"
>>>>>>>>>>>>
>>>>>>>>>>>>In /var/log/auth.log:
>>>>>>>>>>>>
>>>>>>>>>>>>Jul 1 14:54:42 freebsd sshd[18955]: fatal: Write failed:
>>>>>>>>>>>>Cannot allocate memory
>>>>>>>>>>>>
>>>>>>>>>>>>I've never seen this before and have used scp before
>>>>>>>>>>>>to transfer large files
>>>>>>>>>>>>without problems. This computer has been used in
>>>>>>>>>>>>production for months and
>>>>>>>>>>>>has a current uptime of 36 days. I have not been
>>>>>>>>>>>>able to notice any problems
>>>>>>>>>>>>copying files to the server via samba or netatalk, or
>>>>>>>>>>>>any problems in apache.
>>>>>>>>>>>>
>>>>>>>>>>>>Uname:
>>>>>>>>>>>>
>>>>>>>>>>>>FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat
>>>>>>>>>>>>Feb 19 01:02:54 EST
>>>>>>>>>>>>2011 root@xeon:/usr/obj/usr/src/sys/GENERIC amd64
>>>>>>>>>>>>
>>>>>>>>>>>>I've attached my dmesg and output of vmstat -z.
>>>>>>>>>>>>
>>>>>>>>>>>>I have not restarted the sshd daemon or rebooted the computer.
>>>>>>>>>>>>
>>>>>>>>>>>>Am glad to provide any other information or test anything else.
>>>>>>>>>>>>
>>>>>>>>>>>>{snip vmstat -z and dmesg}
>>>>>>>>>>>
>>>>>>>>>>>You didn't provide details about your networking setup (rc.conf,
>>>>>>>>>>>ifconfig -a, etc.). netstat -m would be useful too.
>>>>>>>>>>>
>>>>>>>>>>>Next, please see this thread circa September 2010, titled
>>>>>>>>>>>"Network memory allocation failures":
>>>>>>>>>>>
>>>>>>>>>>>http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>>>>>>>
>>>>>>>>>>>The user in that thread is using rsync, which relies on
>>>>>>>>>>>scp by default.
>>>>>>>>>>>I believe this problem is similar, if not identical, to yours.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>Please also provide your output of ( /usr/bin/limits -a )
>>>>>>>>>>for the server end and the client.
>>>>>>>>>>
>>>>>>>>>>I am not quite sure I agree with the need for ifconfig -a, but
>>>>>>>>>>some information about the networking driver you're using for
>>>>>>>>>>the interface would be helpful, plus the uptime of the boxes
>>>>>>>>>>and the configuration of the pool, e.g.
>>>>>>>>>>( zpool status -a ; zfs get all <poolname> ). You should
>>>>>>>>>>probably prop this information up somewhere so you can
>>>>>>>>>>reference it by URL whenever needed.
>>>>>>>>>>
>>>>>>>>>>rsync(1) does not rely on scp(1) whatsoever, but rsync(1) can
>>>>>>>>>>be made to use ssh(1) instead of rsh(1), and I believe that is
>>>>>>>>>>what Jeremy is stating here, but correct me if I am wrong. It
>>>>>>>>>>does use ssh(1) by default.
>>>>>>>>>>
>>>>>>>>>>It's a possibility as well that, if you are using tmpfs(5) or
>>>>>>>>>>mdmfs(8) for /tmp-type filesystems, rsync(1) may just be
>>>>>>>>>>filling up your temp RAM area and causing the connection
>>>>>>>>>>abort, which would be expected. ( df -h ) would help here.
>>>>>>>>>
>>>>>>>>>Hello,
>>>>>>>>>
>>>>>>>>>I'm not using tmpfs/mdmfs at all. The clients yesterday
>>>>>>>>>were 3 different OSX computers (over gigabit). The FreeBSD
>>>>>>>>>server has 12gb of ram and no bce adapter. For what it's
>>>>>>>>>worth, the server is backed up remotely every night with
>>>>>>>>>rsync (remote FreeBSD uses rsync to pull) to an offsite
>>>>>>>>>(slow cable connection) FreeBSD computer, and I have not
>>>>>>>>>seen any errors in the nightly rsync.
>>>>>>>>>
>>>>>>>>>Sorry for the omission of networking info, here's the
>>>>>>>>>output of the requested commands and some that popped up
>>>>>>>>>in the other thread:
>>>>>>>>>
>>>>>>>>>http://www.cap-press.com/misc/
>>>>>>>>>
>>>>>>>>>In rc.conf: ifconfig_em1="inet 10.1.1.1 netmask 255.255.0.0"
>>>>>>>>>
>>>>>>>>>Scott
>>>>>
>>>>>Just to make it crystal clear to everyone:
>>>>>
>>>>>There is no correlation between this problem and use of ZFS.
>>>>>People are attempting to correlate "cannot allocate memory" messages
>>>>>with "anything on the system that uses memory". The VM is much more
>>>>>complex than that.
>>>>>
>>>>>Given the nature of this problem, it's much more likely the issue is
>>>>>"somewhere" within a networking layer within FreeBSD, whether it be
>>>>>driver-level or some sort of intermediary layer.
>>>>>
>>>>>Two people who have this issue in this thread are both using
>>>>>VirtualBox. Can one, or both, of you remove VirtualBox from the
>>>>>configuration entirely (kernel, etc. -- not sure what is required)
>>>>>and then see if the issue goes away?
>>>>
>>>>On the machine in question I can only do it after hours, so I will
>>>>do it tonight.
>>>>
>>>>I was _successfully_ sending the file over the loopback interface using
>>>>
>>>>cat /zpool/temp/zimbra_oldroot.vdi | ssh localhost "cat > /dev/null"
>>>>
>>>>I did it, btw, with the IPv6 localhost address first (accidentally),
>>>>and then using IPv4. Both worked.
>>>>
>>>>It always fails if I am sending it through the bce(4) interface,
>>>>even if my target is the VirtualBox bridged to the bce card (so it
>>>>does not "leave" the computer physically).
>>>>
>>>>Below the uname -a, ifconfig -a, netstat -rn, pciconf -lv and
>>>>kldstat output.
>>>>
>>>>I have another box where I do not see that problem. It copies files
>>>>happily over the net using ssh.
>>>>
>>>>It is an older HP ML 150 with only 3GB RAM, but with a bge(4)
>>>>driver instead. It runs the same RELENG_8 from last week. I installed
>>>>VirtualBox and enabled vboxnet (so it loads the kernel modules), but
>>>>I do not run VirtualBox on it (because it doesn't have enough RAM).
>>>>
>>>>Regards
>>>>Peter
>>>>
>>>>DellT410one# uname -a
>>>>FreeBSD DellT410one.vv.fda 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu Jun
>>>>30 17:07:18 EST 2011
>>>>r...@dellt410one.vv.fda:/usr/obj/usr/src/sys/GENERIC amd64
>>>>DellT410one# ifconfig -a
>>>>bce0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST>
>>>>metric 0 mtu 1500
>>>> options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>> ether 84:2b:2b:68:64:e4
>>>> inet 192.168.50.220 netmask 0xffffff00 broadcast 192.168.50.255
>>>> inet 192.168.50.221 netmask 0xffffff00 broadcast 192.168.50.255
>>>> inet 192.168.50.223 netmask 0xffffff00 broadcast 192.168.50.255
>>>> inet 192.168.50.224 netmask 0xffffff00 broadcast 192.168.50.255
>>>> inet 192.168.50.225 netmask 0xffffff00 broadcast 192.168.50.255
>>>> inet 192.168.50.226 netmask 0xffffff00 broadcast 192.168.50.255
>>>> inet 192.168.50.227 netmask 0xffffff00 broadcast 192.168.50.255
>>>> inet 192.168.50.219 netmask 0xffffff00 broadcast 192.168.50.255
>>>> media: Ethernet autoselect (1000baseT <full-duplex>)
>>>> status: active
>>>>bce1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>> options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>> ether 84:2b:2b:68:64:e5
>>>> media: Ethernet autoselect
>>>>lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>>> options=3<RXCSUM,TXCSUM>
>>>> inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb
>>>> inet6 ::1 prefixlen 128
>>>> inet 127.0.0.1 netmask 0xff000000
>>>> nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
>>>>vboxnet0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>> ether 0a:00:27:00:00:00
>>>>DellT410one# netstat -rn
>>>>Routing tables
>>>>
>>>>Internet:
>>>>Destination        Gateway            Flags    Refs      Use  Netif Expire
>>>>default 192.168.50.201 UGS 0 52195 bce0
>>>>127.0.0.1 link#11 UH 0 6 lo0
>>>>192.168.50.0/24 link#1 U 0 1118212 bce0
>>>>192.168.50.219 link#1 UHS 0 9670 lo0
>>>>192.168.50.220 link#1 UHS 0 8347 lo0
>>>>192.168.50.221 link#1 UHS 0 103024 lo0
>>>>192.168.50.223 link#1 UHS 0 43614 lo0
>>>>192.168.50.224 link#1 UHS 0 8358 lo0
>>>>192.168.50.225 link#1 UHS 0 8438 lo0
>>>>192.168.50.226 link#1 UHS 0 8338 lo0
>>>>192.168.50.227 link#1 UHS 0 8333 lo0
>>>>192.168.165.0/24 192.168.50.200 UGS 0 3311 bce0
>>>>192.168.166.0/24 192.168.50.200 UGS 0 699 bce0
>>>>192.168.167.0/24 192.168.50.200 UGS 0 3012 bce0
>>>>192.168.168.0/24 192.168.50.200 UGS 0 552 bce0
>>>>
>>>>Internet6:
>>>>Destination        Gateway            Flags      Netif Expire
>>>>::1                ::1                UH         lo0
>>>>fe80::%lo0/64      link#11            U          lo0
>>>>fe80::1%lo0        link#11            UHS        lo0
>>>>ff01::%lo0/32      fe80::1%lo0        U          lo0
>>>>ff02::%lo0/32      fe80::1%lo0        U          lo0
>>>>DellT410one# kldstat
>>>>Id Refs Address Size Name
>>>> 1 19 0xffffffff80100000 dbf5d0 kernel
>>>> 2 3 0xffffffff80ec0000 4c358 vboxdrv.ko
>>>> 3 1 0xffffffff81012000 131998 zfs.ko
>>>> 4 1 0xffffffff81144000 1ff1 opensolaris.ko
>>>> 5 2 0xffffffff81146000 2940 vboxnetflt.ko
>>>> 6 2 0xffffffff81149000 8e38 netgraph.ko
>>>> 7 1 0xffffffff81152000 153c ng_ether.ko
>>>> 8 1 0xffffffff81154000 e70 vboxnetadp.ko
>>>>DellT410one# pciconf -lv
>>>>..
>>>>bce0@pci0:1:0:0: class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>> vendor = 'Broadcom Corporation'
>>>> class = network
>>>> subclass = ethernet
>>>>bce1@pci0:1:0:1: class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>> vendor = 'Broadcom Corporation'
>>>> class = network
>>>> subclass = ethernet
>>>
>>>Could you please provide "pciconf -lvcb" output instead, specific to the
>>>bce chips? Thanks.
>>
>>Here it is:
>>
>>bce0@pci0:1:0:0: class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>> vendor = 'Broadcom Corporation'
>> class = network
>> subclass = ethernet
>> bar [10] = type Memory, range 64, base 0xda000000, size 33554432, enabled
>> cap 01[48] = powerspec 3 supports D0 D3 current D0
>> cap 03[50] = VPD
>> cap 05[58] = MSI supports 16 messages, 64 bit enabled with 1 message
>> cap 11[a0] = MSI-X supports 9 messages in map 0x10
>> cap 10[ac] = PCI-Express 2 endpoint max data 256(512) link x4(x4)
>> ecap 0003[100] = Serial 1 842b2bfffe6864e4
>> ecap 0001[110] = AER 1 0 fatal 0 non-fatal 1 corrected
>> ecap 0004[150] = unknown 1
>> ecap 0002[160] = VC 1 max VC0
>
>Thanks Peter.
>
>Adding Yong-Hyeon and David to the discussion, since they've both worked
>on the bce(4) driver in recent months (most of the changes made recently
>are only in HEAD), and also adding Jack Vogel of Intel who maintains
>em(4). Brief history for the devs:
>
>The issue, titled "Network memory allocation failures", was reported
>last year, and two users (Scott and Peter) have recently reported it
>again:
>
>http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>
>It was mentioned again by Scott here, in a message which also contains
>some technical details:
>
>http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063172.html
>
>What's interesting is that Scott's issue is identical in form but he's
>using em(4), which isn't known to behave like this. Both individuals
>are using VirtualBox, though we're not sure at this point if that is the
>piece which is causing the anomaly.
>
>Relevant details of Scott's system (em-based):
>
>http://www.cap-press.com/misc/
>
>Relevant details of Peter's system (bce-based):
>
>http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063221.html
>http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063223.html
>
>I think the biggest complexity right now is figuring out how/why scp
>fails intermittently in this nature. The errno probably "trickles down"
>to userland from the kernel, but the condition regarding why it happens
>is unknown.
BTW: I also saw two of these errors coming from BIND9 running in a
jail on that box.
DellT410one# fgrep -i allocate /jails/bind/20110315/var/log/messages
Apr 13 05:17:41 bind named[23534]: internal_send: 192.168.50.145#65176: Cannot allocate memory
Jun 21 23:30:44 bind named[39864]: internal_send: 192.168.50.251#36155: Cannot allocate memory
Jun 24 15:28:00 bind named[39864]: internal_send: 192.168.50.251#28651: Cannot allocate memory
Jun 28 12:57:52 bind named[2462]: internal_send: 192.168.165.154#1201: Cannot allocate memory
My initial guess: it happens sooner or later either way - whether it is
a lot of traffic in one go (ssh/scp copies of virtual disks) or a lot
of traffic over a longer period (a nameserver being queried again and
again).
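
A minimal sketch of how one might watch the ARC size against its cap and
the mbuf denial counters while reproducing the failure. The sysctl OIDs
are the ones quoted earlier in this thread; the helper function and the
monitoring loop are illustrative, not a tested tool:

```shell
#!/bin/sh
# Sketch: log whether the ARC is over its configured cap while a
# transfer runs, and dump mbuf statistics so a "Cannot allocate memory"
# failure can be matched against denied mbuf/cluster requests.

# Pure helper: prints OVER if the ARC size exceeds the cap, else UNDER.
arc_state() {
    size=$1
    max=$2
    if [ "$size" -gt "$max" ]; then
        echo "OVER"
    else
        echo "UNDER"
    fi
}

# On the affected box, one would run something like this in parallel
# with the failing scp (commented out here since the OIDs only exist
# on a FreeBSD/ZFS system):
#
#   while :; do
#       size=$(sysctl -n kstat.zfs.misc.arcstats.size)
#       max=$(sysctl -n vfs.zfs.arc_max)
#       printf '%s arc=%s (%s)\n' "$(date +%T)" "$size" \
#           "$(arc_state "$size" "$max")"
#       netstat -m | grep denied
#       sleep 5
#   done
#
# A spike in "requests for mbufs denied" at the moment the transfer
# dies would point at the network memory pools rather than the ARC.

# Demonstration with the arc_max value quoted earlier in the thread:
arc_state 6300000000 6200000000
arc_state 6100000000 6200000000
```

This would let one line up the timestamp of a failed scp with whether
arcstats.size was above or below arc_max at that moment, which is the
correlation Scott said he could not see by hand.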