Greg Troxel <g...@lexort.com> writes:
> Christof Meerwald <cme...@cmeerw.org> writes:
>> I am not sure about a missed interrupt (or even anything kernel
>> related) - wouldn't that also affect pings and TCP connections?
> 
> If the disk interrupts were missed, and network interrupts were ok, it
> could be consistent.  I'm just guessing.
> 
>> To me it would feel more like maybe syslogd blocking on something
>> (which might then also block anything trying to log something via
>> syslog)?
> 
> I wouldn't expect syslog clients to block.
> 
> You also haven't described the machine.  If you're doing anything other
> than setting YES in rc.conf to start things at boot, please explain.
> 
> Also, you could explain the history and frequency.  Did this arise
> recently?
To what kind of (virtualized) disk controller is the disk
connected?

I'm seeing somewhat similar problems with a moderately busy name
server running on a Proxmox cluster.  From time to time, some
services stop responding, while others continue to run.  The only
(rough) correlation I've spotted is that the services I expect to
need disk writes seem to get stuck (named for incoming zone
transfers, syslogd for log files).  Ping and the initial ssh banner
continue to work.

As for timekeeping, some notion of forward progress is still
there.  At boot, I start the script below, and whenever its gpt
command runs _and_ the system is stuck, the system starts
responding again.  So the sleep command seems to wake up after
the expected number of seconds.  The gpt command typically
accesses the raw device, in case that matters.

-------------------- poke-disk.sh --------------------
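#!/bin/sh -eu
# Read the GPT from sd0 every 10 minutes; doing so seems to get the
# system responding again when it is stuck.

while sleep 600; do
    gpt show sd0 > /dev/null
done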
-------------------- poke-disk.sh --------------------
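If one wanted to check the "sleep wakes up on time" observation more directly, a variant of the script could compare wall-clock time before and after each sleep.  This is only a sketch: the file name poke-disk-timed.sh, the 5-second tolerance and the helper names check_drift/poke_loop are made up for illustration, and it relies on `date +%s` printing seconds since the Epoch.

```sh
#!/bin/sh
# Hypothetical variant of poke-disk.sh: besides poking the disk, report
# when "sleep INTERVAL" took noticeably more or less than INTERVAL
# seconds of wall-clock time.
set -u

# Seconds between pokes; 600 matches poke-disk.sh.
interval=600
# Allowed difference in seconds before a wakeup is reported (assumed value).
slack_max=5

# check_drift START END EXPECTED: print a line if END - START differs
# from EXPECTED by more than slack_max seconds.
check_drift() {
    elapsed=$(( $2 - $1 ))
    delta=$(( elapsed - $3 ))
    if [ "$delta" -gt "$slack_max" ] || [ "$delta" -lt "-$slack_max" ]; then
        echo "$(date): sleep $3 took ${elapsed}s"
    fi
}

# poke_loop DISK: same loop as poke-disk.sh, plus the wakeup check.
poke_loop() {
    while :; do
        start=$(date +%s)
        sleep "$interval"
        end=$(date +%s)
        check_drift "$start" "$end" "$interval"
        gpt show "$1" > /dev/null
    done
}

# Only start the (endless) loop when a disk name is given, e.g. "sd0".
if [ "$#" -ge 1 ]; then
    poke_loop "$1"
fi
```

Run it as `sh poke-disk-timed.sh sd0`.  Since writes to sd0 are what seem to hang, it is probably better to watch its output on a terminal than to redirect it to a file on that disk.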

Some snippets of the virtio/scsibus/sd0 autoconfiguration output (dmesg):

ppb0 at pci0 dev 5 function 0: Red Hat Qemu PCI-PCI (rev. 0x00)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
virtio0 at pci1 dev 1 function 0
virtio0: SCSI device (id 8, rev. 0x00)
vioscsi0 at virtio0: features: 0x10000000<INDIRECT_DESC>
virtio0: allocated 86016 byte for virtqueue 0 for control, size 256
virtio0: using 73728 byte (4608 entries) indirect descriptors
virtio0: allocated 86016 byte for virtqueue 1 for event, size 256
virtio0: using 73728 byte (4608 entries) indirect descriptors
virtio0: allocated 86016 byte for virtqueue 2 for request, size 256
virtio0: using 73728 byte (4608 entries) indirect descriptors
vioscsi0: cmd_per_lun 128 qsize 256 seg_max 254 max_target 255 max_lun 16383
virtio0: config interrupting at msix0 vec 0
virtio0: queues interrupting at msix0 vec 1
scsibus0 at vioscsi0: 256 targets, 16384 luns per target
virtio1 at pci0 dev 18 function 0
virtio1: network device (id 1, rev. 0x00)
vioif0 at virtio1: features: 0x31870020<EVENT_IDX,INDIRECT_DESC,NOTIFY_ON_EMPTY,CTRL_MAC,CTRL_RX,CTRL_VQ,STATUS,MAC>
vioif0: Ethernet address xx:xx:xx:xx:xx:xx
virtio1: allocated 65536 byte for virtqueue 0 for rx0, size 1024
virtio1: using 32768 byte (2048 entries) indirect descriptors
virtio1: allocated 20480 byte for virtqueue 1 for tx0, size 256
virtio1: using 8192 byte (512 entries) indirect descriptors
virtio1: allocated 8192 byte for virtqueue 2 for control, size 64
virtio1: config interrupting at msix1 vec 0
virtio1: queues interrupting at msix1 vec 1
ppb1 at pci0 dev 30 function 0: Red Hat Qemu PCI-PCI (rev. 0x00)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled
virtio2 at pci2 dev 29 function 0
virtio2: entropy device (id 4, rev. 0x00)
viornd0 at virtio2: features: 0x10000000<INDIRECT_DESC>
virtio2: allocated 8192 byte for virtqueue 0 for Entropy request, size 8
virtio2: interrupting at ioapic0 pin 11
[ ... ]
sd0 at scsibus0 target 0 lun 0: <QEMU, QEMU HARDDISK, 2.5+> disk fixed
sd0: 256 GB, 16383 cyl, 16 head, 2048 sec, 512 bytes/sect x 536870912 sectors
[ ... ]
sd0: GPT GUID: d4755fd3-59a4-405e-940f-9a356b733f1c
dk0 at sd0: "6c6ce405-bcda-4586-b0b9-8fbaea0b07b1", 520091648 blocks at 2048, type: ffs
dk1 at sd0: "723c4abf-3734-4008-8931-8806384e2291", 16777183 blocks at 520093696, type: swap
sd0: async, 8-bit transfers, tagged queueing
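In the excerpt above, the disk (virtio0/vioscsi0) and the network device (virtio1/vioif0) interrupt on different MSI-X vectors (msix0 vs msix1), which would at least be consistent with Greg's guess that disk interrupts could go missing while network interrupts do not.  A small sketch for pulling those lines out of dmesg text; the name irq-map.sh is hypothetical, and the pattern only matches the "virtioN: ... interrupting at ..." lines in the form shown here.

```sh
#!/bin/sh
# Summarize which interrupt source each virtio device uses, from NetBSD
# autoconfiguration messages such as
#   virtio0: queues interrupting at msix0 vec 1
#   virtio2: interrupting at ioapic0 pin 11
set -u

# irq_map: read dmesg text on stdin, print "DEVICE [ROLE ]-> SOURCE".
irq_map() {
    sed -n 's/^\(virtio[0-9]*\): \(.*\)interrupting at \(.*\)$/\1 \2-> \3/p'
}

# With a file argument (e.g. a saved copy of dmesg output), summarize it.
if [ "$#" -ge 1 ]; then
    irq_map < "$1"
fi
```

For the lines above, `dmesg > /tmp/dmesg.txt && sh irq-map.sh /tmp/dmesg.txt` should print entries such as `virtio0 queues -> msix0 vec 1`, `virtio1 queues -> msix1 vec 1` and `virtio2 -> ioapic0 pin 11`.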

I plan to change the disk controller to something that is not 
virtio-based, and see if that changes the system behavior.

                                        -jarle
