Hi everyone,

I recently upgraded my main amd64 server from 10.3-stable (r302011) to
11.0-stable (r308099).  It went smoothly except for one big issue:
certain applications (but not the system as a whole) respond very
sluggishly, and video playback of any kind is extremely choppy.

The system is under very light load, and I see no evidence of abnormal
interrupt latency or interrupt load.  More interestingly, if I place the
system under full load (~0.0% idle) the problem *disappears* and
playback/responsiveness are smooth and quick.

Running ktrace on some of the affected apps points me at the problem:
huge variance in the amount of time spent in the nanosleep system call.
A sleep of, say, 5ms might take anywhere from 5ms to ~500ms from entry
to return of the syscall.  OTOH, anything CPU-bound or that waits on
condvars or I/O interrupts seems to work fine, so this doesn't seem to
be an issue with overall system latency.

I can repro this with a simple program that just does a 3ms usleep in a
tight loop (i.e. roughly the amount of time a video player would sleep
between frames @ 30fps).  At light load ktrace will show the huge
nanosleep variance; under heavy load every nanosleep will complete in
almost exactly 3ms.

FWIW, I don't see this on -current, although right now all my -current
images are VMs on different HW so that might not mean anything.  I'm not
aware of any recent timer- or scheduler-specific changes, so I'm
wondering if perhaps the recent IPI or taskqueue changes might be
somehow to blame.

I'm not especially familiar w/ the relevant parts of the kernel, so any
guidance on where I should focus my debugging efforts would be much
appreciated.

Thanks,
Jason
