On Saturday 18 October 2008 16:06:22 Ralf Friedl wrote:
> Denys Vlasenko wrote:
> > I think scheduler is at play here.
> > Imagine a low-end system (say ~100MHz cpu) with rather poor
> > capabilities wrt measuring time. No microsecond clock, just timer
> > interrupts.
I don't seem to have gotten the original of this message, but I repeat: timer ticks have nothing to do with it. This is a race between two entirely in-kernel events, neither of which involves the scheduler until the event is already _over_.

> > If one runs Linux on such a system with HZ=100, poll() with timeout of
> > 10ms is basically a "one timer tick timeout". I don't know how precise
> > it can be.

Sleeps in the Linux kernel can never return early unless interrupted, but they can return arbitrarily late if the system is busy doing something else. (This horrifies the realtime people, but that's a can of worms.)

When your sleep expires, your process gets taken off the wait queue it's on and put back on the scheduler queue. Your process _cannot_ be scheduled before this happens, but when the scheduler actually has a spare time slice to hand out is a separate issue. (And that second decision depends on process priority, on which scheduler variant you're using, on whether or not you've invoked the funky realtime options, and...)

This was the original O(1) stuff: the way they _got_ O(1) was by not looking at inactive processes at all when making a scheduling decision, but instead moving each process between different queues depending on its current status. So if your process is waiting, it's on a wait queue and not on a scheduler queue. (It gets moved to a separate linked list, and then moved back when its timer expires.)

I believe current kernels use CFS (the "Completely Fair Scheduler"), which was Ingo Molnar following up on the work Con Kolivas did to improve Ingo's original O(1) scheduler, which replaced Linus's original scheduler back during 2.4. If you _really_ care about this, go to http://lwn.net/Kernel/Index and scroll down to the "Scheduler" topic.
The dozen or so articles in there are a good primer on the issue, although they don't cover things like this ACM article on CFS: http://portal.acm.org/citation.cfm?id=1400097.1400102

> > To be exact, can kernel ensure that it won't be significantly
> > _smaller_ than 10ms?

Yes. It can. That's the whole POINT of sleeps.

Now, what _can_ return early from a sleep is being interrupted by a signal, in which case the system call returns -EINTR, as for all system calls, which is why sigaction(SA_RESTART) or a wrapper is a good idea. Busybox has a safe_poll() wrapper (which may itself be buggy, but that's a separate issue).

That said, just about the only signal that's likely to interrupt a sleep in a way that returns EINTR rather than killing the process is SIGSTOP. I suppose you could write a program to SIGSTOP and SIGCONT vi in a tight loop to try to beat bad behavior out of it; with the current safe_poll() wrapper doing a strange decrement, you'd have to send 20 signals in under 10ms. You'd have to write it in C though: even on my Core 2 Duo laptop a shell script probably couldn't do it fast enough. (Yes, I can come up with a pathological case to break most things, but this is really not an interesting case.)

> > What if poll() was called soon before our 10ms
> > timeslice was going to expire anyway?

Why do you believe time slices are 10ms? Where did you get that from? Even the O(1) scheduler in 2.4 did timeslice aggregation; I'm not sure about Linus's original scheduler from 1992...

> > When the timeout will expire? Next
> > timeslice? Then it can be smaller than 10ms. Two timeslices from now?
> > This means almost always poll(10) will do ~20ms waits.
>
> Imagine a high-end system with HZ=1000 and poll with timeout of 1ms. Now
> poll() happens to be called just 1µs before the end of the timeslice. If
> the kernel wouldn't wait for *at least* the specified time, the behavior
> would be undefined and such a timeout would be useless.
Time slices DON'T WORK THAT WAY ANYMORE. They HAVEN'T FOR YEARS. Jiffies are a resource allocation thing, that's all. Scheduling decisions don't wait for time slices to end; picking the next process is a cheap O(1) action, so it can be done on return from every system call and most interrupts. (I believe that was part of Robert Love's original low latency work, predating even the O(1) scheduler, but it's been so many years I'd have to look it up. I'm fairly sure it predated the kernel preemption stuff...)

Processes get scheduled as soon as there's a processor to run them on that isn't busy with something else. When they wait they get a priority boost (notice how "priority" and "niceness" are different fields in top), so they only wait until the end of a time slice behind other processes of the same priority. Having another process with the exact same priority as vi is somewhat potluck, due to there being _lots_ of priority levels the dynamic priority adjustment goes through. But it also means that either the other process is also massively I/O bound and woke up at the exact same instant we did (a lottery situation, so it's almost never actually running right now), or else vi has become CPU bound and interactivity ain't an issue no more anyway.

I _also_ note that modern Linux system calls such as ppoll() have a _nanoseconds_ field, because milliseconds weren't considered accurate enough. I also point out that ARM is the platform on which Thomas Gleixner first implemented the high resolution timers infrastructure in the first place, so modern embedded hardware is surprisingly well represented here.

I also point out that the scheduler still isn't involved in the poll() system call's decision to return data or to return expiration. That's a separate kernel wait queue, so this entire discussion is off to one side of the real issue...

Rob
_______________________________________________
busybox mailing list
[email protected]
http://busybox.net/cgi-bin/mailman/listinfo/busybox
