Heh, ok, now I found the message. (Just because I'm paying attention to the busybox list again doesn't mean I've got my mail filters set up right yet. :)
> But scheduling in the sender can.

The sender almost certainly is using something like a 16550a UART,
which has a 16 byte output buffer. (In addition the kernel may insert
a pipe buffer on the output side.) If your escape sequences are
written to the char device as single blocks, the main reason for your
kernel to block the sending process is because the output buffer
filled up. (Other reasons to block shouldn't decouple the sequences;
either it blocks before sending, or you're in the kernel and the whole
sequence gets queued atomically.)

This means you have at least 16 chars worth of transmission time for
the sending process to get scheduled back, and here slower rates
_help_ you; at 1200 bps that's on the order of 100ms.

> If sender is not a hardware
> but a process (say, sitting on the other side of ssh) and it
> takes input from, say, serial line, it may read only one char, ESC,
> and send it to us over ssh. Then it will read more and send more to us.
> Delay between ESC and the rest here is affected by how *that process*
> is scheduled, and there scheduling delays can not be predicted.

A) Your example seems fairly contrived; ssh by itself can't do that,
and ssh going _into_ a serial line can't do it either (it gets the 16
byte buffer). You need _two_ levels of reblocking for the case you're
talking about to even come up.

B) The http://en.wikipedia.org/wiki/Nagle_algorithm is the default on
Linux, and its _job_ is to re-coalesce exactly this sort of thing. :)

*shrug* I originally had it as 300 ms and was pretty happy with that.
I just want to make sure we're making decisions based on what the
system is actually doing...

> Your qemu case can be such example too. What is injecting "serial"
> input? It's a host process. Can it be delayed? Yes.

Ok, that's wrong, and I explicitly tested it. The host process is doing

  write(fd, blah, 3);

It's sending data in 3 byte chunks, across a pipe, using blocking writes.
This is an atomic operation, so if the host process is running it
either hasn't sent anything yet, or it's done. Pipes don't
de-aggregate data, so suspending the host process won't break up an
atomic transaction. The only way you get a short write to a pipe is to
do a nonblocking write to a full pipe, and A) very few things do
nonblocking writes to pipes, B) the buffer size is either 4k (before
2005) or the new pipe buffer stuff, which is even longer:

  http://lwn.net/Articles/119682/

So you'll probably be scheduled _back_ before it's had time to drain.

(P.S. I just checked that write(1, blah, 1000000) didn't do a short
write. So one million bytes going through a pipe weren't de-aggregated
on the sender side. Just FYI.)

> If qemu is clever enough to have many threads,

It isn't. Making qemu multithreaded is a big todo item on the qemu
mailing list; it can't take advantage of SMP without it, but it
requires huge changes to the guts of qemu which the developers say
won't happen any time soon. (Right now qemu can emulate SMP for some
platforms, but only a single process is emulating all those CPUs, with
its own sort of internal scheduler.)

> it can have one process
> to run quest OS and another process to do IO.

It uses the asynchronous I/O stuff instead. AIO, nonblocking. They
don't use a separate thread for I/O, nor have they shown any interest
in such an approach, exactly because they don't want to introduce
arbitrary latencies which can confuse some device drivers.

> There is no guarantee
> second process won't experience delay between it pushed ESC
> down the "serial line" and it pushed the rest.
>
> > > Kernel simply gives no such promises.
> >
> > It checks for pending data before it checks for the timeout. The kernel
> > accepting data from the serial port and queuing it to the tty happens in
> > interrupt context, and that interrupt handling takes priority over
> > handling the timer interrupt that causes the poll to expire.
> > (That was
> > the test I did by repeatedly suspending qemu for more than the timeout
> > period.)
>
> I don't think this is how it works. poll() just blocks on _both_ timeout
> and data being available. When block is gone and poll() continues inside
> kernel, it checks why it was able to continue. That's where it checks
> "do we have data available" and returns the number of fd's with data.
>
> Interrupt priority has nothing to do with it.

One interrupt handler gets called before the other. The serial
interrupt handler will queue data for the character device before
waking up processes on that device's wait queue. The timer interrupt
will wake up the process on the device's wait queue without having
first queued data. Thus the "why did I wake up" check done by the
function the work queue calls (what used to be "bottom halves") will
either reliably see data, or it reliably won't. That's why interrupt
priority is important, as I tested by suspending the qemu process and
reliably forcing overlapping assertion of both interrupts.

> > It's not scheduling. It has nothing to do with scheduling. That's what
> > I ran a test to prove. The point is that poll() is a syscall, and the
> > kernel flags it to return to userspace when it either gets data from the
> > device or when the timer it requested expires, neither of which has
> > anything to do with the process scheduler.
>
> I think scheduler is at play here.
> Imagine a low-end system (say ~100MHz cpu) with rather poor capabilities
> wrt measuring time. No microsecond clock, just timer interrupts.

Ok, here's the bit I replied to last message...

> > I changed the code to only pre-read one character at a time, and only
> > when another character was needed to match a potential escape sequence.
> >
> > That means that if we matched the entire escape sequence, we read
> > _exactly_ as much data as we needed to do so, and have thus consumed
> > exactly the contents of the buffer. That's why I set n=0. Why is this
> > wrong?
> This is ok. Think what happens if match failed.
> Failure to match may leave up to 4 chars buffered in readbuffer[].

Agreed.

> Then on the next call we try to interpret *them*
> as a potential ESC sequence.

Correct.

> This works correctly
> because there are no sequences like "ESC ESC x y".

If there were, the code would have to handle them. I still don't see
the problem.

> If I want to not be sadistic towards poor kernel and retrieve more chars
> at once, I can do read(fd,buf,5) from the start. It's the same, right?
>
> After all, if it's not ESC seq, we will buffer up to 4 chars,
> which is exactly what we do after failed match anyway.

No, it's not the same. The _reason_ it's not the same is that if we
never do readahead when we're not parsing a sequence, then any data
left in the buffer that we _did_ read ahead must be part of a known
sequence. Since they all start with escape and none of them has escape
as a later character, the data we read ahead into the buffer _can't_
be part of a new escape sequence, except that the very last character
might be another escape.

This means that after a failed readahead, we won't start processing
another escape sequence until we've dealt with the other characters as
individual characters. The last character might be an escape, but we
won't care until it's the _only_ character in the buffer, in which
case there's no difference in processing between reading it normally
and inheriting it from a previous readahead.

If you just arbitrarily read 5 characters, and the start of the buffer
was a 3 character escape sequence, then setting the buffer length to
zero after a readahead _would_ discard two characters. But with the
one character at a time version, it never does, because we never read
ahead _more_ than the escape sequence we're currently looking at.

> Well, it's wrong. Example: x ESC O A ESC O A - it's keys
> x, cursor_up, cursor_up. read(fd,buf,5) will read x ESC O A ESC.
>
> We eat x.
> We come back later and eat "ESC O A", and here's the problem -
> we lose next ESC because code just sets chars_to_parse to 0
> if successful ESC sequence match happened.

The code I wrote wouldn't have read the next ESC yet. We never have to
"unget" data except for data that was _part_ of a valid escape
sequence but not _all_ of one. And until we get to the very last
character of that, it can't be the start of a new escape sequence, at
which point the buffer is essentially empty again.

> So, in short, the code is correct as-is. But trying to enlarge the first
> read() in readit() past 4 bytes will not work, in a rather non-obvious way.

Well, it seemed obvious to me. Not always a good metric, I know. :)

> I felt this is worth mentioning in the comments, so I did so.

I understood the problem, but didn't understand your comment. Lemme
see if I can clarify it...

Also, while I'm here, a totally aesthetic comment:

> for (eindex = esccmds; eindex < esccmds + ARRAY_SIZE(esccmds);
>      eindex++) {

The reason I had no space between "esccmds+ARRAY_SIZE" is that it fits
in 80 chars that way. If you're going to bloat it so it doesn't fit on
one line, moving the curly bracket to the next line makes it less
unreadable.

I'll just remove the explanation of _why_ 50 ms specifically. Saying
that the delay is there to aggregate serial data is enough, and the
truth is that at 1200 bps we're being paranoid. Besides, we can't rule
_out_ somebody using a 300 bps modem, now can we? :)

Rob

(The longest arguments are always over comments, aren't they?)

_______________________________________________
busybox mailing list
[email protected]
http://busybox.net/cgi-bin/mailman/listinfo/busybox
