Okay, UML repeats its output when the pipe it's writing into fills up
until it returns -EAGAIN.  I can reproduce this even on a fast machine
by repeatedly suspending the xterm that UML is running under with a dumb
little script ala:

while /bin/true
do
        kill -STOP $PID
        sleep 1
        kill -CONT $PID
        sleep 1
done

And then running UML ala:
./linux rootfstype=hostfs rootflags=/ init=/home/landley/extract.sh rw mem=64M

Where extract.sh just cd's to the appropriate directory and extracts the
linux kernel tarball, verbosely.

I instrumented arch/um/chan_kern.c to write data to a file (writing to stdout
or stderr is discarded if the output pipe is full) and got one of the hiccups
to tell me the following:

--- snip ---
[gen=10 1 1 1 2][chan->output=1 0xa0335800 46 
0xa08a8320]linux-2.6.11.2/drivers/
video/kyro/STG4000Reg.h[gen=108 1 46 46 2][chan->output=1 0xa08efa7f 1 
0xa08a832
0]^M[gen=13 1 1 -11 2][Woot-11!][chan->output=1 0xa3dac000 2 0xa08a8320]
[gen=13 1 2 -11 2][Woot-11!][chan->output=1 0xa3dac000 51 0xa08a8320]
linux-2.6.11.2/drivers/video/kyro/STG4000Ramdac.c[gen=13 1 51 -11 2][Woot-11!]
[c
han->output=1 0xa3dac000 52 0xa08a8320]
linux-2.6.11.2/drivers/video/kyro/STG4000Ramdac.cl[gen=13 1 52 -11 2]
[Woot-11!][
chan->output=1 0xa3dac000 100 0xa08a8320]
linux-2.6.11.2/drivers/video/kyro/STG4000Ramdac.clinux-2.6.11.2/drivers/video/ky
ro/STG4000Ramdac.c[gen=13 1 100 -11 2][Woot-11!][chan->output=1 0xa3dac000 101 
0
xa08a8320]
linux-2.6.11.2/drivers/video/kyro/STG4000Ramdac.clinux-2.6.11.2/drivers/video/ky
ro/STG4000Ramdac.ci[gen=13 1 101 -11 2][Woot-11!][chan->output=1 0xa3dac000 
148
0xa08a8320]
--- snip ---

The logging added to write_chan looks like:
dprintf(crapfile,"[chan->output=%d %p %d %p]",chan->fd,buf,len,chan->data);

The logging lines added to generic_write are:
write(crapfile,buf,n);
dprintf(crapfile,"[gen=%d %d %d %d %d]",buf[0], fd,n,woo,errno);

(where "woo" is the return value of os_write_file).

Here's a translation of what's happening above:
[gen=108 1 46 46 2]: generic_write gets 46 bytes and manages to write 46
bytes to stdout.  Life is good.

[chan->output=1 0xa08efa7f 1 0xa08a8320]: The next call to write_chan wants
to write one byte (ctrl-M, you can see the mnemonic there from the cut and
paste, I was using less to view the output).  This gets passed on to 
generic_write.

[gen=13 1 1 -11 2]: Failure: the return from os_write_file in generic_write 
went "-EAGAIN", because the xterm it's writing to is suspended and the pipe 
is full.  (Actually I'm blasting out debug output to stdout too, and that 
debug printf filled up the output pipe and got truncated, so I can see in the 
other window that the pipe is indeed full.  The "Woot!" bit right after is 
another debug line I put it when the if(EAGAIN) test in write_chan trips, 
it's just printing out ret again.)

[chan->output=1 0xa3dac000 2 0xa08a8320]: The next call to write_chan
prints two bytes, which get passed to generic_write...

[gen=13 1 2 -11 2]: Printed right before this is a newline, so we can deduce
that an ascii 10 was probably added to the ascii 13, and less smoothed it 
over.  Note that the original data is still being resent on the retry, with 
the new data appended, so there's internal buffering upstream of us.  And in 
any case, the output buffer is still full, so it fails again.

[chan->output=1 0xa3dac000 51 0xa08a8320]: write_chan is now trying to write
51 bytes.  The generic_write instrumentation prints the snippet out right
after this (still starting with a newline, plus some tar -v output):

linux-2.6.11.2/drivers/video/kyro/STG4000Ramdac.c

[gen=13 1 51 -11 2]: Yeah, you tried 51 bytes and it's still -EAGAIN.

[chan->output=1 0xa3dac000 52 0xa08a8320]: Now trying to write 52 bytes...

linux-2.6.11.2/drivers/video/kyro/STG4000Ramdac.cl

[gen=13 1 52 -11 2]: What the?  There's an "l" at the end of that.  Why is
there an 'l' at the end of it?  The one extra byte should be a _newline_, not
an l.  (output buffer still full, it still fails, but the data being queued to 
us just hiccuped slightly.)

[chan->output=1 0xa3dac000 100 0xa08a8320]: We're now up to 100 bytes of 
output constipation being retried...

linux-2.6.11.2/drivers/video/kyro/STG4000Ramdac.clinux-2.6.11.2/drivers/video/ky
ro/STG4000Ramdac.c

[gen=13 1 100 -11 2]: And stuttering!  We have achieved stuttering!  This
isn't actually going out yet (the output buffer is still full, 0 bytes are
getting written each time), but the data that's backing up in the buffer is
the _WRONG_DATA_.  Whoever is queueing this crap to us is getting it wrong.  
Copies of the _existing buffer contents_ are being appended to the buffer, 
_not_ the new data that's supposed to be appended.  Somebody somewhere, 
upstream of this, is using the wrong pointer on a retry.

It goes on like this, until the buffer finally unblocks and dumps 4k of
stutter to the output.  (Data after 4k seems to be discarded or something.
Eventually the 4k buffer we're appending to fills up and it goes:

[chan->output=1 0xa3dac000 4095 0xa08a8320]
[gen=13 1 4095 -11 2]
[chan->output=1 0xa3dac000 4095 0xa08a8320]
[gen=13 1 4095 4051 2]
[chan->output=1 0xa3dacfd3 44 0xa08a8320]
[gen=118 1 44 44 2]

At which point we're back to normal until the next stutter.  But notice the
first three lines of that: either it's looping endlessly on retry, or data got
discarded.

Now the bit I'm currently blocked on is: Who is calling write_chan?  I've
tried to put a call to dump_stack() in there, but it just goes endlessly
recursive since dump_stack goes through printk, which gets us back here.
(Even putting in a global variable and a counter to make the recursive
calls not trigger dump stack doesn't help, it just hangs on the locking...)

Also, I have no idea what to do about more than 4k of data accumulating.  Is 
it getting discarded, is the upstream retry logic getting it right, should 
something be blocking at that point...?  (Yeah, having a write to the console 
block in D state would be icky, but in this case I think it's the right thing 
to have happen inside UML when the host filesystem isn't consuming your data 
fast enough...)

Any clues?  I can try tracing from the other end (down through a write syscall 
down into the guts of this thing) this weekend, but it'd be really nice to 
know where to look...

Rob


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

Reply via email to