On 2015-12-06 at 11:52 'Davide Libenzi' via Akaros wrote:
> IMHO the full caching in the upper layers might not be fitting for
> some devices, like you notice.
> Take a device with large size (ie a memory device), or a device
> (think PCI, or MSR) where upper layer caching does not provide the
> proper coherency characteristics.
> Would be nice that, devices which chose (because they know they fit
> their device specific model), would have a common way to do that.
> 
> What I am trying now in my interrupts device, is to create a text
> response onto a queue, stick it to chan->aux (qclose() it on channel
> close()), and use a new qpread() call upon offsetted reads coming
> from upper layers.

Seems reasonable, though as you said, it'd be nice for a common way to
do that.  What exactly does qpread() do?  Does it discard things it
jumps over?  It wasn't clear to me that you need anything other than
existing queue ops.

Is the model something like "if the queue is empty, generate a full
message and attach it to the queue.  Read from the queue until it is
done, then return 0 from the read"?  What does an lseek do?  Maybe for
SEEK_SET: flush the queue, generate a new message, seek to offset.
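
To make that concrete, here is a rough userspace sketch of the model I
have in mind.  It uses a flat buffer instead of a real queue, and every
name in it (struct resp, resp_generate, resp_read, resp_seek_set, the
"irq count" message) is made up for illustration; none of this is an
existing Akaros interface.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical per-open state, e.g. something hung off chan->aux. */
struct resp {
    char buf[128];
    size_t len;     /* bytes in the current snapshot */
    size_t off;     /* next byte to hand out */
    int valid;      /* nonzero once a snapshot has been generated */
};

/* Stand-in for device state that changes between reads. */
static int generation;

/* Build one complete message; partial reads are served from this copy. */
static void resp_generate(struct resp *r)
{
    int n = snprintf(r->buf, sizeof(r->buf), "irq count: %d\n", ++generation);

    if (n < 0)
        n = 0;
    r->len = (size_t)n < sizeof(r->buf) ? (size_t)n : sizeof(r->buf) - 1;
    r->off = 0;
    r->valid = 1;
}

/* Returns bytes copied, or 0 (EOF) once the snapshot is fully consumed. */
static ssize_t resp_read(struct resp *r, void *dst, size_t n)
{
    if (!r->valid)
        resp_generate(r);
    if (r->off == r->len) {
        r->valid = 0;   /* the next read starts a fresh message */
        return 0;
    }
    if (n > r->len - r->off)
        n = r->len - r->off;
    memcpy(dst, r->buf + r->off, n);
    r->off += n;
    return (ssize_t)n;
}

/* SEEK_SET: drop the old snapshot, generate a new one, clamp the offset. */
static void resp_seek_set(struct resp *r, size_t offset)
{
    resp_generate(r);
    r->off = offset < r->len ? offset : r->len;
}
```

The point is only that the snapshot is taken once, so a reader that
pulls it out in 4-byte pieces sees the same bytes as one that reads it
all at once, and both get a clean 0 at the end.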


There are actually problems with just about every device like this.
For instance, mpstat's write needed commands like "reset".  At one
point, you'd get things like this:

/ $ echo reset > /prof/mpstat
Bad command, use "reset|on|off"

It happened intermittently: on a given boot, there was a chance you'd
get this behavior every time; otherwise you'd never see it.  (This was
somehow due to glibc and busybox.)

The issue was that echo was splitting the write, too, e.g.
write(fd, "r"), write(fd, "eset").  (It was more complicated than that,
due to the layers of glibc and I/O buffering and whatnot.  Check out
c074a35e7f17 ("BB: manually writes echo's buffer") and 9a3ce3138e00
("Busybox echo buffers lines to stdout") for more info.)

So even the simplest command could fail.  Likewise what seems like a
simple read (hey, it's just 10 bytes!) could be split.

Another related problem is if you have two commands, one of which is a
prefix of the other.  Say "reset" and "reset_all".  You try to write
"reset_all", but it gets split by some intermediate program to "reset"
and "_all".  Whoops!
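
A toy illustration of both failure modes, assuming a device that
treats every write() payload as one complete command.  The enum and
parse_cmd() are hypothetical, not the actual mpstat code:

```c
#include <string.h>

enum cmd { CMD_BAD, CMD_RESET, CMD_RESET_ALL };

/* Each write() payload is parsed on its own; nothing is buffered
 * across writes, so the device cannot tell a split from a typo. */
static enum cmd parse_cmd(const char *buf, size_t len)
{
    if (len == 5 && !memcmp(buf, "reset", 5))
        return CMD_RESET;
    if (len == 9 && !memcmp(buf, "reset_all", 9))
        return CMD_RESET_ALL;
    return CMD_BAD;
}
```

With this, a single write of "reset_all" parses correctly, but the
split pair "reset" then "_all" runs the wrong command and then reports
a bad one, and "r" then "eset" fails twice.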

This isn't quite the same issue as with read(), where new results are
created with each partial read and the results change between the
reads, but it's in the same ballpark.

One (bad) option would be for devices to reject reads and writes that
are too short.  This could work for some reads, but the problem is you
could have very large reads.  So that's ugly.  For writes, you don't
really know how long the command needs to be.  Did a command lookup fail
because the write was cut short, or because the command is just wrong?
(You'd have to check whether the input is a prefix of a valid command.)
That's a mess.  So it seems like that won't work.  (It'd require user
changes too, which is a pain.)

As far as how to deal with this goes, the queue (or something similar)
model seems okay.  It'll let the device generate a single response,
which can be read in pieces.  The thing that helps us is that the
device knows about the boundary and can do an EOF (return 0).  With a
few dev library helpers and some care about lseek and any other
gotchas, this can work.

We have less help with write().  Consider the "reset_all" and "reset"
commands.  One option is to require all commands to have a trailing
write(fd, "").  Yuck.  Though we can change our userland's echo to do
that.  But plenty of existing programs don't expect that behavior.  The
kernel, for instance, is sometimes a client of devices.
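
For what it's worth, the trailing-empty-write scheme would look roughly
like this on the device side.  This is only a sketch under that
assumption; struct cmdbuf and cmd_write() are invented names, and the
64-byte limit is arbitrary:

```c
#include <string.h>

/* Hypothetical per-open accumulator for a command in progress. */
struct cmdbuf {
    char buf[64];
    size_t len;
};

/* Buffer non-empty writes; a zero-length write commits the command.
 * Returns 0 if the data was buffered, 1 if a complete command was
 * copied into out (NUL-terminated), -1 if it did not fit. */
static int cmd_write(struct cmdbuf *cb, const char *src, size_t n,
                     char *out, size_t outsz)
{
    if (n == 0) {
        if (cb->len >= outsz) {
            cb->len = 0;
            return -1;
        }
        memcpy(out, cb->buf, cb->len);
        out[cb->len] = '\0';
        cb->len = 0;
        return 1;
    }
    if (n > sizeof(cb->buf) - cb->len) {
        cb->len = 0;    /* drop the oversized command entirely */
        return -1;
    }
    memcpy(cb->buf + cb->len, src, n);
    cb->len += n;
    return 0;
}
```

That fixes the "reset" / "_all" split, but only for writers that know
to send the terminating empty write.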

Another option is to just ignore the write() side.  That's pretty much
where I got to with those commits referenced above.  Other than echo,
all other sources of commands should be programmed to send their entire
command in one write.  echo now does up to 4096 byte writes.  Oh well.

Barret
