I have found an interesting problem with large reads. I have been trying to ascertain what the correct protocol is for read errors.
What nbd-server currently does is process the read in chunks of BUF_SIZE. If any chunk errors, it sends an error response. This is problematic because the client cannot correctly process an error response if it is sent half-way through a stream of data blocks. It causes the connection to hang. As the error response may be interpreted as data, which might be acted upon, it is theoretically possible that this might cause corruption (though this is unlikely with the current client, as the error response is so much smaller than a block).

Reading the protocol, there is only one possible interpretation of what is meant to happen (as far as I can tell). Either the response is meant to error, in which case no data is sent at all, or the response does not error, in which case all the data is meant to be sent. There is (rightly) no "send half the data and an error" variant. But this is really problematic for the reason set out below.

Let's suppose that a given server can handle large reads efficiently. What I want to do is to start sending the data to the TCP channel before I've read all the data. This is in fact what nbd-server attempts to do right now if the read is bigger than BUF_SIZE. The problem occurs if a read other than the first errors (or, more accurately, if any read errors after we have sent any data). How do we represent that error to the client? We've already returned that the operation has succeeded.

To do proper error handling (which nbd-server doesn't, as far as I can tell), we'd need to save the whole read in memory, which is (a) memory inefficient, and (b) throughput inefficient, as we could not send anything until the entire read had completed.

One answer to this is "don't use large reads, then". However, in certain situations (e.g. servers that can parallelize requests), it's far more efficient to do larger reads. Even now, we wait until a large amount of data has been read before sending any.
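To make the failure mode above concrete, here is a small simulation rather than actual nbd-server code. It assumes the simple reply layout of a 32-bit magic, a 32-bit error field and a 64-bit handle; the BUF_SIZE value, `serve_read` and `flaky_backend` are made up for illustration.

```python
import struct

NBD_REPLY_MAGIC = 0x67446698  # reply magic, as I understand the protocol
BUF_SIZE = 4096               # stand-in value, not the real nbd-server constant
EIO = 5

def reply_header(error, handle):
    # magic (32 bits), error (32 bits), handle (64 bits), all big-endian
    return struct.pack(">IIQ", NBD_REPLY_MAGIC, error, handle)

def serve_read(backend_read, offset, length, handle):
    """Stream a read chunk by chunk, returning the bytes put on the wire."""
    out = bytearray(reply_header(0, handle))  # success is committed up front
    done = 0
    while done < length:
        n = min(BUF_SIZE, length - done)
        try:
            out += backend_read(offset + done, n)
        except OSError:
            # The error reply lands in the middle of the data stream.
            out += reply_header(EIO, handle)
            break
        done += n
    return bytes(out)

def flaky_backend(offset, n):
    # Hypothetical backend: the third chunk fails.
    if offset >= 2 * BUF_SIZE:
        raise OSError(EIO, "simulated media error")
    return b"\xaa" * n

length = 4 * BUF_SIZE
wire = serve_read(flaky_backend, 0, length, handle=1)
payload = wire[16:]               # everything after the "success" header
missing = length - len(payload)   # bytes the client will wait for forever
print(len(payload), missing)
```

The client has already seen error == 0, so it treats the 16-byte error reply as part of the data and then blocks waiting for the `missing` bytes that never arrive.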
Given that errors are really unlikely in the great scheme of things, a relatively low overhead solution to this would be to send the read data followed by the error code (again). We could signal this by returning "EDONTKNOW" or something in the original error field. If the trailing error code was non-zero, the client would discard all the data and use it as the error code. This would waste 4 octets on every read reply where EDONTKNOW was used, which would solely be large read requests. Obviously, as the real error code would only be sent at the end of a large read, an early error would mean sending a large amount of junk over TCP, but this is hardly a problem.

Whilst in theory we'd need to signal EDONTKNOW support, actually large reads ( > BUF_SIZE ) are pretty dodgy in that any error will cause a disconnect. Paul suggests we never get them anyway due to kernel request size limitations, though Wouter seems skeptical. So I am tempted just to put EDONTKNOW support into nbd-server and the kernel without any signalling. There cannot be many people using large reads reliably, as prior to the last release they were full of all sorts of, um, interesting features.

-- Alex Bligh
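A rough sketch of how the EDONTKNOW scheme described above could look on the wire. This is only a simulation under assumptions: the EDONTKNOW value, `serve_read_deferred`, `client_read` and both backends are hypothetical, and the backend is assumed to return exactly the number of bytes asked for.

```python
import io
import struct

NBD_REPLY_MAGIC = 0x67446698
EDONTKNOW = 0xFFFFFFFF  # hypothetical sentinel; no such value is defined today
BUF_SIZE = 4096         # stand-in value, not the real nbd-server constant
EIO = 5

def serve_read_deferred(backend_read, offset, length, handle):
    """Send the header with EDONTKNOW, then the data, then the real error code."""
    out = bytearray(struct.pack(">IIQ", NBD_REPLY_MAGIC, EDONTKNOW, handle))
    final_error = 0
    done = 0
    while done < length:
        n = min(BUF_SIZE, length - done)
        chunk = bytes(n)  # junk padding, used once something has failed
        if final_error == 0:
            try:
                chunk = backend_read(offset + done, n)
            except OSError as exc:
                final_error = exc.errno or EIO
        out += chunk
        done += n
    out += struct.pack(">I", final_error)  # the 4 extra octets
    return bytes(out)

def client_read(stream, length):
    """Return (error, data); data is None whenever error is non-zero."""
    magic, error, _handle = struct.unpack(">IIQ", stream.read(16))
    if magic != NBD_REPLY_MAGIC:
        raise ValueError("bad reply magic")
    if error == 0:
        return 0, stream.read(length)
    if error != EDONTKNOW:
        return error, None  # ordinary up-front error, no data follows
    data = stream.read(length)
    (final_error,) = struct.unpack(">I", stream.read(4))
    return (final_error, None) if final_error else (0, data)

def good_backend(offset, n):
    return b"\xaa" * n

def flaky_backend(offset, n):
    # Hypothetical backend: the third chunk fails.
    if offset >= 2 * BUF_SIZE:
        raise OSError(EIO, "simulated media error")
    return b"\xaa" * n

length = 4 * BUF_SIZE
ok = client_read(io.BytesIO(serve_read_deferred(good_backend, 0, length, 1)), length)
bad_stream = io.BytesIO(serve_read_deferred(flaky_backend, 0, length, 1))
bad = client_read(bad_stream, length)
print(ok[0], bad[0])
```

In both cases the client consumes exactly the header, `length` bytes and the trailer, so the connection stays in sync whether or not the read failed part-way through.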
