On Thu, 2010-02-18, Neels J Hofmeyr wrote:
> Great, moving forward fast on pristine design questions!

Hi Neels.

Did you start working the new knowledge into a document? Lots of stuff was said in this thread and it would be useful to see where we are at.

I have a couple of comments.


THE PRISTINE-WRITE API

I was thinking about the "write" API and how an API designed around a stream is surely better than one designed around "tell me the path where I can put a file".

The objection to "give me a writable stream and I'll write to it" was that the stream close handler wouldn't know whether the stream was closed after a successful write or because of an error. We can rectify this by adding a second step: require the caller to call a special "commit" API to close the stream on success, and treat any other way of closing the stream as abandoning the new content.

  // Caller passes the checksum if it knows it, or passes NULL and
  // the store in that case will calculate the checksum.
  stream = pristine_start_new_file(expected_checksum);

  [caller writes to stream]

  // Now the store commits the new text, verifies the checksum
  // (if it was given an expected one) and returns it.
  new_checksum = pristine_close_new_file(stream);

Now let's examine the ways in which a caller might want to give new content to the store:

  1. Caller asks for a writable stream and pushes content to that, then calls a "commit" function.

  2. Caller has a readable stream and wants the store to pull from that.

  3. Caller has a (non-temporary) file and wants the store to read from that file.

  4. Caller has to create a temporary file for reasons beyond its control (output of an external tool perhaps) and wants the store to take the entire file by an atomic move if possible. This is the case where it would be more efficient if the caller knew where to put the file in the first place.

The caller can easily implement 2 and 3 in terms of an API that provides 1, so that just leaves 1 and 4 that are worthwhile as an API. I feel that (1) is by far the more important one to have, and (4) is a specialist optimisation.
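
To make the two-step idea concrete, here is a rough sketch in Python rather than real Subversion code. Everything in it is an assumption made for illustration: the names PristineWriter, install_from_stream and install_from_file are hypothetical, as are the choice of SHA-1, the on-disk layout (one file per checksum in a single directory), and the use of a temporary file plus rename. It only shows that "commit closes on success, any other close abandons" is easy to express, and that cases 2 and 3 fall out of case 1.

```python
import hashlib
import os
import tempfile


class PristineWriter:
    """Writable stream whose content is kept only if commit() is called."""

    def __init__(self, store_dir, expected_checksum=None):
        self._store_dir = store_dir
        self._expected = expected_checksum  # None: the store calculates it
        self._hash = hashlib.sha1()         # assumed checksum kind
        # Write to a temp file inside the store so the final rename
        # stays on one filesystem.
        fd, self._tmp_path = tempfile.mkstemp(dir=store_dir, suffix=".tmp")
        self._file = os.fdopen(fd, "wb")
        self._done = False

    def write(self, data):
        self._hash.update(data)
        self._file.write(data)

    def commit(self):
        """Close on success: verify, move into place, return the checksum."""
        self._file.close()
        self._done = True
        checksum = self._hash.hexdigest()
        if self._expected is not None and checksum != self._expected:
            os.unlink(self._tmp_path)
            raise ValueError("checksum mismatch")
        os.replace(self._tmp_path, os.path.join(self._store_dir, checksum))
        return checksum

    def close(self):
        """Any close other than commit() abandons the new content."""
        if not self._done:
            self._file.close()
            os.unlink(self._tmp_path)
            self._done = True

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()  # no-op after a commit()


def install_from_stream(store_dir, readable, expected_checksum=None):
    """Case 2, built on case 1: the store pulls from a readable stream."""
    with PristineWriter(store_dir, expected_checksum) as writer:
        for chunk in iter(lambda: readable.read(65536), b""):
            writer.write(chunk)
        return writer.commit()


def install_from_file(store_dir, path, expected_checksum=None):
    """Case 3, built on case 2: the store reads from an existing file."""
    with open(path, "rb") as source:
        return install_from_stream(store_dir, source, expected_checksum)
```

If reading fails part-way through install_from_stream, the exception leaves the "with" block without a commit, so the partial content is discarded. Case 4 (taking over a caller's temporary file by an atomic move) is deliberately not covered by this sketch.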


VERIFYING CHECKSUMS

I didn't read everything you were discussing but I got worried by hearing about providing options for the caller to request checksums to be verified or not per call. That sounds like too much complexity. I'm sure we should start with a global compile-time verification enable switch, and if we really find we need more fine-grained control then we should consider how to provide it then. It might not need an API flag: for example we might decide it should automatically verify on the first read and once in every hundred reads, or all sorts of internal possibilities like that.

> The one thing left now is:
> > Can someone explain a motivation for even creating a database row before
> > the pristine file is moved into place in the pristine store? I currently
> > don't see why it can't be way simpler.
[...]

I would just write it down the way you think it should be in the main flow of your document, and mention outstanding questions like this in notes. "Simultaneous or multi-threaded clients" would be my first reaction to that particular question.

- Julian