Specifically your "speedup" patch, sorry for thread jumping.
On 4 December 2013 11:41, Robert Newson <[email protected]> wrote:
> I will try to review this this week if I can. I've worked in the
> attachment code area before. The idea behind the speedup makes sense
> to me but, given the potential for data loss / corruption, it'll need
> careful review before it can go in. One thing to note is that the
> replicator will verify the md5 of attachments as they're transferred,
> if that helps with your testing.
>
> B.
>
>
> On 4 December 2013 10:06, Nick North <[email protected]> wrote:
>> As mentioned the other day, I'm hoping to add CouchDB support for chunked
>> HTTP requests that contain a document and attachments as a single
>> multipart/related MIME request, and I'm hoping the group can advise me on
>> the best coding direction. Apologies in advance for the length and detail
>> of the email, but there doesn't seem to be a shorter way to ask the
>> question with a sensible amount of background.
>>
>> Parsing multipart requests happens in couch_httpd:parse_multipart_request/3.
>> This function scans the request for the MIME boundary string, reading 4KB
>> blocks of data as needed. The pieces of data between boundary strings are
>> passed to callback functions for further processing. The function that reads
>> the next block of data is an argument to parse_multipart_request called
>> DataFun; it returns the data block plus the function to be used as the next
>> DataFun. I think of this as a pull-based approach: data is pulled from the
>> request as needed, with each pull returning some data and a new pull function.
>>
>> The natural extension to handle chunked requests would be to provide an
>> improved DataFun that can grab the next 4KB block from either a chunked or
>> an unchunked request. So I looked for existing support for chunked requests
>> that could be reused. The chunked equivalent of the couch_httpd:recv/2
>> function that's used to pull 4KB blocks is the couch_httpd:recv_chunked/4
>> function.
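For reference, the pull-based contract described above could be sketched roughly like this. This is a minimal, hypothetical illustration of the idea (a DataFun returns a block plus the next DataFun), not the actual couch_httpd code; the module and function names are made up, and a plain list of binaries stands in for socket reads:

```erlang
%% Sketch of a pull-based data source: each call to the DataFun returns
%% {Block, NextDataFun}, or eof when the input is exhausted. Names are
%% illustrative only, not taken from couch_httpd.
-module(pull_sketch).
-export([make_data_fun/1, drain/1]).

%% Build a DataFun over a list of blocks (standing in for 4KB socket reads).
make_data_fun([]) ->
    fun() -> eof end;
make_data_fun([Block | Rest]) ->
    fun() -> {Block, make_data_fun(Rest)} end.

%% A consumer pulls blocks on demand by calling the current DataFun,
%% then carries on with the new DataFun it hands back.
drain(DataFun) ->
    case DataFun() of
        eof -> [];
        {Block, Next} -> [Block | drain(Next)]
    end.
```

The point of the shape is that the consumer (here, the multipart parser) stays in control: it asks for data exactly when it needs it, and the returned continuation carries the state of the stream.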
>> This calls the Mochiweb stream_body/3 function which, it transpires, was
>> created for use in CouchDB. However, it differs in philosophy from the recv
>> function: while recv just hands back a block of data, stream_body reads the
>> whole of the request and calls a ChunkFun parameter on each block of data
>> that it reads. I think of this as a push-based approach: the entire stream
>> is read and pushed into a callback function, one block at a time.
>>
>> I can think of three ways to fix the mismatch between the pull and
>> push-based approaches and provide chunked multipart support:
>>
>> 1. Rework parse_multipart_request to be push-based. This would allow reuse
>> of stream_body, but at the cost of turning existing code inside out to fit
>> with its push approach.
>> 2. Create a pull-based version of stream_body and probably try to get it
>> incorporated into Mochiweb. But having two similar versions of the same
>> code like this doesn't feel right.
>> 3. Convert stream_body from push-based to pull-based by spawning it in a
>> new process that sends each block of data back to the
>> parse_multipart_request DataFun and then blocks until the message is
>> acknowledged. The DataFun receives the data when it needs to fetch the
>> next block, and then sends an acknowledgement.
>>
>> The third option feels neatest and is my preferred route. But my ignorance
>> of Erlang means that I don't know whether it is potentially expensive.
>> While a new process is very cheap, it would mean that all the request data
>> is copied from that process to parse_multipart_request, and I don't know
>> whether that is very costly. That sort of copying already goes on in
>> couch_doc:doc_from_multi_part_stream, where the parser is spawned off and
>> copies each document and attachment back to the parent process, but I don't
>> know whether that means the copying is cheap, or whether it's an unavoidable
>> evil that shouldn't be reproduced elsewhere.
>>
>> I'd really appreciate any advice that the group can give me on the best
>> option to follow, and why, or suggestions for options that I've missed
>> altogether. Thanks in advance for your help,
>>
>> Nick
