As mentioned the other day, I'm hoping to add CouchDB support for chunked HTTP requests that contain a document and attachments as a single multipart/related MIME request, and I'm hoping the group can advise me on the best coding direction. Apologies in advance for the length and detail of the email, but there doesn't seem to be a shorter way to ask the question with a sensible amount of background.
Parsing multipart requests happens in couch_httpd:parse_multipart_request/3. This function scans the request for the MIME boundary string, reading 4KB blocks of data as needed, and passes the pieces of data between boundary strings to callback functions for further processing. The function that reads the next block of data is an argument to parse_multipart_request called DataFun; it returns the data block plus the function to be used as the next DataFun. I think of this as a pull-based approach: data is pulled from the request as needed, with each pull returning some data and a new pull function.

The natural extension to handle chunked requests would be an improved DataFun that can grab the next 4KB block from either a chunked or an unchunked request, so I looked for existing support for chunked requests that could be reused. The chunked equivalent of the couch_httpd:recv/2 function that's used to pull 4KB blocks is couch_httpd:recv_chunked/4. This calls the Mochiweb stream_body/3 function which, it transpires, was created for use in CouchDB. However, stream_body differs in philosophy from recv: while recv just hands back a block of data, stream_body reads the whole of the request and calls a ChunkFun parameter on each block of data that it reads. I think of this as a push-based approach: the entire stream is read and pushed into a callback function, one block at a time.

I can think of three ways to fix the mismatch between the pull- and push-based approaches and provide chunked multipart support:

1. Rework parse_multipart_request to be push-based. This would allow reuse of stream_body, but at the cost of turning existing code inside out to fit its push approach.

2. Create a pull-based version of stream_body and probably try to get it incorporated into Mochiweb. But having two similar versions of the same code like this doesn't feel right.

3. Convert stream_body from push-based to pull-based by spawning it in a new process that sends each block of data back to the parse_multipart_request DataFun and then blocks until the message is acknowledged. The DataFun receives the data when it needs to fetch the next block, and then sends an acknowledgement.

The third option feels neatest and is my preferred route, but my ignorance of Erlang means I don't know whether it is potentially expensive. While a new process is very cheap, this approach means that all the request data is copied from that process to parse_multipart_request, and I don't know how costly that is. That sort of copying already goes on in couch_doc:doc_from_multi_part_stream, where the parser is spawned off and copies each document and attachment back to the parent process, but I don't know if that means the copying is cheap, or if it's an unavoidable evil that shouldn't be reproduced elsewhere.

I'd really appreciate any advice the group can give me on the best option to follow, and why, or suggestions for options that I've missed altogether.

Thanks in advance for your help,

Nick
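P.S. In case it helps to make option 3 concrete, here's a rough sketch of the push-to-pull adapter I have in mind. The module and function names here are placeholders of my own invention (not existing CouchDB or Mochiweb API), and the toy producer in demo/0 just stands in for stream_body/3:

```erlang
%% Sketch of option 3: wrap a push-based producer in a spawned process so
%% that a consumer can pull blocks on demand. All names are placeholders.
-module(push_to_pull).
-export([data_fun/1, demo/0]).

%% Spawn the push-based producer. PushFun takes a callback that is invoked
%% once per block; our callback sends each block to the consumer and then
%% blocks until the consumer acknowledges it, so the producer never runs
%% ahead of the parser.
data_fun(PushFun) ->
    Parent = self(),
    Pid = spawn_link(fun() ->
        PushFun(fun(Block) ->
            Parent ! {block, self(), Block},
            receive {ack, Parent} -> ok end
        end),
        Parent ! {done, self()}
    end),
    %% Assumes the returned fun is called from the same process that
    %% called data_fun/1.
    fun() -> pull(Pid) end.

%% The pull side: receive one block, acknowledge it, and return the block
%% together with the next pull function -- the same shape as the DataFun
%% that parse_multipart_request expects.
pull(Pid) ->
    receive
        {block, Pid, Block} ->
            Pid ! {ack, self()},
            {Block, fun() -> pull(Pid) end};
        {done, Pid} ->
            {eof, done}
    end.

demo() ->
    %% A toy push-based producer standing in for stream_body/3.
    Producer = fun(ChunkFun) ->
        lists:foreach(ChunkFun, [<<"abc">>, <<"def">>])
    end,
    Pull0 = data_fun(Producer),
    {<<"abc">>, Pull1} = Pull0(),
    {<<"def">>, Pull2} = Pull1(),
    {eof, done} = Pull2(),
    ok.
```

The property I'm after is that the spawned producer blocks after every block until the consumer acknowledges it, so only one block is ever in flight between the two processes at a time.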
