On 03/07/2013 23:42, Joseph Schaefer wrote:
> Dechunked means it strips out the lines containing metadata about the next
> block of raw data. The metadata is just the length of the next block of
> data. Imagine a chunked stream is like having partial content-length
> headers embedded in the data stream.
>
> The HTTP filter embedded in httpd takes care of the metadata, so you don't
> have to parse the stream yourself. $r->read will always provide only the
> raw data in a blocking call, until the stream is finished, in which case
> it should return 0 or an error code. Check the mod_perl docs, or better
> the source, to see if the semantics are more like Perl's sysread or more
> like read.
>
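To make those metadata lines concrete, here is a toy de-chunker run against
a complete chunked body held in memory. This is only a sketch of the wire
format: httpd's HTTP_IN filter does the equivalent work on the live stream,
which is why code calling $r->read never sees the size lines. The sample
wire data is made up, and trailer headers and error handling are elided.

    use strict;
    use warnings;

    # Toy wire data: two chunks ("Wiki" + "pedia") and the zero-size
    # terminator. Each chunk size is hex, followed by CRLF.
    my $wire = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";

    my $payload = '';
    while ( $wire =~ s/^([0-9a-fA-F]+)(?:;[^\r\n]*)?\r\n// ) {
        my $size = hex $1;
        last if $size == 0;    # a zero-size chunk ends the body
        $payload .= substr( $wire, 0, $size, '' );
        $wire =~ s/^\r\n//;    # eat the CRLF that terminates each chunk
    }

    print "$payload\n";        # prints "Wikipedia"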
Yep. That makes sense to me too - it's just not what I read in your previous
email, but maybe I read it wrong :)

> Sent from my iPhone
>
> On Jul 3, 2013, at 4:31 PM, Jim Schueler <jschue...@eloquency.com> wrote:
>
>> In light of Joe Schaefer's response, I appear to be outgunned. So, if
>> nothing else, can someone please clarify whether "de-chunked" means
>> re-assembled?
>>
>> -Jim
>>
>> On Wed, 3 Jul 2013, Jim Schueler wrote:
>>
>>> Thanks for the prompt response, but this is your question, not mine. I
>>> hardly need an RTFM for my trouble.
>>>
>>> I drew my conclusions using a packet sniffer. And as far-fetched as my
>>> answer may seem, it's more plausible than your theory that Apache or
>>> mod_perl is decoding a raw socket stream.
>>>
>>> The crux of your question seems to be how the request content gets
>>> magically re-assembled. I don't think it was ever disassembled in the
>>> first place. But if you don't like my answer, and you don't want to
>>> ignore it either, then please restate the question. I can't find any
>>> definition for "unchunked", and Wiktionary's definition of "de-chunk"
>>> says to "break apart a chunk", that is (counter-intuitively) to chunk
>>> a chunk.
>>>
>>>> Second, if there's no Content-Length header then how does one know
>>>> how much data to read using $r->read?
>>>>
>>>> One answer is until $r->read returns zero bytes, of course. But is
>>>> that guaranteed to always be the case, even for, say, pipelined
>>>> requests? My guess is yes because whatever is de-chunking the
>>>
>>> read() is blocking. So it never returns 0, even in a pipelined request
>>> (if no data is available, it simply waits). I don't wish to discuss
>>> the merits here, but there is no technical imperative for a
>>> Content-Length in the request header.
>>>
>>> -Jim
>>>
>>> On Wed, 3 Jul 2013, Bill Moseley wrote:
>>>
>>>> Hi Jim,
>>>>
>>>> This is the Transfer-Encoding: chunked I was writing about:
>>>> http://tools.ietf.org/html/rfc2616#section-3.6.1
>>>>
>>>> On Wed, Jul 3, 2013 at 11:34 AM, Jim Schueler
>>>> <jschue...@eloquency.com> wrote:
>>>>
>>>> I played around with chunking recently in the context of media
>>>> streaming: the client is only requesting a "chunk" of data.
>>>> "Chunking" is how media players perform a "seek". It was originally
>>>> implemented for FTP transfers, e.g. to transfer a large file in (say
>>>> 10K) chunks. In the case that you describe below, if no
>>>> Content-Length is specified, that indicates "send the remainder".
>>>>
>>>> From what I know, a "chunk" request header is used this way to
>>>> specify the server response. It does not reflect anything about the
>>>> data included in the body of the request. So first, I would ask if
>>>> you're confused about this request information.
>>>>
>>>> Hypothetically, some browsers might try to upload large files in
>>>> small chunks, and the "chunk" header might reflect a push transfer.
>>>> I don't know if "chunk" is ever used for this purpose. But it would
>>>> require the following characteristics:
>>>>
>>>> 1. The browser would need to originally inquire whether the server
>>>>    is capable of this type of request.
>>>> 2. Each chunk of data will arrive in a separate and independent HTTP
>>>>    request, not necessarily in the order they were sent.
>>>> 3. Two or more requests may be handled by separate processes
>>>>    simultaneously that can't be written into a single destination.
>>>> 4. Somehow the server needs to request a resend if a chunk is
>>>>    missing.
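As a concrete illustration of the read-until-zero behavior being debated in
this thread: a minimal sketch of a mod_perl 2 response handler that drains
the request body. The package name and buffer size are illustrative; per
Joseph's description, $r->read blocks until data arrives and returns 0 once
the (de-chunked) body is exhausted.

    package My::SlurpBody;    # hypothetical handler name

    use strict;
    use warnings;
    use Apache2::RequestRec ();
    use Apache2::RequestIO  ();
    use Apache2::Const -compile => qw(OK);

    sub handler {
        my $r = shift;

        my ( $body, $buf ) = ( '', '' );

        # Blocks until data is available; returns 0 at end of body,
        # whether or not the request arrived chunked.
        while ( $r->read( $buf, 64 * 1024 ) ) {
            $body .= $buf;
        }

        $r->content_type('text/plain');
        $r->print( 'read ' . length($body) . " bytes\n" );

        return Apache2::Const::OK;
    }

    1;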
>>>>
>>>> Solving this problem requires an imaginative use of HTTP.
>>>>
>>>> Sounds messy, but it might be appropriate for 100M+ sized uploads.
>>>> This *may* reflect your situation. Can you please confirm?
>>>>
>>>> For a single process, the incoming Content-Length is unnecessary.
>>>> Buffered I/O automatically knows when transmission is complete. The
>>>> read() argument is the buffer size, not the content length. Whether
>>>> you spool the buffer to disk or simply enlarge the buffer should be
>>>> determined by your hardware capabilities. This is standard I/O
>>>> behavior that has nothing to do with HTTP chunking. Without a
>>>> "Content-Length" header, after looping your read() operation,
>>>> determine the length of the aggregate data and pass that to Catalyst.
>>>>
>>>> But if you're confident that the complete request spans several
>>>> smaller (chunked) HTTP requests, you'll need to address all the
>>>> problems I've described above, plus the problem of re-assembling the
>>>> whole thing for Catalyst. I don't know anything about Plack; maybe it
>>>> can perform all this required magic.
>>>>
>>>> Otherwise, if the whole purpose of the Plack temporary file is to
>>>> pass a file handle, you can pass a buffer as a file handle. That used
>>>> to require IO::String, but now the functionality is built into the
>>>> core.
>>>>
>>>> By your last paragraph, I'm really lost. Since you're already passing
>>>> the request as a file handle, I'm guessing that Catalyst creates the
>>>> temporary file for the *response* body. Can you please clarify?
>>>> Also, what do you mean by "de-chunking"? Is that the same thing as
>>>> re-assembling?
>>>>
>>>> Wish I could give a better answer. Let me know if this helps.
>>>>
>>>> -Jim
>>>>
>>>> On Tue, 2 Jul 2013, Bill Moseley wrote:
>>>>
>>>> For requests that are chunked (Transfer-Encoding: chunked and no
>>>> Content-Length header), calling $r->read returns unchunked data from
>>>> the socket. That's indeed handy. Is it mod_perl doing that
>>>> un-chunking, or is it Apache?
>>>>
>>>> But it leads to some questions.
>>>>
>>>> First, if $r->read reads unchunked data, then why is there a
>>>> Transfer-Encoding header saying that the content is chunked?
>>>> Shouldn't that header be removed? How does one know whether the
>>>> content is chunked or not, otherwise?
>>>>
>>>> Second, if there's no Content-Length header, then how does one know
>>>> how much data to read using $r->read?
>>>>
>>>> One answer is until $r->read returns zero bytes, of course. But is
>>>> that guaranteed to always be the case, even for, say, pipelined
>>>> requests? My guess is yes, because whatever is de-chunking the
>>>> request knows to stop after reading the last chunk, trailer, and
>>>> empty line. Can anyone elaborate on how Apache/mod_perl is doing
>>>> this?
>>>>
>>>> Perhaps I'm approaching this incorrectly, but this is all a bit
>>>> untidy.
>>>>
>>>> I'm using Catalyst, and Catalyst needs a Content-Length. So I have a
>>>> Plack Middleware component that creates a temporary file, writing the
>>>> buffer from $r->read( my $buffer, 64 * 1024 ) until that returns zero
>>>> bytes. I pass this file handle on to Catalyst.
>>>>
>>>> Then, for some content types, Catalyst (via HTTP::Body) writes the
>>>> body to another temp file. I don't know how Apache/mod_perl does its
>>>> de-chunking, but I can call $r->read with a huge buffer length and
>>>> Apache returns that. So maybe Apache is buffering to disk, too.
>>>>
>>>> In other words, for each tiny chunked JSON POST or PUT, I'm creating
>>>> two (or three?) temp files, which doesn't seem ideal.
>>>>
>>>> --
>>>> Bill Moseley
>>>> mose...@hank.org
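For reference, a sketch of the kind of middleware Bill describes: spool
psgi.input to a temporary file so a Content-Length can be set before
Catalyst sees the request. The package name is hypothetical and this is not
Bill's actual code; for small bodies, the temp file could be replaced with
an in-memory scalar filehandle (open my $fh, '<', \$body), per Jim's note
that IO::String's job is now handled by core Perl.

    package Plack::Middleware::BufferBody;    # hypothetical name

    use strict;
    use warnings;
    use parent 'Plack::Middleware';
    use File::Temp ();

    sub call {
        my ( $self, $env ) = @_;

        # Only spool when the length is unknown (e.g. chunked input).
        if ( !defined $env->{CONTENT_LENGTH} && $env->{'psgi.input'} ) {
            my $fh  = File::Temp->new;
            my $len = 0;
            my $buf;

            # psgi.input->read blocks and returns 0 at end of body,
            # mirroring the $r->read loop discussed above.
            while ( my $read = $env->{'psgi.input'}->read( $buf, 64 * 1024 ) ) {
                print {$fh} $buf;
                $len += $read;
            }

            seek $fh, 0, 0;
            $env->{'psgi.input'}   = $fh;
            $env->{CONTENT_LENGTH} = $len;
        }

        return $self->app->($env);
    }

    1;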