On 03/07/2013 23:42, Joseph Schaefer wrote:
> Dechunked means it strips out the lines containing metadata about the next
> block of raw data. The metadata is just the length of the next block of
> data. Imagine a chunked stream is like having partial content-length
> headers embedded in the data stream.
>
> The HTTP filter embedded in httpd takes care of the metadata, so you don't
> have to parse the stream yourself. $r->read will always provide only the
> raw data in a blocking call, until the stream is finished, in which case
> it should return 0 or an error code. Check the mod_perl docs, or better
> the source, to see if the semantics are more like Perl's sysread or more
> like read.
>
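To make those metadata lines concrete, here is a toy de-chunker run against
a complete chunked body held in memory. This is only a sketch of the wire
format: httpd's HTTP_IN filter does the equivalent work on the live stream,
which is why code calling $r->read never sees the size lines. The sample
wire data is made up, and trailer headers and error handling are elided.

    use strict;
    use warnings;

    # Toy wire data: two chunks ("Wiki" + "pedia") and the zero-size
    # terminator. Each chunk size is hex, followed by CRLF.
    my $wire = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";

    my $payload = '';
    while ( $wire =~ s/^([0-9a-fA-F]+)(?:;[^\r\n]*)?\r\n// ) {
        my $size = hex $1;
        last if $size == 0;    # a zero-size chunk ends the body
        $payload .= substr( $wire, 0, $size, '' );
        $wire =~ s/^\r\n//;    # eat the CRLF that terminates each chunk
    }

    print "$payload\n";        # prints "Wikipedia"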
Yep. That makes sense to me too - it's just not what I read in your previous
email, but maybe I read it wrong :)

> Sent from my iPhone
>
> On Jul 3, 2013, at 4:31 PM, Jim Schueler <jschue...@eloquency.com> wrote:
>
>> In light of Joe Schaefer's response, I appear to be outgunned. So, if
>> nothing else, can someone please clarify whether "de-chunked" means
>> re-assembled?
>>
>> -Jim
>>
>> On Wed, 3 Jul 2013, Jim Schueler wrote:
>>
>>> Thanks for the prompt response, but this is your question, not mine. I
>>> hardly need an RTFM for my trouble.
>>>
>>> I drew my conclusions using a packet sniffer. And as far-fetched as my
>>> answer may seem, it's more plausible than your theory that Apache or
>>> mod_perl is decoding a raw socket stream.
>>>
>>> The crux of your question seems to be how the request content gets
>>> magically re-assembled. I don't think it was ever disassembled in the
>>> first place. But if you don't like my answer, and you don't want to
>>> ignore it either, then please restate the question. I can't find any
>>> definition for "unchunked", and Wiktionary's definition of "de-chunk"
>>> says to "break apart a chunk", that is (counter-intuitively) to chunk
>>> a chunk.
>>>
>>>> Second, if there's no Content-Length header then how does one know
>>>> how much data to read using $r->read?
>>>>
>>>> One answer is until $r->read returns zero bytes, of course. But is
>>>> that guaranteed to always be the case, even for, say, pipelined
>>>> requests? My guess is yes because whatever is de-chunking the
>>>
>>> read() is blocking. So it never returns 0, even in a pipelined request
>>> (if no data is available, it simply waits). I don't wish to discuss
>>> the merits here, but there is no technical imperative for a
>>> Content-Length in the request header.
>>>
>>> -Jim
>>>
>>> On Wed, 3 Jul 2013, Bill Moseley wrote:
>>>
>>>> Hi Jim,
>>>>
>>>> This is the Transfer-Encoding: chunked I was writing about:
>>>> http://tools.ietf.org/html/rfc2616#section-3.6.1
>>>>
>>>> On Wed, Jul 3, 2013 at 11:34 AM, Jim Schueler
>>>> <jschue...@eloquency.com> wrote:
>>>>
>>>> I played around with chunking recently in the context of media
>>>> streaming: the client is only requesting a "chunk" of data.
>>>> "Chunking" is how media players perform a "seek". It was originally
>>>> implemented for FTP transfers, e.g. to transfer a large file in (say
>>>> 10K) chunks. In the case that you describe below, if no
>>>> Content-Length is specified, that indicates "send the remainder".
>>>>
>>>> From what I know, a "chunk" request header is used this way to
>>>> specify the server response. It does not reflect anything about the
>>>> data included in the body of the request. So first, I would ask if
>>>> you're confused about this request information.
>>>>
>>>> Hypothetically, some browsers might try to upload large files in
>>>> small chunks, and the "chunk" header might reflect a push transfer.
>>>> I don't know if "chunk" is ever used for this purpose. But it would
>>>> require the following characteristics:
>>>>
>>>> 1. The browser would need to originally inquire whether the server
>>>>    is capable of this type of request.
>>>> 2. Each chunk of data will arrive in a separate and independent HTTP
>>>>    request, not necessarily in the order they were sent.
>>>> 3. Two or more requests may be handled by separate processes
>>>>    simultaneously that can't be written into a single destination.
>>>> 4. Somehow the server needs to request a resend if a chunk is
>>>>    missing.
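As a concrete illustration of the read-until-zero behavior being debated in
this thread: a minimal sketch of a mod_perl 2 response handler that drains
the request body. The package name and buffer size are illustrative; per
Joseph's description, $r->read blocks until data arrives and returns 0 once
the (de-chunked) body is exhausted.

    package My::SlurpBody;    # hypothetical handler name

    use strict;
    use warnings;
    use Apache2::RequestRec ();
    use Apache2::RequestIO  ();
    use Apache2::Const -compile => qw(OK);

    sub handler {
        my $r = shift;

        my ( $body, $buf ) = ( '', '' );

        # Blocks until data is available; returns 0 at end of body,
        # whether or not the request arrived chunked.
        while ( $r->read( $buf, 64 * 1024 ) ) {
            $body .= $buf;
        }

        $r->content_type('text/plain');
        $r->print( 'read ' . length($body) . " bytes\n" );

        return Apache2::Const::OK;
    }

    1;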
>>>>
>>>> Solving this problem requires an imaginative use of HTTP.
>>>>
>>>> Sounds messy, but it might be appropriate for 100M+ sized uploads.
>>>> This *may* reflect your situation. Can you please confirm?
>>>>
>>>> For a single process, the incoming Content-Length is unnecessary.
>>>> Buffered I/O automatically knows when transmission is complete. The
>>>> read() argument is the buffer size, not the content length. Whether
>>>> you spool the buffer to disk or simply enlarge the buffer should be
>>>> determined by your hardware capabilities. This is standard I/O
>>>> behavior that has nothing to do with HTTP chunking. Without a
>>>> "Content-Length" header, after looping your read() operation,
>>>> determine the length of the aggregate data and pass that to Catalyst.
>>>>
>>>> But if you're confident that the complete request spans several
>>>> smaller (chunked) HTTP requests, you'll need to address all the
>>>> problems I've described above, plus the problem of re-assembling the
>>>> whole thing for Catalyst. I don't know anything about Plack; maybe it
>>>> can perform all this required magic.
>>>>
>>>> Otherwise, if the whole purpose of the Plack temporary file is to
>>>> pass a file handle, you can pass a buffer as a file handle. That used
>>>> to require IO::String, but now the functionality is built into the
>>>> core.
>>>>
>>>> By your last paragraph, I'm really lost. Since you're already passing
>>>> the request as a file handle, I'm guessing that Catalyst creates the
>>>> temporary file for the *response* body. Can you please clarify?
>>>> Also, what do you mean by "de-chunking"? Is that the same thing as
>>>> re-assembling?
>>>>
>>>> Wish I could give a better answer. Let me know if this helps.
>>>>
>>>> -Jim
>>>>
>>>> On Tue, 2 Jul 2013, Bill Moseley wrote:
>>>>
>>>> For requests that are chunked (Transfer-Encoding: chunked and no
>>>> Content-Length header), calling $r->read returns unchunked data from
>>>> the socket. That's indeed handy. Is it mod_perl doing that
>>>> un-chunking, or is it Apache?
>>>>
>>>> But it leads to some questions.
>>>>
>>>> First, if $r->read reads unchunked data, then why is there a
>>>> Transfer-Encoding header saying that the content is chunked?
>>>> Shouldn't that header be removed? How does one know whether the
>>>> content is chunked or not, otherwise?
>>>>
>>>> Second, if there's no Content-Length header, then how does one know
>>>> how much data to read using $r->read?
>>>>
>>>> One answer is until $r->read returns zero bytes, of course. But is
>>>> that guaranteed to always be the case, even for, say, pipelined
>>>> requests? My guess is yes, because whatever is de-chunking the
>>>> request knows to stop after reading the last chunk, trailer, and
>>>> empty line. Can anyone elaborate on how Apache/mod_perl is doing
>>>> this?
>>>>
>>>> Perhaps I'm approaching this incorrectly, but this is all a bit
>>>> untidy.
>>>>
>>>> I'm using Catalyst, and Catalyst needs a Content-Length. So I have a
>>>> Plack Middleware component that creates a temporary file, writing the
>>>> buffer from $r->read( my $buffer, 64 * 1024 ) until that returns zero
>>>> bytes. I pass this file handle on to Catalyst.
>>>>
>>>> Then, for some content types, Catalyst (via HTTP::Body) writes the
>>>> body to another temp file. I don't know how Apache/mod_perl does its
>>>> de-chunking, but I can call $r->read with a huge buffer length and
>>>> Apache returns that. So maybe Apache is buffering to disk, too.
>>>>
>>>> In other words, for each tiny chunked JSON POST or PUT, I'm creating
>>>> two (or three?) temp files, which doesn't seem ideal.
>>>>
>>>> --
>>>> Bill Moseley
>>>> mose...@hank.org
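For reference, a sketch of the kind of middleware Bill describes: spool
psgi.input to a temporary file so a Content-Length can be set before
Catalyst sees the request. The package name is hypothetical and this is not
Bill's actual code; for small bodies, the temp file could be replaced with
an in-memory scalar filehandle (open my $fh, '<', \$body), per Jim's note
that IO::String's job is now handled by core Perl.

    package Plack::Middleware::BufferBody;    # hypothetical name

    use strict;
    use warnings;
    use parent 'Plack::Middleware';
    use File::Temp ();

    sub call {
        my ( $self, $env ) = @_;

        # Only spool when the length is unknown (e.g. chunked input).
        if ( !defined $env->{CONTENT_LENGTH} && $env->{'psgi.input'} ) {
            my $fh  = File::Temp->new;
            my $len = 0;
            my $buf;

            # psgi.input->read blocks and returns 0 at end of body,
            # mirroring the $r->read loop discussed above.
            while ( my $read = $env->{'psgi.input'}->read( $buf, 64 * 1024 ) ) {
                print {$fh} $buf;
                $len += $read;
            }

            seek $fh, 0, 0;
            $env->{'psgi.input'}   = $fh;
            $env->{CONTENT_LENGTH} = $len;
        }

        return $self->app->($env);
    }

    1;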