Re: [Django] #12578: multipartparser.Parser does not accept non-canonical bare CR and bare LF

Django Thu, 28 Jan 2010 14:00:07 -0800

#12578: multipartparser.Parser does not accept non-canonical bare CR and bare LF
------------------------------------+---------------------------------------
          Reporter:  jfenwick       |         Owner:  nobody
            Status:  closed         |     Milestone:        
         Component:  HTTP handling  |       Version:  1.1   
        Resolution:  invalid        |      Keywords:  jython
             Stage:  Unreviewed     |     Has_patch:  0     
        Needs_docs:  0              |   Needs_tests:  0     
Needs_better_patch:  0              |  
------------------------------------+---------------------------------------
Comment (by jfenwick):


 I'm not sure what the real problem is, so rather than open a new ticket I
 will ask some questions here in the hope that someone can answer them.

 The issue is that on Windows, in Django-Jython on Tomcat, multipart
 data is not parsed correctly by the LazyStream in multipartparser.py.

 As far as I can tell, the parse fails at
 http://code.djangoproject.com/browser/django/trunk/django/http/multipartparser.py#L553
 because the character sequence CRLFCRLF is not found.
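
 The search described above can be sketched as follows. This is a minimal
 illustration, not the code from multipartparser.py: the helper name
 split_part_headers and the sample payloads are hypothetical, and it
 assumes Python 3 byte strings.

```python
HEADER_TERMINATOR = b"\r\n\r\n"  # CRLFCRLF: the blank line that ends a part's headers


def split_part_headers(chunk):
    """Return (raw_headers, remainder), or None when CRLFCRLF is absent.

    Hypothetical helper for illustration only.
    """
    header_end = chunk.find(HEADER_TERMINATOR)
    if header_end == -1:
        # No header terminator in this chunk, so the part cannot be split.
        return None
    return chunk[:header_end], chunk[header_end + len(HEADER_TERMINATOR):]


# A canonical part, and the same part with CRLF normalized to bare LF.
canonical = b'Content-Disposition: form-data; name="title"\r\n\r\nhello\r\n'
bare_lf = canonical.replace(b"\r\n", b"\n")

print(split_part_headers(canonical))
print(split_part_headers(bare_lf))
```

 Any line-ending normalization applied before the search changes the
 bytes being searched, so a terminator that was present on the wire may
 no longer be found.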

 I did some experiments. I POSTed some data using the same app on four
 different Django platforms.
 Here are the platforms I tested on, and the associated data that was
 output as hex:

 Django on Python on runserver on Windows - multipartpythonwindows.hex
 Django on Jython on runserver on Windows -
 multipartdjangojythonwindows.hex
 Django on Jython on Tomcat on Windows - multiparttomcatjythonwindows.hex
 Django on Python on runserver on OS X - multipartpythonosx.hex (note: I
 used a different Django app, but I believe the result would have been the
 same)

 The data was dumped using the code in multipartparser.diff

 Note: I ran the files I generated through hexdump -C file to generate the
 hex files from the data files I created.

 According to
 http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.2 the message
 body of the multipart should use CRLF as a line break between body-parts.
 If you look at multipartpythonwindows.hex you will see that what actually
 happens is all CRs are replaced with CRCR. This means that when the cited
 line in multipartparser.py is looking for CRLFCRLF, it instead finds
 CRCRLFCRCRLF. I would have thought this would fail, but it does not! It
 works correctly. This is the normal operating procedure of Django on
 Python, as far as I can tell.
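
 One possible explanation, offered only as an assumption since it is not
 confirmed anywhere in this ticket, is that the doubled CR is introduced
 when the dump file is written in text mode: Windows text mode turns
 every LF into CRLF, so an existing CRLF becomes CRCRLF on disk while the
 data the parser sees is unchanged. The sketch below reproduces that
 translation on any platform by passing newline="\r\n" to open(). The
 payload and the dump helper are hypothetical.

```python
import os
import tempfile

payload = b"--boundary\r\nContent-Type: text/plain\r\n\r\nbody\r\n"


def dump(data, binary):
    """Write data to a temporary file and return the bytes that land on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        if binary:
            with open(path, "wb") as f:
                f.write(data)
        else:
            # newline="\r\n" mimics Windows text-mode output:
            # every "\n" written is translated to "\r\n".
            with open(path, "w", encoding="latin-1", newline="\r\n") as f:
                f.write(data.decode("latin-1"))
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


text_dump = dump(payload, binary=False)
binary_dump = dump(payload, binary=True)

print(payload.find(b"\r\n\r\n") != -1)    # in-memory data contains CRLFCRLF
print(text_dump.find(b"\r\n\r\n") != -1)  # text-mode dump does not
print(binary_dump == payload)             # binary dump is byte-for-byte identical
```

 If this is what happened, the hex files from the two runserver setups
 would show CRCRLF even though the parser received plain CRLF, which
 would be consistent with the parse succeeding.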

 In multipartdjangojythonwindows.hex you will see that the pattern of CRs
 being replaced by CRCR still occurs. This means that Django on Jython
 running on runserver works the same as Django on Python running on
 runserver, and as a result, still works.

 Now we go to multiparttomcatjythonwindows.hex. In this file, the CRLF
 comes as you would expect it in RFC 2616. In this case, chunk.find fails,
 which is the root of the problem.

 Finally, compare the CR characters of multipartpythonosx.hex. You will see
 that, like multiparttomcatjythonwindows.hex, it does not duplicate CR.
 And yet it works.

 These are my questions:

 1. Where is that extra CR coming from and why is it required? Why does
 this not result in failure?
 2. Could there be something wrong with my method of data collection as
 specified in multipartparser.diff?
 3. kmtracy previously said "All the code touching this data before feeding
 it to the Django code needs to be treating it as binary, not text, and not
 doing any type of line normalization." Is there a way I can check whether
 the data is binary or text in Python to verify this is the case?
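
 For question 3, a rough check might look like the sketch below. It
 assumes Python 3, where bytes and str are distinct types; on Python 2
 str is already a byte string, so the type check says little and the
 line-ending counts are the more useful part. The function name
 line_ending_report is hypothetical.

```python
import re


def line_ending_report(data):
    """Count line-ending patterns in a byte string (illustrative helper)."""
    if not isinstance(data, bytes):
        raise TypeError("expected bytes, got %s" % type(data).__name__)
    return {
        "crlf": len(re.findall(rb"\r\n", data)),         # canonical CRLF
        "crcrlf": len(re.findall(rb"\r\r\n", data)),     # doubled CR before LF
        "bare_cr": len(re.findall(rb"\r(?!\n)", data)),  # CR not followed by LF
        "bare_lf": len(re.findall(rb"(?<!\r)\n", data)), # LF not preceded by CR
    }


print(line_ending_report(b"a\r\nb\r\n"))
print(line_ending_report(b"a\r\r\nb"))
print(line_ending_report(b"a\nb"))
```

 Running this on the raw request body before it reaches the parser, and
 again on the dumped file, would show whether any layer has rewritten the
 line endings.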

 I'm sorry if this is not the correct avenue to be asking these questions.
 If there is a better one, please point me in that direction.

-- 
Ticket URL: <http://code.djangoproject.com/ticket/12578#comment:8>
Django <http://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To post to this group, send email to django-upda...@googlegroups.com.
To unsubscribe from this group, send email to 
django-updates+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-updates?hl=en.
