#12578: multipartparser.Parser does not accept non-canonical bare CR and bare LF
------------------------------------+---------------------------------------
          Reporter:  jfenwick        |         Owner:  nobody
            Status:  closed          |     Milestone:
         Component:  HTTP handling   |       Version:  1.1
        Resolution:  invalid         |      Keywords:  jython
             Stage:  Unreviewed      |     Has_patch:  0
        Needs_docs:  0               |   Needs_tests:  0
Needs_better_patch:  0               |
------------------------------------+---------------------------------------

Comment (by jfenwick):
I'm not sure what the real problem is, so rather than open a new ticket I will ask some questions here in the hope that someone can answer them.

The issue is that on Windows, with Django on Jython under Tomcat, multipart data is not parsed correctly by the LazyStream in multipartparser.py. As far as I can tell, this happens at http://code.djangoproject.com/browser/django/trunk/django/http/multipartparser.py#L553 because the sequence CRLFCRLF is not found.

I did some experiments. I POSTed some data using the same app on four different Django platforms. Here are the platforms I tested on, each with its associated hex dump:

- Django on Python on runserver on Windows: multipartpythonwindows.hex
- Django on Jython on runserver on Windows: multipartdjangojythonwindows.hex
- Django on Jython on Tomcat on Windows: multiparttomcatjythonwindows.hex
- Django on Python on runserver on OS X: multipartpythonosx.hex (note: I used a different Django app here, but I believe the result would have been the same)

The data was dumped using the code in multipartparser.diff, and I then ran each data file through hexdump -C to produce the corresponding .hex file.

According to section 3.7.2 of RFC 2616 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.2), the body of a multipart message should use CRLF as the line break between body parts. If you look at multipartpythonwindows.hex, however, you will see that every CR has been replaced with CRCR. This means that when the cited line in multipartparser.py looks for CRLFCRLF, it instead finds CRCRLFCRCRLF. I would have expected this to fail, but it does not: the data is parsed correctly. As far as I can tell, this is the normal behavior of Django on Python.

In multipartdjangojythonwindows.hex the same doubling of CRs occurs. So Django on Jython on runserver behaves the same as Django on Python on runserver, and as a result it also works.
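The expectation above (that CRCRLFCRCRLF should not match a search for CRLFCRLF) can be checked with a small Python 3 sketch. `find_header_end` is a hypothetical helper that only approximates the search at the cited line with a plain `bytes.find`; it is not Django's actual LazyStream code, and the sample part is invented for illustration. On canonical CRLF input the separator is found, while after every CR is doubled the search returns -1, which is why the working runserver results are surprising if the dumps reflect exactly the bytes the parser saw.

```python
# Blank line separating a part's headers from its body (CRLF CRLF).
HEADER_BODY_SEPARATOR = b"\r\n\r\n"


def find_header_end(chunk: bytes) -> int:
    """Return the offset just past CRLFCRLF in `chunk`, or -1 if it is absent.

    Hypothetical helper: a simplified stand-in for the search at the cited
    line of multipartparser.py, not Django's actual LazyStream code.
    """
    index = chunk.find(HEADER_BODY_SEPARATOR)
    if index == -1:
        return -1
    return index + len(HEADER_BODY_SEPARATOR)


# Invented sample part with canonical CRLF line breaks (as in the Tomcat dump).
canonical = b'Content-Disposition: form-data; name="a"\r\n\r\nvalue\r\n'
# The same part with every CR doubled (as in the runserver dumps).
doubled = canonical.replace(b"\r", b"\r\r")

print(canonical[find_header_end(canonical):])  # b'value\r\n'
print(find_header_end(doubled))  # -1: CRCRLFCRCRLF contains no CRLFCRLF
```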
Now consider multiparttomcatjythonwindows.hex. Here the line breaks are plain CRLF, as RFC 2616 specifies, and yet this is the case where chunk.find fails, which is the root of the problem. Finally, look at the CR characters in multipartpythonosx.hex: like the Tomcat dump, it does not double the CRs, and yet it works.

These are my questions:

1. Where is that extra CR coming from, and why is it required? Why does its presence not cause a failure?
2. Could there be something wrong with my method of data collection, as implemented in multipartparser.diff?
3. kmtracy previously said "All the code touching this data before feeding it to the Django code needs to be treating it as binary, not text, and not doing any type of line normalization." Is there a way to check in Python whether the data is being handled as binary or as text, so I can verify this?

I'm sorry if this is not the right place to ask these questions. If there is a better one, please point me in that direction.

-- 
Ticket URL: <http://code.djangoproject.com/ticket/12578#comment:8>
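One hypothesis worth ruling out for questions 1 to 3 (an assumption; the contents of multipartparser.diff are not shown here): if the dump was written through a text-mode file on Windows, a CR is inserted before each LF on write, so every CRLF becomes CRCRLF, whereas on OS X no such translation happens. The Python 3 sketch simulates both platforms in memory by passing an explicit `newline` to `io.TextIOWrapper`, and uses `io.TextIOBase` to tell a text stream from a binary one. The function names are hypothetical. Jython 2.x and Python 2 file objects handle text and binary differently, so treat this as an illustration of the mechanism rather than a check that runs unchanged there.

```python
import io


def write_through_text_mode(data: str, newline: str) -> bytes:
    """Return the raw bytes produced by writing `data` to a text-mode stream.

    `newline` simulates the platform: a CRLF value mimics Windows text mode,
    where every LF written is translated to CRLF; an LF value mimics POSIX,
    where nothing is translated.
    """
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="latin-1", newline=newline)
    text.write(data)
    text.flush()  # push the encoded text into `raw`
    return raw.getvalue()


def write_through_binary_mode(data: bytes) -> bytes:
    """Return the raw bytes produced by writing `data` to a binary stream."""
    raw = io.BytesIO()
    raw.write(data)
    return raw.getvalue()


def is_text_stream(stream: io.IOBase) -> bool:
    """True for text-mode streams such as open(path, "w"), False for binary
    ones such as open(path, "wb")."""
    return isinstance(stream, io.TextIOBase)


# Invented sample part: headers, blank line, body, all with CRLF breaks.
part = 'Content-Disposition: form-data; name="a"\r\n\r\nvalue\r\n'

print(write_through_text_mode(part, newline="\r\n"))  # every CRLF becomes CRCRLF
print(write_through_text_mode(part, newline="\n"))  # bytes unchanged
print(write_through_binary_mode(part.encode("latin-1")))  # bytes unchanged
print(is_text_stream(io.StringIO()), is_text_stream(io.BytesIO()))  # True False
```

For data already in hand rather than a stream, `isinstance(data, bytes)` versus `isinstance(data, str)` gives the same binary/text distinction in Python 3.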