http://code.djangoproject.com/ticket/8149
As mentioned in the ticket, `UploadedFile.__iter__` iterates over a `StringIO` object to yield each line of the uploaded file (including line endings). Unfortunately, the current version of `StringIO` treats only `\n` as a line ending, whereas iterating over a file object opened in universal newline mode yields lines regardless of the line ending type. I believe that future versions of `StringIO` will work the same way as file objects do now.

The problem with the current implementation is that we can't know which line endings uploaded text files will use, so anybody iterating over the lines of an uploaded file might get multiple lines, or even an entire chunk or the whole file, in a single iteration. To iterate reliably over each line of an uploaded text file, users have to write their own iterator that accounts for each possibility.

A few possible workarounds for users that I can think of:

1) Save the file to a temporary location on disk, open it as a file object, and iterate through that. This can be cumbersome because by default files under 2.5 MB are stored in memory, while larger files are already stored in a temporary location on disk.

2) Write a new iterator which includes the chunk/buffer logic of `UploadedFile.__iter__` but treats `\r\n` and `\r` as line endings as well as `\n`.

3) Load the entire file into memory and split it (if you don't need to retain the line endings).

A few possible solutions on the Django side could be:

1) Subclass `StringIO` and override `readline` to work with other line endings. This could be useful in other areas of Django, and could be considered similar to making the decimal module available to Python 2.3: it makes future functionality of `StringIO` available now.

2) Rewrite `UploadedFile.__iter__` to not use `StringIO`.
Some alternatives might be to parse the string in a similar way to `StringIO.readline`, to use `re.findall` (with a gross pattern like the one found in the patch attached to the ticket), or to use `re.split` with a slightly less offensive pattern such as `re.split(r'(\r\n|\r|\n)', ...)`, which would yield lines and line endings alternately.

Personally, I think it's not a rare edge case that users will want to accept text file uploads from unknown sources, and they should be able to iterate over each line of an uploaded text file without rewriting that functionality in their own code.
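
To make the `re.split` idea a bit more concrete, here is a rough sketch of a line iterator built on it. This is not the patch attached to the ticket and not Django code: `iter_lines` is a hypothetical helper name, and it assumes the input is an iterable of already-decoded text chunks. The one subtle part is a `\r` at the very end of a chunk, which may be the first half of a `\r\n` pair split across two chunks, so it is held back until the next chunk arrives.

```python
import re

# One line ending; CRLF is listed first so it is never split in two.
LINE_ENDING = re.compile(r'(\r\n|\r|\n)')


def iter_lines(chunks):
    # Hypothetical helper (not part of Django): yields lines, endings kept,
    # from an iterable of text chunks, treating CRLF, CR and LF as endings.
    buffer = ''
    for chunk in chunks:
        buffer += chunk
        # With a capturing group, split() returns
        # [text, ending, text, ending, ..., remainder].
        parts = LINE_ENDING.split(buffer)
        buffer = parts.pop()  # trailing text that has no ending yet
        # A lone CR at the end of the data seen so far may be the first
        # half of a CRLF split across chunks, so keep it in the buffer.
        if buffer == '' and parts and parts[-1] == '\r':
            ending = parts.pop()
            text = parts.pop()
            buffer = text + ending
        # What is left in parts is (text, ending) pairs.
        for i in range(0, len(parts), 2):
            yield parts[i] + parts[i + 1]
    # Whatever remains is a final line without (or with a lone CR) ending.
    if buffer:
        yield buffer
```

For example, `list(iter_lines(['a\r', '\nb\rc\n', 'd']))` gives `['a\r\n', 'b\r', 'c\n', 'd']`, even though the first `\r\n` straddles two chunks. The same approach should carry over to byte strings with a bytes pattern, but I haven't tried to cover that here.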
