On Jul 17, 2006, at 8:35 PM, Mark Sapiro wrote:
I just looked at the fix in SVN, and I think there is still a problem. I don't think the RFC 2231 encodings that produce the error are 'buggy'. There are two independent things going on in RFC 2231: the charset and language encoding, and the splitting of the parameter into multiple pieces, e.g. filename*0=, filename*1=, etc. The problem with email.utils.decode_params() is that it doesn't distinguish between these cases. The charset/language information is only present if there is a * immediately preceding the =, as in filename*=charset'language'value or filename*0*=charset'language'value ... In these cases, a compliant value must not contain '. However, if the parameter is filename*0=value_part_0 filename*1=value_part_1 ... these value_parts may contain any number of ' characters, and they don't delimit charset and language information. See my suggested patch attached to <http://mail.python.org/pipermail/email-sig/2006-July/000293.html>.
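A minimal sketch of the distinction Mark describes, assuming the parameter name has already been split from its value. The helpers classify_segment() and carries_charset_language() are hypothetical, not part of the email package: a trailing * marks a %-encoded segment, and only an unnumbered (filename*=) or first (filename*0*=) encoded segment may begin with charset'language'. In a non-encoded continuation such as filename*1, any ' characters are ordinary data.

```python
import re

# Hypothetical pattern: base name, optional "*N" segment number, optional trailing "*".
_PARAM_NAME = re.compile(r'^(?P<name>[^*]+)(?:\*(?P<num>[0-9]+))?(?P<star>\*)?$')


def classify_segment(param_name):
    """Return (base_name, segment_number_or_None, is_encoded) for a parameter name."""
    mo = _PARAM_NAME.match(param_name)
    if mo is None:
        raise ValueError(f'not an RFC 2231 parameter name: {param_name!r}')
    num = mo.group('num')
    return (mo.group('name').lower(),
            int(num) if num is not None else None,
            mo.group('star') is not None)


def carries_charset_language(param_name):
    """True only for filename*= or filename*0*= style names."""
    _name, num, encoded = classify_segment(param_name)
    # A trailing * alone is not enough: only the first encoded piece has the prefix.
    return encoded and num in (None, 0)
```

For example, classify_segment("filename*1") reports a non-encoded continuation, so a value like "it's" in that segment is taken literally rather than split at the '.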
Mark, I think you're right in your diagnosis. I've gone back and re-read RFC 2231, and I agree that we need to distinguish between the two segment types, which I'll call encoded (name ends in *) and non-encoded (no * at end of name).
The way I read the RFC, however, I don't think the patch is quite right. Specifically, you can mix encoded and non-encoded segments in an extended parameter, like so:
    filename*0*="This is%20encoded"
    filename*1="This is%20not encoded"

I believe this should end up with a 'filename' parameter with the value:

    This is encodedThis is%20not encoded

Further, if any segment ends in a *, then the charset and language information must appear at the front of the string, but it is decoded only after the encoded segments are %-decoded and all the segments are concatenated together. (The RFC appears to be a bit ambiguous here, but this is the only interpretation that makes sense to me.)
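The interpretation above can be sketched as a small function. reassemble() is a hypothetical name, and its input is assumed to be (number, value, encoded) triples for one parameter with surrounding quotes already stripped. Only encoded segments are %-decoded, the pieces are joined in numeric order, and charset'language' is split off the front only if at least one segment was encoded. In the spirit of being liberal in what is accepted, a missing prefix yields (None, None, value) instead of an error.

```python
from urllib.parse import unquote


def reassemble(segments):
    """Hypothetical sketch: combine RFC 2231 segments of a single parameter.

    segments: iterable of (number, value, encoded) triples.
    Returns a plain str if no segment is encoded, else (charset, language, value).
    """
    parts = []
    extended = False
    # Sort by segment number; an unnumbered parameter (None) is treated as 0.
    for _num, value, encoded in sorted(segments, key=lambda seg: seg[0] or 0):
        if encoded:
            # %-decode encoded segments only; latin-1 maps each octet to one char.
            value = unquote(value, encoding='latin-1')
            extended = True
        parts.append(value)
    joined = ''.join(parts)
    if not extended:
        # All segments non-encoded: no %-decoding, no charset/language.
        return joined
    # charset'language' must lead the concatenated string; split at most twice.
    pieces = joined.split("'", 2)
    if len(pieces) < 3:
        # Failsafe: no prefix present, so report unknown charset and language.
        return None, None, joined
    charset, language, value = pieces
    return charset, language, value
```

Fed the two segments from the example above, reassemble([(0, "This is%20encoded", True), (1, "This is%20not encoded", False)]) returns (None, None, "This is encodedThis is%20not encoded"), because that example carries no charset'language' prefix.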
Both of these changes caused many failures in the test suite, but I believe that's because many of the tests were incorrect. Some broke because they used only non-encoded segments yet expected Message.get_param() to return a 3-tuple. That interface, while yucky, makes it clear that when only non-encoded segments are used, the return value should be a simple string.
The other breakage was that non-encoded segments should not be %-decoded, but there were many cases where they were still being decoded.
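For comparison, the decode_params() in current Python 3 appears to follow the rules described here; behavior on older interpreters may differ. The sketch below uses a hypothetical helper, filename_of(), with the default compat32 parser. It shows Message.get_param() returning a plain string, with %20 left alone, when every segment is non-encoded, and a (charset, language, value) 3-tuple once any segment is encoded; email.utils.collapse_rfc2231_value() then turns the tuple into text.

```python
import email
from email.utils import collapse_rfc2231_value


def filename_of(disposition):
    """Hypothetical helper: parse a throwaway message and return its filename param."""
    msg = email.message_from_string(
        'Content-Type: text/plain\n'
        f'Content-Disposition: {disposition}\n'
        '\n'
        'body\n')
    return msg.get_param('filename', header='content-disposition')


# Only non-encoded segments: a plain str, and the ' and %20 are kept verbatim.
plain = filename_of('attachment; filename*0="it\'s"; filename*1=" 100%20done"')

# One encoded segment: a 3-tuple; only the encoded piece is %-decoded.
mixed = filename_of("attachment; filename*0*=us-ascii'en'This%20is%20encoded; "
                    'filename*1="This is%20not encoded"')

# Apply the charset from the tuple to obtain the final text.
text = collapse_rfc2231_value(mixed)
```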
I believe the attached patch fixes all these cases, yet retains the failsafe checks in decode_rfc2231() -- be liberal in what you accept, blah, blah, blah. The patch also updates all the affected tests. This patch is against the Python trunk. Please let me know what you think! If it looks good, I'll commit it and backport the whole schmear to the earlier email package versions.
-Barry
_______________________________________________
Email-SIG mailing list
Your options: http://mail.python.org/mailman/options/email-sig/archive%40mail-archive.com
