On Jul 17, 2006, at 8:35 PM, Mark Sapiro wrote:

I just looked at the fix in SVN, and I think there is still a problem.
I don't think the RFC 2231 encodings that produce the error are
'buggy'. There are two independent things going on in RFC 2231 - the
charset and language encoding and the splitting of the parameter into
multiple pieces, e.g. filename*0=, filename*1=, etc.

The problem with email.utils.decode_params() is it doesn't distinguish
between these cases. The charset/language information is only present
if there is a * immediately preceding the = as in

filename*=charset'language'value

or

filename*0*=charset'language'value
...

In these cases, a compliant value must not contain a ' character.

However, if the parameter is

filename*0=value_part_0
filename*1=value_part_1
...

these value_parts may contain any number of ' characters and they don't
delimit charset and language information.

See my suggested patch attached to
<http://mail.python.org/pipermail/email-sig/2006-July/000293.html>.

Mark, I think you're right in your diagnosis. I've gone back and re-read RFC 2231 and I agree that we need to distinguish between the two segment types, which I'll call encoded (name ends in *) and non-encoded (no * at end of name).

The way I read the RFC, however, I don't think the patch is quite right. Specifically, you can mix encoded and non-encoded segments in an extended parameter, like so:

filename*0*="This is%20encoded"
filename*1="This is%20not encoded"

I believe this should end up with a 'filename' parameter with a value:

This is encodedThis is%20not encoded
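A minimal sketch of that concatenation rule, not the code from the attached patch. The helper name join_segments and the (value, encoded) pair layout are assumptions made for illustration; only urllib.parse.unquote is a real library call.

```python
from urllib.parse import unquote


def join_segments(segments):
    """Concatenate RFC 2231 segments, %-decoding only the encoded ones.

    `segments` is a hypothetical list of (value, encoded) pairs, already
    sorted by segment number; `encoded` is True when the parameter name
    ended in '*'.
    """
    parts = []
    for value, encoded in segments:
        if encoded:
            # latin-1 maps every decoded byte to exactly one character
            value = unquote(value, encoding="latin-1")
        parts.append(value)
    return "".join(parts)


segments = [
    ("This is%20encoded", True),       # filename*0*=
    ("This is%20not encoded", False),  # filename*1=
]
print(join_segments(segments))  # This is encodedThis is%20not encoded
```

The non-encoded segment keeps its literal %20, which is the behavior described above.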

Further, if the name of any segment ends in a * then the charset and language information must appear at the front of the string, but it is extracted only after the encoded segments are %-decoded and all the segments are concatenated together. (The RFC appears to be a bit ambiguous here, but this is the only interpretation that makes sense to me.)
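The charset and language step can be sketched as follows, assuming the segments have already been %-decoded and concatenated into one string. The function name split_charset_language is hypothetical, and the fallback branch is only one possible reading of "be liberal in what you accept".

```python
def split_charset_language(value):
    """Split "charset'language'text" into a 3-tuple.

    Only the first two ' characters are treated as delimiters, so any
    later ' stays part of the text. If fewer than two delimiters are
    present, return (None, None, value) as a failsafe.
    """
    fields = value.split("'", 2)
    if len(fields) != 3:
        return (None, None, value)
    charset, language, text = fields
    # Empty charset or language fields are reported as None
    return (charset or None, language or None, text)


print(split_charset_language("us-ascii'en-us'isn't it fun"))
print(split_charset_language("no delimiters here"))
```

Splitting at most twice is the design choice that keeps a stray ' in the remaining text from being mistaken for a delimiter.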

Both of these changes caused many failures in the test suite, but I believe that's because many of the tests were incorrect. Some broke because they were using all non-encoded segments yet were expecting Message.get_param() to return a 3-tuple. That interface, while yucky, seems clear that when all non-encoded segments are used, the return value should be a simple string.
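A hedged illustration of the two return shapes, using the email package as shipped with current Python 3 (which may differ in detail from the 2006 trunk this patch targets). The helpers filename_of and as_text are hypothetical names; message_from_string, Message.get_param() and email.utils.collapse_rfc2231_value() are real library calls.

```python
from email import message_from_string
from email.utils import collapse_rfc2231_value


def filename_of(header_line):
    """Parse a one-header message and return its 'filename' parameter."""
    msg = message_from_string(header_line + "\n\nbody\n")
    return msg.get_param("filename", header="content-disposition")


def as_text(value):
    """Collapse a (charset, language, value) 3-tuple to a plain string."""
    if isinstance(value, tuple):
        return collapse_rfc2231_value(value)
    return value


# All segments non-encoded: expected to come back as a simple string
plain = filename_of(
    'Content-Disposition: attachment; filename*0="foo"; filename*1="bar"'
)

# Encoded parameter: expected to come back as a 3-tuple
encoded = filename_of(
    "Content-Disposition: attachment; filename*=us-ascii'en'foo%20bar"
)

print(type(plain).__name__, as_text(plain))
print(type(encoded).__name__, as_text(encoded))
```

Callers that do not care about the charset and language can run every result through a helper like as_text and treat both shapes uniformly.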

The other breakage was that non-encoded segments should not be %-decoded, but there were many cases where they were still being decoded.

I believe the attached patch fixes all these cases, and yet retains the failsafe checks in decode_rfc2231() -- be liberal in what you accept, blah, blah, blah. The patch also updates all the affected tests. This patch is against the Python trunk. Please let me know what you think! If it looks good, I'll commit it and backport the whole schmear to the earlier email package versions.

-Barry

Attachment: email.diff
Description: Binary data
