On Fri, 15 Mar 2013 13:16:32 -0700, Forest <web11.for...@tibit.com> wrote:
> ascii codec, but I thought I'd raise the issue here in case the current
> behavior is a bug. Considering the restrictions that rfc 2046 places on
> boundary characters and its requirement to respect ancestor boundary markers
> when parsing nested messages, I'm struggling to think of a situation where
> the current behavior is useful. Shouldn't get_boundary() return something
> that can be found within the input data?

Well, you have to understand that the email package was written when Python didn't make any distinction between bytes and strings. What email in Python 3 does is transform the input into a string (unicode) right away, carrying any non-ascii bytes along until it has parsed enough information from the message to recover them and convert them into real unicode. BytesParser parses bytes input and *turns it into unicode*. The model is the same regardless of whether the input is bytes or already a string. get_boundary is a method on the model (the Message), so it retrieves a string from the model and returns it.

That said, we have discussed adding methods for accessing the binary form in various contexts. We have also discussed providing a stream version of message parsing and generation, and at a minimum a way to store message bodies externally (e.g. in a file). I've got these as development goals and welcome help with them.

--David
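
A minimal sketch of the model described above, assuming an ASCII-only boundary. The sample message and the boundary value XYZ are made up for illustration. It shows that the Message built from bytes input hands back a str from get_boundary(), and that the caller has to encode it again to search the original bytes:

```python
from email import message_from_bytes

# Hypothetical raw message; the boundary "XYZ" is an invented example value.
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--XYZ--\r\n"
)

# Bytes go in, but the resulting Message model holds strings.
msg = message_from_bytes(raw)
boundary = msg.get_boundary()

# The boundary comes back as str, not bytes.
boundary_is_str = isinstance(boundary, str)

# To locate the delimiter line in the original input, encode it back.
# This only works as written when the boundary is pure ASCII.
delimiter = b"--" + boundary.encode("ascii")
found = delimiter in raw

print(boundary_is_str, boundary, found)
```

With an ASCII boundary the round trip is lossless. The sketch makes no claim about what get_boundary() returns when the boundary contains non-ascii bytes, which is the case raised in the quoted question.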