Hi there. I'm planning to write an email stream indexer that locates the byte offsets of each MIME body-part, sub-part, preamble, epilogue, etc. and avoids pulling an entire message into memory. (The existing email package doesn't seem to offer this functionality.) I will most likely use BytesFeedParser to parse message headers.
I just discovered that the Message object produced by BytesFeedParser returns a string from get_boundary(). I expected it to return bytes, because my input is bytes and I will therefore have to compare each boundary with bytes while indexing. I can convert the string to bytes using the ascii codec, but I thought I'd raise the issue here in case the current behavior is a bug. Considering the restrictions that RFC 2046 places on boundary characters and its requirement to respect ancestor boundary markers when parsing nested messages, I'm struggling to think of a situation where the current behavior is useful. Shouldn't get_boundary() return something that can be found within the input data?
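For reference, here is a minimal sketch of the behavior and the workaround I described. The sample message and the names raw, delimiter, and offset are made up for illustration, and the whole (tiny) message is fed to the parser only to keep the example short; a real indexer would feed just the header bytes. The conversion assumes the boundary contains only ASCII characters, as RFC 2046 requires.

```python
from email.parser import BytesFeedParser

# Hypothetical sample message, small enough to feed in one call.
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="frontier"\r\n'
    b"\r\n"
    b"preamble\r\n"
    b"--frontier\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--frontier--\r\n"
    b"epilogue\r\n"
)

parser = BytesFeedParser()
parser.feed(raw)
msg = parser.close()

# get_boundary() hands back str even though the input was bytes.
boundary = msg.get_boundary()

# Workaround: encode with the ascii codec (assumes an RFC 2046
# conformant, ASCII-only boundary) and prepend the "--" delimiter prefix.
delimiter = b"--" + boundary.encode("ascii")

# Byte offset of the first delimiter line in the original input.
offset = raw.find(delimiter)
print(type(boundary).__name__, delimiter, offset)
```

The encode step is the only extra work needed, but it has to be repeated for every nested multipart whose boundary the indexer tracks.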