Hi there.

I'm planning to write an email stream indexer that locates the byte offset
of each MIME body part, sub-part, preamble, epilogue, etc. and avoids
pulling an entire message into memory.  (The existing email package doesn't
seem to offer this functionality.)  I will most likely use BytesFeedParser
to parse message headers.

I just discovered that the Message object produced by BytesFeedParser
returns a string from get_boundary().  I expected it to return bytes,
because my input is bytes and I will therefore have to compare each boundary
with bytes while indexing.  I can convert the string to bytes using the
ASCII codec, but I thought I'd raise the issue here in case the current
behavior is a bug.  Considering the restrictions that RFC 2046 places on
boundary characters and its requirement to respect ancestor boundary markers
when parsing nested messages, I'm struggling to think of a situation where
the current behavior is useful.  Shouldn't get_boundary() return something
that can be found within the input data?
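
To make the mismatch concrete, here is a minimal sketch of the workaround
described above.  The sample message and the names raw, delimiter and offsets
are made up for illustration, and the whole message is fed to the parser only
because the sample is tiny; a real indexer would feed the headers alone.  The
search is a plain substring scan, not the line-anchored matching that
RFC 2046 requires.

```python
from email.parser import BytesFeedParser

# Hypothetical sample message, used only for illustration.
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="frontier"\r\n'
    b"\r\n"
    b"preamble\r\n"
    b"--frontier\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"part one\r\n"
    b"--frontier--\r\n"
    b"epilogue\r\n"
)

parser = BytesFeedParser()
parser.feed(raw)
msg = parser.close()

# get_boundary() returns str even though the input was bytes.
boundary = msg.get_boundary()

# Convert back to bytes before searching the raw data.  RFC 2046 limits
# boundary characters to a subset of ASCII, so the ascii codec is assumed
# to be sufficient for well-formed messages.
delimiter = b"--" + boundary.encode("ascii")

# Naive scan: collect the byte offset of every occurrence of the delimiter.
offsets = []
pos = raw.find(delimiter)
while pos != -1:
    offsets.append(pos)
    pos = raw.find(delimiter, pos + 1)
```

For this sample, offsets holds two entries: the opening delimiter line and
the closing "--frontier--" line.
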
_______________________________________________
Email-SIG mailing list
Email-SIG@python.org