On Sep 16, 2010, at 4:51 PM, R. David Murray wrote:

> Given a message, there are many times you want to serialize it as text
> (for example, for presentation in a UI).  You could provide alternate
> serialization methods to get text out on demand....but then what if
> someone wants to push that text representation back in to email to
> rebuild a model of the message?

You tell them "too bad, make some bytes out of that text."  Leave it up to the 
application.  Period, the end, it's not the library's job.  If you pushed the 
text out to a 'view message source' UI representation, then the vicissitudes of 
the system clipboard and other encoding and decoding things may corrupt it in 
inscrutable ways.  You can't fix it.  Don't try.

> So now we have both a bytes parser and a string parser.

Why do so many messages on this subject take this for granted?  It's wrong for 
the email module just like it's wrong for every other package.

There are plenty of other (better) ways to deal with this problem.  Let the 
application decide how to fudge the encoding of the characters back into bytes 
that can be parsed.  "In the face of ambiguity, refuse the temptation to guess" 
and all that.  The application has more of an idea of what's going on than the 
library here, so let it make encoding decisions.

Put another way, there's nothing wrong with having a text parser, as long as it 
just encodes the text according to some known encoding and then parses the 
bytes :).


> So, after much discussion, what we arrived at (so far!) is a model
> that mimics the Python3 split between bytes and strings.  If you
> start with bytes input, you end up with a BytesMessage object.
> If you start with string input to the parser, you end up with a
> StringMessage.

That may be a handy way to deal with some grotty internal implementation 
details, but having a 'decode()' method is broken.  The thing I care about, as 
a consumer of this API, is that there is a clearly defined "Message" interface, 
which gives me a uniform-looking place where I can ask for either characters 
(if I'm displaying them to the user) or bytes (if I'm putting them on the 
wire).  I don't particularly care where those bytes came from.  I don't care 
what decoding tricks were necessary to produce the characters.

Now, it may be worthwhile to have specific normalization / debrokenifying 
methods which deal with specific types of corrupt data from the wire; 
encoding-guessing, replacement-character insertion or whatever else are fine 
things to try.  It may also be helpful to keep around a list of errors in the 
message, for inspection.  But as we know, there are lots of ways that MIME data 
can go bad other than encoding, so that's just one variety of error that we 
might want to keep around.

(Looking at later messages as I'm about to post this, I think this all sounds 
pretty similar to Antoine's suggestions, with respect to keeping the 
implementation within a single class, and not having 
BytesMessage/UnicodeMessage at the same abstraction level.)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to