On Sep 16, 2010, at 4:51 PM, R. David Murray wrote:
> Given a message, there are many times you want to serialize it as text
> (for example, for presentation in a UI). You could provide alternate
> serialization methods to get text out on demand... but then what if
> someone wants to push that text representation back into email to
> rebuild a model of the message?
You tell them "too bad, make some bytes out of that text." Leave it up to the
application. Period, the end, it's not the library's job. If you pushed the
text out to a 'view message source' UI representation, then the vicissitudes of
the system clipboard and other encoding and decoding things may corrupt it in
inscrutable ways. You can't fix it. Don't try.
> So now we have both a bytes parser and a string parser.
Why do so many messages on this subject take this for granted? It's wrong for
the email module just like it's wrong for every other package.
There are plenty of other (better) ways to deal with this problem. Let the
application decide how to fudge the encoding of the characters back into bytes
that can be parsed. "In the face of ambiguity, refuse the temptation to guess"
and all that. The application has more of an idea of what's going on than the
library here, so let it make encoding decisions.
Put another way, there's nothing wrong with having a text parser, as long as it
just encodes the text according to some known encoding and then parses the
bytes :).
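A minimal sketch of such a text parser, assuming the message_from_bytes()
function that later shipped in the email package (Python 3.2 and up). The
helper name parse_text and its utf-8 default are hypothetical; the point is
only that the caller, not the library, names the encoding.

```python
from email import message_from_bytes


def parse_text(text, encoding="utf-8"):
    """Hypothetical helper: encode text with a caller-chosen encoding,
    then hand the resulting bytes to the bytes parser."""
    # The application decides the encoding; the library never guesses.
    return message_from_bytes(text.encode(encoding))


msg = parse_text("Subject: hi\n\nbody\n")
```

An application that knows its text came from, say, a latin-1 source view
would pass encoding="latin-1" instead.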
> So, after much discussion, what we arrived at (so far!) is a model
> that mimics the Python3 split between bytes and strings. If you
> start with bytes input, you end up with a BytesMessage object.
> If you start with string input to the parser, you end up with a
> StringMessage.
That may be a handy way to deal with some grotty internal implementation
details, but having a 'decode()' method is broken. The thing I care about, as
a consumer of this API, is that there is a clearly defined "Message" interface,
which gives me a uniform-looking place where I can ask for either characters
(if I'm displaying them to the user) or bytes (if I'm putting them on the
wire). I don't particularly care where those bytes came from. I don't care
what decoding tricks were necessary to produce the characters.
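As a rough illustration of that kind of uniform interface, here is a sketch
using the as_string() and as_bytes() methods that later Python 3 releases
provide on email.message.Message (as_bytes() arrived in 3.4, so this is not
what was available at the time of this thread). The sample message is made
up.

```python
from email import message_from_bytes

# Made-up sample message, as it might arrive off the wire.
raw = b"Subject: hello\r\n\r\nbody text\r\n"
msg = message_from_bytes(raw)

# One object, two views: characters for display, bytes for the wire.
shown = msg.as_string()
wire = msg.as_bytes()
```

The caller never needs to know whether the message started life as bytes or
as text; it just asks for the representation it needs.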
Now, it may be worthwhile to have specific normalization / debrokenifying
methods which deal with specific types of corrupt data from the wire;
encoding-guessing, replacement-character insertion or whatever else are fine
things to try. It may also be helpful to keep around a list of errors in the
message, for inspection. But as we know, there are lots of ways that MIME data
can go bad other than encoding, so that's just one variety of error that we
might want to keep around.
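To make the replacement-character plus error-list idea concrete, a small
sketch follows. The function name decode_lenient and the ascii default are
assumptions for illustration, not part of any email API.

```python
def decode_lenient(data, encoding="ascii"):
    """Hypothetical helper: decode bytes, falling back to replacement
    characters and recording what went wrong for later inspection."""
    errors = []
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError as exc:
        # Keep the error around instead of raising or silently guessing.
        errors.append(exc)
        text = data.decode(encoding, errors="replace")
    return text, errors


text, errors = decode_lenient(b"caf\xe9")
```

Here text contains U+FFFD in place of the undecodable byte, and errors holds
the UnicodeDecodeError so the application can decide what to do about it.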
(Looking at later messages as I'm about to post this, I think this all sounds
pretty similar to Antoine's suggestions, with respect to keeping the
implementation within a single class, and not having
BytesMessage/UnicodeMessage at the same abstraction level.)