On 3/24/2011 2:41 PM, Barry Warsaw wrote:
On Mar 24, 2011, at 05:10 PM, Steffen Daode Nurpmeso wrote:
It would be great if the message (file) size were also
provided as a public method, so that code-flow decisions can be
made dependent upon the plain size of a message.
(The size is known without parsing for many real-life message
objects anyway, or can be determined *cheaply*. This is true, e.g., for
all Message objects that are created by mailbox.py.)
Certainly the normal FeedParser will see every byte of the message, even if it
does save parts of it on disk. Mailman 3's LMTP server also sees every byte
and tucks the size away on an .original_size attribute of its Message
subclass.
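A minimal sketch of that Mailman-style approach, assuming the raw bytes are in hand before parsing. `SizedMessage` and `parse_with_size` are hypothetical names; this is not Mailman 3's actual code and not part of the email package:

```python
import email
from email.message import Message


class SizedMessage(Message):
    """Hypothetical Message subclass that remembers its raw input size."""

    # None means "unknown", e.g. for a message built up in code.
    original_size = None


def parse_with_size(raw: bytes) -> SizedMessage:
    # message_from_bytes() accepts _class to choose the Message subclass.
    msg = email.message_from_bytes(raw, _class=SizedMessage)
    # The parser has consumed every input byte, so the size is simply len(raw).
    msg.original_size = len(raw)
    return msg


raw = b"From: a@example.com\r\nSubject: hi\r\n\r\nbody\r\n"
msg = parse_with_size(raw)
print(msg["Subject"], msg.original_size)
```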
But how would you handle it when you are creating the message yourself? I
think there are too many places you'd have to hook to get an accurate reading,
or you'd have to essentially serialize it via a generator before you'd know,
so it's less than helpful.
It may indeed be possible to ask some external process what the size of the
message is, but it would likely be a hint you couldn't necessarily trust.
(I.e. the server might only have an approximate size.)
So, I'm not sure whether the email package can have a consistent notion of a
message's 'size'. Perhaps though it ought to define an attribute for when the
message is created by a parser, but let it be writable so that e.g. your
application could get it from an IMAP server or whatever, and stick it in the
attribute.
When created by a parser, it could have the notion of size-seen-so-far,
or bytes-fed. Once the whole message has been processed, the size of
the message would be known, as well as of each piece.
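A sketch of the "bytes-fed" idea as a thin wrapper around the standard `BytesFeedParser`. `CountingFeedParser` and the `original_size` attribute are hypothetical, and this version tracks only the total, not per-part sizes, which would need hooks inside the parser itself:

```python
from email.feedparser import BytesFeedParser


class CountingFeedParser:
    """Hypothetical wrapper that tracks bytes fed (size-seen-so-far)."""

    def __init__(self) -> None:
        self._parser = BytesFeedParser()
        self.bytes_fed = 0

    def feed(self, chunk: bytes) -> None:
        # Count before handing the chunk on, so bytes_fed is always current.
        self.bytes_fed += len(chunk)
        self._parser.feed(chunk)

    def close(self):
        msg = self._parser.close()
        # Only once all input is processed is the total size known.
        msg.original_size = self.bytes_fed
        return msg


raw = b"From: a@example.com\r\nSubject: hi\r\n\r\nbody\r\n"
parser = CountingFeedParser()
for start in range(0, len(raw), 8):
    parser.feed(raw[start:start + 8])
    print("seen so far:", parser.bytes_fed)
msg = parser.close()
print("total:", msg.original_size)
```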
Incomplete messages, such as those from IMAP servers from which only
some pieces have been requested, could only get the notion
of "total size" from the server, if the server provides it. Since POP servers
do, I think IMAP would also, but I'm not an IMAP expert.
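For reference, IMAP4rev1 (RFC 3501) defines an RFC822.SIZE fetch item, and the POP3 LIST command (RFC 1939) reports each message's size in octets. A sketch of turning those responses into a size hint that an application could store on a writable attribute. The helper names are hypothetical; the live-connection calls appear only in comments, and the response line shapes are assumed to look like the examples:

```python
import re

# Matches the RFC822.SIZE item inside an IMAP FETCH response line.
_IMAP_SIZE = re.compile(rb"RFC822\.SIZE (\d+)")


def imap_size_hint(fetch_line: bytes):
    """Return the size from a FETCH response line, or None if it is absent."""
    match = _IMAP_SIZE.search(fetch_line)
    return int(match.group(1)) if match else None


def pop3_size_hint(list_line: bytes):
    """Parse one POP3 LIST line ('msgnum octets') into (msgnum, octets)."""
    num, octets = list_line.split()[:2]
    return int(num), int(octets)


# With a live connection one might obtain the lines like this (not run here):
#   typ, data = imap_conn.fetch("1", "(RFC822.SIZE)")   # imaplib.IMAP4
#   resp, lines, _ = pop_conn.list()                     # poplib.POP3
print(imap_size_hint(b"1 (RFC822.SIZE 2345)"))
print(pop3_size_hint(b"1 2345"))
```

Either number is the server's figure, so it remains a hint that need not match what the email package itself would count.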
It's also unfortunate that Parser's 'headersonly' option is in fact treated as
"a backwards compatibility hack" that consumes the entire input
nonetheless. And *DesignThoughts* treats lazy parsing/partial loading as merely an
"interesting idea", though I can think of many cases where it is a
good thing to parse a Message{Headers[/Part/Part/Part...]} sequentially.
E.g., why should a spam detector load an entire message if it only wants to
check addresses against some white-/blacklists and simply throw away bad
hits? Even more, why should a company's dispatcher read all the content if
it's only going to rewrite addresses and dispatch the mail to some other
internal server? (Of course, hey, it's you; you know *so* much more about this
stuff than I do.)
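The spam-detector check needs nothing beyond the headers. A sketch, assuming the header block has already been split off from the body; `BLOCKLIST`, `has_blocked_address`, and the choice of fields to inspect are hypothetical:

```python
from email.parser import BytesHeaderParser
from email.utils import getaddresses

# Hypothetical blocklist; a real filter would load this from configuration.
BLOCKLIST = {"spammer@example.net"}


def has_blocked_address(header_block: bytes, blocklist=BLOCKLIST) -> bool:
    """Check sender-related addresses in a header block against blocklist."""
    msg = BytesHeaderParser().parsebytes(header_block)
    values = []
    for field in ("From", "Sender", "Reply-To"):
        values.extend(msg.get_all(field, []))
    # getaddresses() splits header values into (display name, address) pairs.
    for _name, addr in getaddresses(values):
        # Case-insensitive comparison is a simplification for this sketch.
        if addr.lower() in blocklist:
            return True
    return False


print(has_blocked_address(b"From: Bob <Spammer@Example.net>\r\n\r\n"))
```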
Do you have suggestions for how the email package can help with these use
cases? Do you have specific API or implementation proposals?
For message parsing, it seems like allowing registered callbacks for
various pieces would be handy... "Call me when you parse this type of
header" (or body part, etc.).
_______________________________________________
Email-SIG mailing list
Email-SIG@python.org