Re: [Email-SIG] Design Thoughts Summary

Eric S. Johansson Sun, 03 Jan 2010 18:34:26 -0800

On 11/15/2009 1:01 PM, Barry Warsaw wrote:

On Nov 14, 2009, at 5:12 PM, Matthew Dixon Cowles wrote:

Thank you. I am virtually 100% in agreement that this document
represents what people have agreed on and that it represents what is
sensible to do.


As am I. Fantastic work in pulling this all together, David.

I'm a bit slammed right now, but a quick comment...

* The API needs to at a minimum have hooks available for an
application to store data on disk rather than holding everything in
memory.


I remain unconvinced that this is worth the trouble. Yes, the Twisted
folks say that they can't use the email module because they may be
receiving hundreds of messages at once. But can anyone do anything
with hundreds of messages at once other than write them to disk?

And would anything actually be improved by reading hundreds of files
at once, in small chunks, looking for MIME separators?
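The standard library already has an incremental interface for the "small chunks" case: `email.parser.BytesFeedParser` accepts data piece by piece and scans for MIME boundaries as it goes. The minimal sketch below feeds a message in 7-byte reads, as a network server might receive it. The function name `parse_in_chunks` and the sample message are illustrative only. Note that the feed parser still builds the entire `Message` object in memory, which is the concern raised in this thread.

```python
from email.parser import BytesFeedParser
from email.policy import default


def parse_in_chunks(chunks):
    """Feed a message to the parser piece by piece (hypothetical helper)."""
    parser = BytesFeedParser(policy=default)
    for chunk in chunks:
        parser.feed(chunk)  # buffered internally; boundaries found incrementally
    return parser.close()   # returns the completed message object


raw = (
    b"From: alice@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--XYZ--\r\n"
)

# Simulate arrival in small network reads.
chunks = [raw[i:i + 7] for i in range(0, len(raw), 7)]
msg = parse_in_chunks(chunks)
print(msg["Subject"], msg.is_multipart())  # prints: test True
```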


Mailman has a similar problem. Even if we get just a few big messages,
they can crush the system. You could argue that the MTA should just
block messages with 50MB bodies if the underlying Mailman code can't
handle it, but I still think we can do better.

I think we'd be fine if all the headers and MIME structure were kept in
memory. But I do think we just want to be able to never
store the raw body content in memory (perhaps unless needed, on demand).
Mailman, for example, rarely cares about the bytes of, say, an image/jpeg body.
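A minimal sketch of "headers in memory, body only on demand", assuming a single-part message read from a binary stream. The class `LazyBodyMessage` and its methods are hypothetical, not part of the email package. It parses only the header block with `BytesHeaderParser` and spools the body into a `tempfile.SpooledTemporaryFile`, which stays in memory below `spool_limit` bytes and rolls over to disk above it. A real design would also need to record the MIME structure of multipart bodies, which this sketch does not attempt.

```python
import io
import shutil
import tempfile
from email.parser import BytesHeaderParser
from email.policy import default


class LazyBodyMessage:
    """Hypothetical wrapper: headers parsed into memory, raw body spooled
    to a temporary file and read back only when asked for."""

    def __init__(self, stream, spool_limit=64 * 1024):
        header_lines = []
        for line in stream:                  # read the header block line by line
            header_lines.append(line)
            if line in (b"\r\n", b"\n"):     # blank line ends the headers
                break
        parser = BytesHeaderParser(policy=default)
        self.headers = parser.parsebytes(b"".join(header_lines))
        # Small bodies stay in memory; larger ones roll over to disk.
        self._body = tempfile.SpooledTemporaryFile(max_size=spool_limit)
        shutil.copyfileobj(stream, self._body)  # copies in fixed-size chunks
        self._body.seek(0)

    def body_bytes(self):
        """Load the raw body on demand."""
        self._body.seek(0)
        return self._body.read()

    def close(self):
        self._body.close()


raw = b"Subject: photo\r\nContent-Type: image/jpeg\r\n\r\n" + b"\xff\xd8" * 100_000
msg = LazyBodyMessage(io.BytesIO(raw), spool_limit=4096)
print(msg.headers["Subject"], msg.headers.get_content_type(), len(msg.body_bytes()))
# prints: photo image/jpeg 200000
msg.close()
```

An application like Mailman could inspect `msg.headers` for routing decisions and never call `body_bytes()` for an image part.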

For what it's worth, I've also experienced the same "crushing blow" caused by large messages in memory. In my case, I immediately dumped all messages to a database (unfortunately, SQL), extracted the essential metadata I needed for my application, and kept it in the record so I could index and search on it. I also stored the raw message and the processed message in the database. The reason was that I wanted to be able to analyze the raw message if something failed (usually a Unicode failure) and to be able to retrieve the e-mail object from its JSON container for quicker processing than I would get by parsing the raw message again (and again).
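A minimal sketch of the approach just described, using the standard-library `sqlite3` module: the raw bytes are saved first so a failed parse can be analyzed later, extracted metadata goes into indexed columns, and a processed form is stored as JSON so it can be reloaded without re-parsing. The table layout, column names, and functions (`open_store`, `store`, `load_processed`) are hypothetical; the JSON here is only a summary (header list and part types) standing in for whatever processed representation an application actually needs.

```python
import json
import sqlite3
from email import message_from_bytes
from email.policy import default


def _text(value):
    """Convert a header object to plain str for SQLite (None if absent)."""
    return None if value is None else str(value)


def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id         INTEGER PRIMARY KEY,
               message_id TEXT,
               sender     TEXT,
               subject    TEXT,
               raw        BLOB NOT NULL,  -- original bytes, kept for post-mortems
               processed  TEXT            -- JSON form; NULL if processing failed
           )"""
    )
    db.execute("CREATE INDEX IF NOT EXISTS idx_sender ON messages(sender)")
    return db


def store(db, raw):
    """Save the raw bytes first, then try to add metadata and the JSON form."""
    row_id = db.execute("INSERT INTO messages (raw) VALUES (?)", (raw,)).lastrowid
    try:
        msg = message_from_bytes(raw, policy=default)
        processed = {
            # A list of pairs keeps repeated headers such as Received.
            "headers": [[k, str(v)] for k, v in msg.items()],
            "parts": [part.get_content_type() for part in msg.walk()],
        }
        db.execute(
            "UPDATE messages SET message_id=?, sender=?, subject=?, processed=? WHERE id=?",
            (_text(msg["Message-ID"]), _text(msg["From"]), _text(msg["Subject"]),
             json.dumps(processed), row_id),
        )
    except Exception:
        pass  # the raw bytes are already saved; the failure can be analyzed later
    db.commit()
    return row_id


def load_processed(db, row_id):
    """Reload the stored JSON form without re-parsing the raw message."""
    row = db.execute("SELECT processed FROM messages WHERE id=?", (row_id,)).fetchone()
    return json.loads(row[0]) if row and row[0] else None
```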

This experience makes me a supporter of an e-mail module that has a storage container object that can be searched by any number of metadata fields. These metadata fields would come from both internal (to the message) and external data sources. I believe it would be necessary to specify which searchable fields you want before creating the storage container.
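To make the proposed container concrete, here is a minimal in-memory sketch. The class `MessageContainer` and its methods are hypothetical; nothing like it exists in the email package. Searchable fields are declared when the container is created: internal fields as functions that extract a value from the parsed message, external fields as names whose values the caller supplies when adding a message. Each field gets a value-to-ids index, and a search intersects the matching id sets.

```python
from collections import defaultdict
from email import message_from_bytes
from email.policy import default


class MessageContainer:
    """Hypothetical container whose searchable fields are fixed at creation."""

    def __init__(self, internal_fields, external_fields=()):
        self._internal = dict(internal_fields)   # name -> function(msg) -> value
        self._external = tuple(external_fields)  # names supplied to add()
        self._indexes = {name: defaultdict(set)
                         for name in (*self._internal, *self._external)}
        self._raw = []

    def add(self, raw, **external):
        unknown = set(external) - set(self._external)
        if unknown:
            raise ValueError(f"undeclared external fields: {sorted(unknown)}")
        msg_id = len(self._raw)
        self._raw.append(raw)
        msg = message_from_bytes(raw, policy=default)
        values = {name: fn(msg) for name, fn in self._internal.items()}
        values.update(external)
        for name, value in values.items():
            self._indexes[name][value].add(msg_id)
        return msg_id

    def search(self, **criteria):
        """Return ids of messages matching every field=value pair.
        Searching on an undeclared field raises KeyError."""
        result = None
        for name, value in criteria.items():
            ids = self._indexes[name].get(value, set())
            result = ids if result is None else result & ids
        return sorted(result or ())

    def raw(self, msg_id):
        return self._raw[msg_id]


box = MessageContainer(
    internal_fields={"sender": lambda m: str(m["From"]),
                     "subject": lambda m: str(m["Subject"])},
    external_fields=("list_name",),
)
box.add(b"From: alice@example.com\r\nSubject: hi\r\n\r\nbody\r\n", list_name="email-sig")
box.add(b"From: bob@example.com\r\nSubject: hi\r\n\r\nbody\r\n", list_name="mailman")
print(box.search(subject="hi"))                         # prints: [0, 1]
print(box.search(subject="hi", list_name="email-sig"))  # prints: [0]
```

Declaring fields up front is what lets the container build its indexes as messages arrive instead of scanning every stored message at query time.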

I hope it would be possible to make the storage container backend independent of storage technology, so that people like me who will detest SQL until the heat death of the universe can use something else to store mail messages. I would also recommend not depending on the file system, because in my experience performance declined dramatically at around 500 messages (on both ext3 and JFS). Even though it meant using an SQL database (SQLite), the database was significantly faster than the file system.
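One way to get the backend independence asked for here is to have the container depend only on a small abstract interface. In the sketch below, `StorageBackend`, `MemoryBackend`, `DbmBackend`, and `archive` are hypothetical names. The dbm backend uses the standard library's `dbm.dumb` key-value store, chosen only because it is always available, involves no SQL, and keeps all messages in one database rather than one file per message; a real deployment would likely pick a faster key-value store behind the same interface.

```python
import dbm.dumb
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface; callers never see the storage technology."""

    @abstractmethod
    def put(self, key: str, raw: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class MemoryBackend(StorageBackend):
    def __init__(self):
        self._data = {}

    def put(self, key, raw):
        self._data[key] = raw

    def get(self, key):
        return self._data[key]


class DbmBackend(StorageBackend):
    """Pure-Python key-value store from the standard library: no SQL."""

    def __init__(self, path):
        self._db = dbm.dumb.open(path, "c")  # create the database if missing

    def put(self, key, raw):
        self._db[key.encode()] = raw

    def get(self, key):
        return self._db[key.encode()]

    def close(self):
        self._db.close()


def archive(backend: StorageBackend, messages):
    """Store a dict of key -> raw message bytes in any backend."""
    for key, raw in messages.items():
        backend.put(key, raw)
```

Swapping `MemoryBackend()` for `DbmBackend("mail")` changes nothing in `archive` or in any other caller that only uses `put` and `get`.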

Thanks to all who are working on this project. I wish I could participate more, but life has other plans for me.

_______________________________________________
Email-SIG mailing list
[email protected]
Your options: 
http://mail.python.org/mailman/options/email-sig/archive%40mail-archive.com
