On 11/15/2009 1:01 PM, Barry Warsaw wrote:
On Nov 14, 2009, at 5:12 PM, Matthew Dixon Cowles wrote:

Thank you. I am virtually 100% in agreement that this document
represents what people have agreed on and that it represents what is
sensible to do.

As am I. Fantastic work in pulling this all together, David.

I'm a bit slammed right now, but a quick comment...

* The API needs to at a minimum have hooks available for an
application to store data on disk rather than holding everything in
memory.

I remain unconvinced that this is worth the trouble. Yes, the Twisted
folks say that they can't use the email module because they may be
receiving hundreds of messages at once. But can anyone do anything
with hundreds of messages at once other than write them to disk?

And would anything actually be improved by reading hundreds of files
at once, in small chunks, looking for MIME separators?

Mailman has a similar problem. Even if we get just a few big messages,
they can crush the system. You could argue that the MTA should just
block messages with 50MB bodies if the underlying Mailman code can't
handle it, but I still think we can do better.

I think it would be fine if all the headers and MIME structure were kept
in memory. But I do think we want to be able to never store the raw body
content in memory (except perhaps on demand, when it is needed).
Mailman, for example, rarely cares about the bytes of, say, an image/jpeg body.
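
To make the idea above concrete, here is a minimal sketch of keeping the headers in memory while the body goes straight to disk. It is only an illustration under stated assumptions: spool_message is a hypothetical helper, not an existing email package API, it uses the Python 3 standard library, and it treats everything after the first blank line as one opaque body, so it does not keep the MIME structure in memory as suggested above.

```python
import email.parser
import io
import shutil
import tempfile


def spool_message(stream, chunk_size=64 * 1024):
    """Hypothetical helper: headers stay in memory, the body is spooled to disk.

    `stream` is a binary file-like object positioned at the start of a message.
    Returns (headers, body_file); body_file is only read on demand.
    """
    header_lines = []
    # Read line by line until the blank line that ends the header block.
    for line in iter(stream.readline, b""):
        header_lines.append(line)
        if line in (b"\n", b"\r\n"):
            break
    headers = email.parser.BytesParser().parsebytes(
        b"".join(header_lines), headersonly=True
    )
    # Copy the rest in fixed-size chunks so the body is never held whole in memory.
    body_file = tempfile.TemporaryFile()
    shutil.copyfileobj(stream, body_file, chunk_size)
    body_file.seek(0)
    return headers, body_file


raw = b"From: a@example.com\nSubject: big attachment\n\n" + b"x" * 200000
headers, body_file = spool_message(io.BytesIO(raw))
print(headers["Subject"], len(body_file.read()))
```

An application that only needs the headers never touches body_file, which is the "on demand" behavior described above.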

For what it's worth, I've also experienced the same "crushing blow" caused by large messages in memory. In my case, I immediately dumped all messages to a database (unfortunately, SQL), extracted the essential metadata I needed for my application, and kept it in the record so that I could select, index, and search on it. I also stored both the raw message and the processed message in the database. The reason was that I wanted to be able to analyze the raw message if something failed (usually a Unicode failure) and to retrieve the e-mail object from its JSON container for quick(er) processing than I would get by parsing the raw message again (and again).
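
Roughly, the approach described above could be sketched like this. The table layout, column names, and the functions open_store, add_message, and find_by_sender are illustrative assumptions, not my original code, and the JSON copy of the processed message is left out to keep the sketch short.

```python
import email
import sqlite3


def _text(value):
    # Header values may be Header objects; store them as plain text (or NULL).
    return None if value is None else str(value)


def open_store(path=":memory:"):
    """Create the (hypothetical) message table and an index on the sender."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " message_id TEXT, sender TEXT, subject TEXT,"
        " raw BLOB NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sender ON messages (sender)")
    return conn


def add_message(conn, raw_bytes):
    """Store the raw message next to the metadata extracted from it."""
    msg = email.message_from_bytes(raw_bytes)
    cur = conn.execute(
        "INSERT INTO messages (message_id, sender, subject, raw)"
        " VALUES (?, ?, ?, ?)",
        (_text(msg["Message-ID"]), _text(msg["From"]),
         _text(msg["Subject"]), raw_bytes),
    )
    conn.commit()
    return cur.lastrowid


def find_by_sender(conn, sender):
    """Search on metadata only; the raw message is not touched."""
    rows = conn.execute(
        "SELECT id, subject FROM messages WHERE sender = ? ORDER BY id",
        (sender,),
    )
    return rows.fetchall()


conn = open_store()
add_message(conn, b"From: a@example.com\nSubject: hello\n\nbody text\n")
print(find_by_sender(conn, "a@example.com"))
```

Keeping the raw bytes in the same row means a message that failed processing can be pulled back out and re-examined later.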

This experience makes me a supporter of an e-mail module that has a storage container object that can be searched by any number of metadata fields. These metadata fields would come from data sources both internal to the message and external to it. I believe it would be necessary to specify which searchable fields you want before creating the storage container.

I hope that it would be possible to make the storage container backend independent of the storage technology, so that people like me, who will detest SQL until the heat death of the universe, can use something else to store mail messages. I would also recommend not depending on the file system because, in my experience, performance declined dramatically at around 500 messages (ext3 and jfs). Even though I was using an SQL database (SQLite), storing the messages in the database was significantly faster.
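
As a rough sketch of what I mean by a backend-independent container with searchable fields declared up front: the names MessageStore and MemoryStore and their methods are purely hypothetical, nothing like them exists in the email package today, and the in-memory backend is only a stand-in for SQLite or anything else.

```python
import abc
import email


class MessageStore(abc.ABC):
    """Hypothetical container; searchable fields are fixed at creation time."""

    def __init__(self, fields):
        self.fields = tuple(fields)

    def extract(self, raw_bytes, external=None):
        """Build the metadata record from headers (internal) and external data."""
        msg = email.message_from_bytes(raw_bytes)
        external = external or {}
        record = {}
        for name in self.fields:
            # External data wins; otherwise fall back to a header of that name.
            value = external.get(name, msg[name])
            record[name] = None if value is None else str(value)
        return record

    @abc.abstractmethod
    def add(self, raw_bytes, external=None):
        """Store one message and return its key."""

    @abc.abstractmethod
    def search(self, **criteria):
        """Return the keys of messages whose metadata matches all criteria."""


class MemoryStore(MessageStore):
    """Trivial backend; a database-backed one would implement the same methods."""

    def __init__(self, fields):
        super().__init__(fields)
        self._rows = []

    def add(self, raw_bytes, external=None):
        self._rows.append((self.extract(raw_bytes, external), raw_bytes))
        return len(self._rows) - 1

    def search(self, **criteria):
        unknown = set(criteria) - set(self.fields)
        if unknown:
            raise KeyError("not a searchable field: %s" % sorted(unknown))
        return [
            key
            for key, (meta, _raw) in enumerate(self._rows)
            if all(meta[name] == value for name, value in criteria.items())
        ]


store = MemoryStore(["Subject", "list"])
store.add(b"Subject: hello\n\nbody\n", external={"list": "email-sig"})
print(store.search(list="email-sig"))
```

The application only ever talks to add and search, so swapping the backend should not require changing the calling code.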

Thanks to all who are working on this project. I wish I could participate more, but life has other plans for me.
_______________________________________________
Email-SIG mailing list
Email-SIG@python.org
Your options: 
http://mail.python.org/mailman/options/email-sig/archive%40mail-archive.com