On Sun, 04 May 2008 12:26:31 +1200 (NZST)
Derek Smithies <[EMAIL PROTECTED]> wrote:

> 
> On Sat, 3 May 2008, Steve Holdoway wrote:
> >
> > Woah there... the mail client is just reading the mailbox, and the 
> > design of that mailbox is down to the process serving your imap 
> > requests. You may have a local copy of the email, but you're also using 
> > remote software to manage / deliver new mail as well. Although it's an 
> > irrelevance, and a service you're paying for, but your ISP has to 
> > administrate / back up all of these emails as well. You can probably 
> > guess what I primarily do for a living - trust me, when you've got 
> > millions of mailboxes to look after, it's a huge undertaking (:
> 
> No, it is a mail client problem.
>   mail client A takes minutes to open a remote imap folder.
> 
>   mail client B (alpine) takes seconds to open the same remote imap folder.
> 
> Conclusion,
>   mail client B has a better design, and is better for sysadmins.
>   mail client B is clearly doing something less intensive on the network. 
> Turns out that B is using clever IMAP commands to just receive the last 
> headers in the folder, and is consequently heaps quicker.
Every mail client worth its salt will send a message to the server along the 
lines of 'I'm up to message number 36, can I have any new ones', unless you 
request that a complete re-index is performed. I'd say that your client A is 
obviously braindead, and the comparison is unfair ( Probably Evolution, and 
you're not smoking the right stuff to understand which things to set to make it 
work as you want it! ). Like I said, I use sylpheed. I'd like to see your 
findings ( once you've initialised your client! ) against alpine. Not having 
hundreds of thousands of emails, I'm not able to test (: It'll probably be 
slower to start up, but at least it'll have a graphical interface when it gets 
there!
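
To make the 'I'm up to message number 36, can I have any new ones' exchange concrete, here is a rough Python sketch using the standard imaplib module. It assumes the client remembers the last UID it saw (rather than a sequence number) and that conn is an imaplib.IMAP4 or IMAP4_SSL object that has already logged in and selected a folder; new_uid_range and fetch_new_headers are made-up names, and this is not a claim about how alpine or sylpheed actually do it.

```python
import imaplib  # conn below is expected to be an imaplib.IMAP4 / IMAP4_SSL


def new_uid_range(last_seen_uid):
    """UID set covering everything after the last message we already hold."""
    return f"{last_seen_uid + 1}:*"


def fetch_new_headers(conn, last_seen_uid, fields="FROM SUBJECT DATE"):
    """Ask the server only for messages newer than last_seen_uid.

    Returns (list_of_new_uids, raw_fetch_response).
    """
    typ, data = conn.uid("SEARCH", None, "UID", new_uid_range(last_seen_uid))
    if typ != "OK":
        raise RuntimeError("UID SEARCH failed")

    # "n:*" always matches the highest UID in the folder, even if it is
    # lower than n, so drop anything we have already seen.
    found = (data[0] or b"").split()
    uids = [int(u) for u in found if int(u) > last_seen_uid]
    if not uids:
        return [], []

    # BODY.PEEK fetches the headers without marking the messages as read.
    typ, raw = conn.uid(
        "FETCH",
        ",".join(str(u) for u in uids),
        f"(BODY.PEEK[HEADER.FIELDS ({fields})])",
    )
    if typ != "OK":
        raise RuntimeError("UID FETCH failed")
    return uids, raw
```

With a live connection, fetch_new_headers(conn, 36) would transfer only the chosen headers of messages after UID 36, which is a tiny amount of traffic next to re-reading the whole folder.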

Your comment on alpine being better for sysadmins is only partly true, as 
their fundamental problem is managing your 100s of MB ( assuming a nominal 1KB 
email size ), times the client base. The amount of resource expended in 
synchronising imap headers is a fraction of that involved in backing up the 
emails themselves. 

In a previous life, I commissioned a 4 million mailbox IMAP platform. We had 3 
computers and a 500 tape robot dedicated solely to performing backups. Just how 
much more resource would we have needed to manage the platform if all mailboxes 
were as full as yours? You could probably get away with raid5 500GB SATA disks 
for the backup, but you'll have to have fast SCSI for the mailstore. 
Realistically, SCSI disks are 300GB max for 15krpm, so you're talking 2,500 
SCSI and 1,200 SATA disks, which is about 20 racksworth. At a nominal 10W/disk, 
that's nearly 2KW/rack ( many data centres design for 1KW/rack! ). So if everyone 
in New Zealand used IMAP mail as you do, then the disk storage alone would 
consume about 37KW.
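
For anyone who wants to check the sums, here is a back-of-envelope sketch of that disk arithmetic in Python. The disk counts, rack count, 10W/disk and 4 million mailboxes are the figures quoted above; the constant and function names are made up for illustration, and 10W/disk is only a nominal figure, so treat the results as order-of-magnitude at best.

```python
# Figures taken from the paragraph above; names are illustrative only.
MAILBOXES = 4_000_000
SCSI_DISKS = 2_500        # 300GB, 15krpm, for the mailstore
SATA_DISKS = 1_200        # 500GB, for the backup
SCSI_DISK_GB = 300
RACKS = 20
WATTS_PER_DISK = 10       # nominal


def disk_power_kw(disks, watts_per_disk=WATTS_PER_DISK):
    """Power drawn by a number of disks, in kilowatts."""
    return disks * watts_per_disk / 1000


# Raw mailstore capacity, and what that allows per mailbox.
mailstore_tb = SCSI_DISKS * SCSI_DISK_GB / 1000
mb_per_mailbox = SCSI_DISKS * SCSI_DISK_GB * 1000 / MAILBOXES

# Power for the mailstore alone, for all disks, and per rack.
mailstore_kw = disk_power_kw(SCSI_DISKS)
total_kw = disk_power_kw(SCSI_DISKS + SATA_DISKS)
per_rack_kw = total_kw / RACKS

print(f"mailstore: {mailstore_tb:.0f} TB raw, {mb_per_mailbox:.1f} MB per mailbox")
print(f"power: {mailstore_kw:.0f} kW mailstore, {total_kw:.0f} kW all disks, "
      f"{per_rack_kw:.2f} kW per rack")
```

Changing WATTS_PER_DISK or RACKS shows how sensitive the per-rack figure is to those two guesses.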

Sure, I'm exaggerating to play the devil's advocate here. But look at the 
problems that ISPs are having in the UK now that the BBC is allowing free 
downloads - a change in the way resources are used is forcing a revolution 
in billing strategy.

> 
> 
> > I think that you're also not taking into account the limitations of the 
> > filesystem itself. Each email is a file, and to have hundreds of 
> > thousands of files in a single directory will never be efficient. How 
> > many levels of indirection will you be going through on an ext3 system??
> Not really. $200 for a 750GB drive - seems to me that disks are getting 
> hugely cheap.
Sorry, you're missing the point. The size of a drive has nothing to do with the 
efficiency of the file system. Storing many small files in a single directory 
is never efficient. Sure, some file systems are better than others at managing 
this, but... 
> 
> 
> >
> > If you're wanting a searchable resource, then I personally think that a 
> > mailbox and mail client is a poor choice of toolkit. It would be a 
> > fairly trivial task to import them into an ht//Dig indexed resource or a 
> > wiki - although I have yet to see any mailing list, let alone a popular 
> > one, have a high enough s/n ratio to make it worth keeping everything!
> yes/no.
> Several lists provide a search engine of their lists that works quite 
> nicely. Other lists provide no usable search engine. Simpler to just keep 
> a copy of all lists.
Or generate your own with appropriate tools??
> 
> There are emails from a person that you read and go, "twit" (or similar) 
> and decide to dump. Later, you receive an email from someone asking for 
> help. The first thing you do is search on that person's name. Since you 
> have a copy of all previous correspondence, you find if they have indeed 
> written to you.
tbh, if I get to the "or similar" stage, then procmail is forwarding all of 
their correspondence to /dev/null. So the chances of seeing their request for 
help are pretty remote (:
> 
> From a commercial perspective, you keep all written correspondence. Why 
> not the same with email ?
The fundamental difference between the two is that it costs money to send a 
letter. If I write a script to send you an email of junk every minute, are you 
going to keep them? Yes, I know it's an extreme example, but I deal with tens 
of millions of spam emails a day, so I see a huge difference between the two.

The point that I'm making is not that it's bad to keep all emails and use them 
as a searchable archive, but that accessing that archive using a mail 
server/client is a less than optimal solution for many people.
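
As a rough illustration of querying an archive directly instead of through a mail client, here is a minimal Python sketch that indexes an mbox file by sender, using the standard library mailbox module. It assumes the archive has been saved in mbox format (a Maildir or an IMAP export would need a different reader); index_by_sender and the example path are hypothetical names, and a real archive would want a proper indexer rather than an in-memory dict.

```python
import mailbox
from collections import defaultdict
from email.utils import parseaddr


def index_by_sender(mbox_path):
    """Map each sender address (lowercased) to the subjects they sent."""
    index = defaultdict(list)
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            # parseaddr splits "Name <addr>" into (name, addr).
            _name, addr = parseaddr(str(msg.get("From", "")))
            if addr:
                index[addr.lower()].append(str(msg.get("Subject", "")))
    finally:
        box.close()
    return dict(index)
```

Something like index_by_sender("lists.mbox").get("someone@example.org", []) then answers "has this person written to me before?" without opening a mail client at all.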

Steve
-- 
Steve Holdoway <[EMAIL PROTECTED]>
