Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-24 Thread Frank Cusack

On 1/20/11 11:49 PM -0600 Stan Hoeppner wrote:

Frank Cusack put forth on 1/20/2011 2:30 PM:

On 1/20/11 12:06 AM -0600 Stan Hoeppner wrote:

 This is amusing considering XFS is hands down
the best filesystem available on any platform, including ZFS.  Others
are simply ignorant and repeat what they've heard without looking for
current information.



Your pronouncement that others are simply ignorant is telling.


So is your intentionally quoting me out of context.


Not at all.  Your statement about ignorance has no context required.


The ignorant are those who blindly accept the false words of others
regarding 4+ year old XFS corruption on power fail as being true today.
They accept but without verification.  Hence the rumor persists in many
places.


Indeed, those folks are more than ignorant, they are in fact idiots.
(Ignorant meaning simply unaware.)


In my desire to be brief I didn't fully/correctly explain how delayed
logging works.  I attempted a simplified explanation that I thought most
would understand.  Here is the design document:
http://oss.sgi.com/archives/xfs/2010-05/msg00329.html

Early performance numbers:
http://oss.sgi.com/archives/xfs/2010-05/msg00329.html;

Note the double URL paste error?  Frank?  Why did you twist an honest
mistake into something it's not?  Here's the correct link:


Wow so you are basically an asshole as well as arrogant.


Stop being an ass.  Or get off yours and Google instead of requiring me
to spoon feed you.


LOL that actually made me laugh, thanks.


This is guaranteed to lose data on power loss or drive failure.


On power loss, on a busy system, yes.  Due to a single drive failure?
That's totally incorrect.  How are you coming to that conclusion?


Why don't you re-read the design.  I'm not going to spoon feed you.


Performance always has a trade-off.  The key here is that the filesystem
isn't corrupted due to this metadata loss.  Solaris with ZFS has the same
issues.  One can't pipeline anything in a block device queue and not have
some data loss on power failure, period.  If one syncs every write then
you have no performance, Solaris and ZFS included.


You might want to get current on ZFS as well.
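
As a toy illustration of the trade-off described above (a sketch of my own,
not code from the thread): fsync()ing after every write caps throughput at
the disk's sync rate, while batching writes and syncing once keeps the data
in the kernel's buffer cache -- fast, but lost if power drops before the
sync.

import os, time

def write_messages(path, messages, sync_every_write):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    for msg in messages:
        os.write(fd, msg)
        if sync_every_write:
            os.fsync(fd)   # durable, but one disk round-trip per message
    if not sync_every_write:
        os.fsync(fd)       # one sync at the end: fast, but a wide loss window
    os.close(fd)
    return time.perf_counter() - start

msgs = [b"x" * 4096] * 1000
print("fsync per write:", write_messages("/tmp/per_write.dat", msgs, True))
print("single fsync   :", write_messages("/tmp/batched.dat", msgs, False))

On spinning disks the per-write variant is typically one to two orders of
magnitude slower, which is the "no performance" being described.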


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-24 Thread Frank Cusack

Sorry all.  I responded before catching up to the end of the thread.


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-24 Thread Patrick Ben Koetter



* Frank Cusack frank+lists/dove...@linetwo.net:
 On 1/20/11 11:49 PM -0600 Stan Hoeppner wrote:
 Frank Cusack put forth on 1/20/2011 2:30 PM:
 On 1/20/11 12:06 AM -0600 Stan Hoeppner wrote:

...

 Wow so you are basically an asshole as well as arrogant.

Please give it a break. Take private things offlist.

p@rick


-- 
state of mind
Digitale Kommunikation

http://www.state-of-mind.de

Franziskanerstraße 15  Telefon +49 89 3090 4664
81669 München  Telefax +49 89 3090 4666

Amtsgericht MünchenPartnerschaftsregister PR 563



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-21 Thread Jerry
Seriously, isn't it time this thread died a peaceful death? It has long
since failed to have any real relevance to Dovecot, except in the
most extreme sense. It has evolved into a few testosterone-poisoned
individuals attempting to make this forum a theater for some mating
ritual. If they seriously want to continue this convoluted thread,
perhaps they would be so kind as to take it off-list and find a platform
better suited for this public display. At the very least, I would hope
that Timo might consider closing this thread. I know that Wietse would
never have let this thread reach this point on the Postfix forum.

In any case, I am now creating a kill filter to dispense with it.

-- 
Jerry ✌
dovecot.u...@seibercom.net

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the Reply-To header.



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-21 Thread Stan Hoeppner
Jerry put forth on 1/21/2011 7:53 AM:
 Seriously, isn't it time this thread died a peaceful death? It has long
 since failed to have any real relevance to Dovecot, except in the
 most extreme sense. It has evolved into a few testosterone-poisoned
 individuals attempting to make this forum a theater for some mating
 ritual. If they seriously want to continue this convoluted thread,
 perhaps they would be so kind as to take it off-list and find a platform
 better suited for this public display. At the very least, I would hope
 that Timo might consider closing this thread. I know that Wietse would
 never have let this thread reach this point on the Postfix forum.
 
 In any case, I am now creating a kill filter to dispense with it.

I'm guilty as charged.  Consider it dead.  Sorry for the noise Jerry, everyone.

-- 
Stan




Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-20 Thread Ed W

On 20/01/2011 06:06, Stan Hoeppner wrote:

If you think the above is hostile you have lived a privileged and sheltered
life, and I envy you. :)  That isn't hostile but a combination of losing
patience and being blunt.  Hostile is "f--k you!".  Obviously I wasn't being
hostile.


I'm living in the Dovecot mailing list which has historically been a 
very tolerant and forgiving place to learn?  Do you mind if I continue 
to remain sheltered?




You're overreacting.  Saying "I'm not your personal XFS tutor" is not being
hostile.  Heh, if you think that was hostile, go live on NANAE for a few days or
a week and report back on what real hostility is. ;)


I for one don't want the tone of this list to deteriorate to NANAE
levels.


There are plenty of lists and forums where you can get sarcastic answers 
from folks with more experience than oneself. Please let's try to keep 
this list the friendly, helpful place it has been?


To offer just an *opinion*, being sarcastic (or just less than fully 
helpful) to idiots who can't be bothered to learn the basics before 
posting is rarely beneficial. Many simply leave and go elsewhere. Some 
do the spadework and become experienced, but in turn they usually 
respond in the same sharp way to new inexperienced questions... The 
circle continues...


I find it helpful to always presume there is a reason I should respect 
the poster, despite what might look like a lazy question to me.  Does 
someone with 10 years of experience in their own field deserve me being 
sharp with them because they tried to skip a step and ask a lazy 
question without doing their own leg work?  Only yesterday I was that 
dimwit, having spent 5 hours applying the wrong patch to a kernel and 
wondering why it failed to build, until I finally asked their list and 
got a polite reply pointing out my very trivial mistake...


Let's assume everyone deserves some respect and take the time to answer 
the dim questions politely?


Oh well, pleading over. Good luck, and genuine thanks to Stan for 
spending his valuable time here. Here's hoping you will continue to do 
so, while also being nice to the dummies?


Regards

Ed W



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-20 Thread Stan Hoeppner
Ed W put forth on 1/20/2011 6:54 AM:

 Oh well, pleading over. Good luck, and genuine thanks to Stan for spending his
 valuable time here. Here's hoping you will continue to do so, while also
 being nice to the dummies?

Dummies isn't what this was about.  Again, I misread the intent of your
question as being troll bait against XFS.  That's why I responded with a blunt,
short reply.  I misread you, you misread me, now we're all one big happy family.
 Right?  :)

-- 
Stan


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-20 Thread Frank Cusack

On 1/20/11 12:06 AM -0600 Stan Hoeppner wrote:

 This is amusing considering XFS is hands down
the best filesystem available on any platform, including ZFS.  Others are
simply ignorant and repeat what they've heard without looking for current
information.


Not to be overly brusque, but that's a laugh.  The two best filesystems
out there today are vxfs and zfs, for almost any enterprise workload that
exists.  I won't argue that xfs can't stand out for specific workloads
such as sequential write; it might, and I don't know quite enough about
it to be sure, but for general workloads, including a mail store, zfs is
leaps ahead.  I'd include WAFL in the top 3 but it's only accessible
via NFS.  Well, there is a SAN version, but it doesn't really give you
access to the best of the filesystem feature set (a tradeoff for other
features of the hardware).

Your pronouncement that others are simply ignorant is telling.


Your data isn't safe until it hits the disk.  There are plenty of ways
to spool data to ram rather than committing it, but they are all
vulnerable to data loss until the data is written to disk.


The delayed logging code isn't a ram spooler, although that is a mild
side effect.  Apparently I didn't explain it fully, or precisely.  And
keep in mind, I'm not the dev who wrote the code.  So I'm merely
repeating my recollection of the description from the architectural
document and what was stated on the XFS list by the author, Dave Chinner
of Red Hat.

...

In my desire to be brief I didn't fully/correctly explain how delayed
logging works.  I attempted a simplified explanation that I thought most
would understand.  Here is the design document:
http://oss.sgi.com/archives/xfs/2010-05/msg00329.html


I guess I understand your championing of it if you consider that a
design document.  That brief piece of email hardly describes it at
all, and the performance numbers are pretty worthless (due to the
caveat that barriers are disabled).

Given the paragraph in the design document:


The best IO behaviour comes from the delayed logging version of XFS,
with the lowest bandwidth and iops to sustain the highest
performance. All the IO is to the log - no metadata is written to
disk at all, which is the way this test should execute.  As a result,
the delayed logging code was the only configuration not limited by
the IO subsystem - instead it was completely CPU bound (8 CPUs
worth)...


it is indeed a ram spooler, for metadata, which is a standard (and
good) approach.  That's not a side effect, that's the design.  AFAICT
from the brief description anyway.

This is guaranteed to lose data on power loss or drive failure.
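
In outline (my own sketch of the general batching idea, not XFS's actual
delayed logging code), the RAM spooling described above looks like this:
many logical metadata changes accumulate in memory and reach the log as a
single physical write, so everything accumulated since the last flush is
exposed to a power cut.

import os

class DelayedLog:
    """Toy write-ahead log that batches metadata changes in RAM."""
    def __init__(self, path, flush_every=100):
        self.path = path
        self.flush_every = flush_every
        self.pending = []            # changes held only in RAM so far

    def record(self, change):
        self.pending.append(change)
        if len(self.pending) >= self.flush_every:
            self.flush()

    def flush(self):
        # One physical log write covers many logical changes (the win);
        # anything still pending at power loss is simply gone (the cost).
        with open(self.path, "a") as log:
            log.write("\n".join(self.pending) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.pending.clear()

log = DelayedLog("/tmp/toy.log")
for i in range(250):
    log.record(f"create maildir/new/msg{i}")
log.flush()   # 250 logical changes, only 3 physical log writes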


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-20 Thread Stan Hoeppner
Frank Cusack put forth on 1/20/2011 2:30 PM:
 On 1/20/11 12:06 AM -0600 Stan Hoeppner wrote:
  This is amusing considering XFS is hands down
 the best filesystem available on any platform, including ZFS.  Others are
 simply ignorant and repeat what they've heard without looking for current
 information.

 Your pronouncement that others are simply ignorant is telling.

So is your intentionally quoting me out of context.  In context:

Me:
Prior to 2007 there was a bug in XFS that caused filesystem corruption upon
power loss under some circumstances--actual FS corruption, not simply zeroing of
files that hadn't been fully committed to disk.  Many (uneducated) folk in the
Linux world still to this day tell others to NOT use XFS because "power loss
will always corrupt your file system."  Some probably know better but are EXT or
JFS (or god forbid, BTRFS) fans and spread FUD regarding XFS.  This is amusing
considering XFS is hands down the best filesystem available on any platform,
including ZFS.  Others are simply ignorant and repeat what they've heard without
looking for current information.

The ignorant are those who blindly accept the false words of others regarding
4+ year old XFS corruption on power fail as being true today.  They accept but
without verification.  Hence the rumor persists in many places.

 In my desire to be brief I didn't fully/correctly explain how delayed
 logging works.  I attempted a simplified explanation that I thought most
 would understand.  Here is the design document:
 http://oss.sgi.com/archives/xfs/2010-05/msg00329.html

 I guess I understand your championing of it if you consider that a
 design document.  That brief piece of email hardly describes it at
 all, and the performance numbers are pretty worthless (due to the
 caveat that barriers are disabled).

You quoted me out of context again, intentionally leaving out the double paste
error I made of the same URL.

Me:
In my desire to be brief I didn't fully/correctly explain how delayed logging
works.  I attempted a simplified explanation that I thought most would
understand.  Here is the design document:
http://oss.sgi.com/archives/xfs/2010-05/msg00329.html

Early performance numbers:
http://oss.sgi.com/archives/xfs/2010-05/msg00329.html;

Note the double URL paste error?  Frank?  Why did you twist an honest mistake
into something it's not?  Here's the correct link:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/xfs-delayed-logging-design.txt

 Given the paragraph in the design document:

Stop being an ass.  Or get off yours and Google instead of requiring me to spoon
feed you.

 The best IO behaviour comes from the delayed logging version of XFS,
 with the lowest bandwidth and iops to sustain the highest
 performance. All the IO is to the log - no metadata is written to
 disk at all, which is the way this test should execute.  As a result,
 the delayed logging code was the only configuration not limited by
 the IO subsystem - instead it was completely CPU bound (8 CPUs
 worth)...
 
 it is indeed a ram spooler, for metadata, which is a standard (and
 good) approach.  That's not a side effect, that's the design.  AFAICT
 from the brief description anyway.

As you'll see in the design doc, that's not the intention of the patch.  XFS
already had a delayed metadata update design, but it was terribly inefficient in
implementation.  Dave increased the efficiency several fold.  The reason I
mentioned it on Dovecot is that it directly applies to large/busy maildir style
mail stores.

XFS just clobbers all other filesystems in parallel workload performance, but
historically its metadata performance was pretty anemic, about half that of
other FSes.  Thus, parallel creates and deletes of large numbers of small files
were horrible.  This patch fixes that issue, and brings the metadata performance
of XFS up to the level of EXT3/4, Reiser, and others, for single process/thread
workloads, and far surpasses their performance with large parallel
process/thread workloads, as is shown in the email I linked.

This now makes XFS the perfect Linux FS for maildir and [s/m]dbox on moderate to
heavy load IMAP servers.  Actually it's now the perfect filesystem for all Linux
server workloads.  Previously it was for all workloads but metadata heavy ones.

 This is guaranteed to lose data on power loss or drive failure.

On power loss, on a busy system, yes.  Due to a single drive failure?  That's
totally incorrect.  How are you coming to that conclusion?

As with every modern Linux filesystem that uses the kernel buffer cache,
which is all of them, you will lose in-flight data that's in the buffer cache
when power drops.

Performance always has a trade-off.  The key here is that the filesystem isn't
corrupted due to this metadata loss.  Solaris with ZFS has the same issues.  One
can't pipeline anything in a block device queue and not have some data loss on
power failure, period.  If one syncs every write then you have no performance,
Solaris and ZFS included.

Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-19 Thread Stan Hoeppner
Ed W put forth on 1/17/2011 12:23 PM:
 On 17/01/2011 02:20, Stan Hoeppner wrote:
 Ed W put forth on 1/16/2011 4:11 PM:
 Using XFS with delayed logging mount option (requires kernel 2.6.36 or 
 later).

 XFS has natively used delayed allocation for quite some time, coalescing
 multiple pending writes before pushing them into the buffer cache.  This 
 not
 only decreases physical IOPS, but it also decreases filesystem 
 fragmentation by
 packing more files into each extent.  Decreased fragmentation means fewer 
 disk
 seeks required per file read, which also decreases physical IOPS.  This 
 also
 greatly reduces the wasted space typical of small file storage.  Works very
 well
 with maildir, but also with the other mail storage formats.
 What happens if you pull out the wrong cable in the rack, kernel 
 lockup/oops,
 power failure, hot swap disk pulled, or something else which causes an
 unexpected loss of a few seconds of written data?
 Read the XFS FAQ.  These questions have been answered hundreds of times since
 XFS was released in Irix in 1994.  I'm not your personal XFS tutor.
 
 Why the hostile reply?

If you think the above is hostile you have lived a privileged and sheltered
life, and I envy you. :)  That isn't hostile but a combination of losing
patience and being blunt.  Hostile is "f--k you!".  Obviously I wasn't being
hostile.

 The question was deeper than your response?

Do you want to troll or learn something?

Prior to 2007 there was a bug in XFS that caused filesystem corruption upon
power loss under some circumstances--actual FS corruption, not simply zeroing of
files that hadn't been fully committed to disk.  Many (uneducated) folk in the
Linux world still to this day tell others to NOT use XFS because "power loss
will always corrupt your file system."  Some probably know better but are EXT or
JFS (or god forbid, BTRFS) fans and spread FUD regarding XFS.  This is amusing
considering XFS is hands down the best filesystem available on any platform,
including ZFS.  Others are simply ignorant and repeat what they've heard without
looking for current information.

Thus, when you asked the question the way you did, you appeared to be trolling,
just like the aforementioned souls who do the same.  So I directed you to the
XFS FAQ where all of the facts are presented and all of your questions would be
answered, from the authoritative source, instead of wasting my time on a troll.

 Surely your IOPs are hard limited by the number of fsyncs (and size of any
 battery backed ram)?
 Depends on how your applications are written and how often they call fsync.  
 Do
 you mean BBWC?  WRT delayed logging BBWC is mostly irrelevant.  Keep in mind
 that for delayed logging to have a lot of metadata writes in memory someone, 
 or
 many someones, must be doing something like an 'rm -rf' or equivalent on a 
 large
 dir with many thousands of files.  Even in this case, the processing is 
 _very_
 fast.
 
 You have completely missed my point.

No, I haven't.

 Your data isn't safe until it hits the disk.  There are plenty of ways to 
 spool
 data to ram rather than committing it, but they are all vulnerable to data 
 loss
 until the data is written to disk.

The delayed logging code isn't a ram spooler, although that is a mild side
effect.  Apparently I didn't explain it fully, or precisely.  And keep in mind,
I'm not the dev who wrote the code.  So I'm merely repeating my recollection of
the description from the architectural document and what was stated on the XFS
list by the author, Dave Chinner of Red Hat.

 You wrote "filesystem metadata write operations are pushed almost entirely
 into RAM", but if the application requests an fsync then you still have to
 write it to disk?  As such you are again limited by disk IO, which itself is
 limited by
 the performance of the device (and temporarily accelerated by any persistent
 write cache).  Hence my point that your IOPs are generally limited by the 
 number
 of fsyncs and any persistent write cache?

In my desire to be brief I didn't fully/correctly explain how delayed logging
works.  I attempted a simplified explanation that I thought most would
understand.  Here is the design document:
http://oss.sgi.com/archives/xfs/2010-05/msg00329.html

Early performance numbers:
http://oss.sgi.com/archives/xfs/2010-05/msg00329.html

 As I write this email I'm struggling with getting a server running again that
 has just been rudely powered down due to a UPS failing (power was fine, UPS
 failed...).  This isn't such a rare event (IMHO) and hence I think we do need 
 to
 assume that at some point every machine will suffer a rude and unexpected
 event which loses all in-progress write cache.  I have no complaints at XFS in
 general, but I think it's important that filesystem designers in general have
 given some thought to this event and recovering from it?

Rest assured this is a top priority.  Ever heard of SGI by chance?  They sell
supercomputers with 1024 CPUs, 16 terabytes of 

[Dovecot] SSD drives are really fast running Dovecot

2011-01-18 Thread Warren Baker
On Monday, January 17, 2011, Stan Hoeppner s...@hardwarefreak.com wrote:
 Cor Bosman put forth on 1/16/2011 5:34 PM:
 Btw, our average mail size last we checked was 30KB. That's a pretty good
 average as we're an ISP with a very wide user base.  I think a 4KB average is
 not a normal mail load.

 As another OP pointed out, some ISPs apparently have to deliver a lot of spam 
 to
 mailboxen to avoid FPs, bumping up that average mail size considerably.  Do 
 you
 accept and deliver a lot of spam to user mailboxen?


At an ISP I worked at, we did a study (just over 2 years ago) on the
average size of spam mail that was being delivered to the users. It
worked out to an average size of between 8KB and 10KB. This was based
on data over a period of 12 months, with an average of 180 million
mails per month being received. Legit mail was averaging 32KB.

I doubt the size of spam has changed much.

.warren


-- 
.warren


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-18 Thread Stan Hoeppner
Warren Baker put forth on 1/18/2011 2:53 AM:
 On Monday, January 17, 2011, Stan Hoeppner s...@hardwarefreak.com wrote:
 Cor Bosman put forth on 1/16/2011 5:34 PM:
  Btw, our average mail size last we checked was 30KB. That's a pretty good
  average as we're an ISP with a very wide user base.  I think a 4KB average is
  not a normal mail load.

 As another OP pointed out, some ISPs apparently have to deliver a lot of 
 spam to
 mailboxen to avoid FPs, bumping up that average mail size considerably.  Do 
 you
 accept and deliver a lot of spam to user mailboxen?
 
 
  At an ISP I worked at, we did a study (just over 2 years ago) on the
  average size of spam mail that was being delivered to the users. It
  worked out to an average size of between 8KB and 10KB. This was based
  on data over a period of 12 months, with an average of 180 million
  mails per month being received. Legit mail was averaging 32KB.
  
  I doubt the size of spam has changed much.

What was the ratio of spam to ham you were delivering to user mailboxes?

-- 
Stan


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-18 Thread Warren Baker
On Tue, Jan 18, 2011 at 11:44 AM, Stan Hoeppner s...@hardwarefreak.com wrote:
 At an ISP I worked at, we did a study (just over 2 years ago) on the
 average size of spam mail that was being delivered to the users. It
 worked out to an average size of between 8KB and 10KB. This was based
 on data over a period of 12 months, with an average of 180 million
 mails per month being received. Legit mail was averaging 32KB.

 I doubt the size of spam has changed much.

 What was the ratio of spam to ham you were delivering to user mailboxes?


If I remember correctly it was around 85/15.
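
As a rough sanity check of what that mix implies for the mail store (my
arithmetic, not figures from the thread): with an 85/15 spam-to-ham delivery
ratio, ~9KB average spam, and ~32KB average ham, the blended average message
size comes out around 12.5KB.

spam_share, ham_share = 0.85, 0.15
spam_avg_kb, ham_avg_kb = 9, 32   # midpoint of 8-10KB, and the 32KB legit figure
blended = spam_share * spam_avg_kb + ham_share * ham_avg_kb
print(f"blended average: {blended:.2f} KB")   # 12.45 KB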


-- 
.warren


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Steve

 Original Message 
 Date: Sun, 16 Jan 2011 20:33:23 -0600
 From: Stan Hoeppner s...@hardwarefreak.com
 To: dovecot@dovecot.org
 Subject: Re: [Dovecot] SSD drives are really fast running Dovecot

 Cor Bosman put forth on 1/16/2011 5:34 PM:
  Btw, our average mail size last we checked was 30KB. That's a pretty good
 average as we're an ISP with a very wide user base.  I think a 4KB average is
 not a normal mail load.
 
 As another OP pointed out, some ISPs apparently have to deliver a lot of
 spam to
 mailboxen to avoid FPs, bumping up that average mail size considerably.
 
Spam does not bump the average mail size considerably. The average spam mail is 
way smaller than the average normal mail. The reason for this is very simple: 
spammers need to reach as many end users as possible, and they need to get 
those mails out as fast as possible.


 Do you
 accept and deliver a lot of spam to user mailboxen?
 
 -- 
 Stan



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Joseph Tam

On Sat, 15 Jan 2011, Charles Marcus wrote:


Doing this will also help train users in proper email management -
treating their INBOX just like they would a physical INBOX tray on their
desk. They wouldn't just let paper pile up there, why do so in their
INBOX (because they 'can')? Ie, it should be something they should
always strive to keep totally EMPTY. Of course this practically never
happens, but the point is, they need to learn to make a decision once
they are finished with it, and most importantly, take said action -
either delete it, or file it.


This is pretty much what I do with the mail domain I administer.  I've
set the INBOX with a modest quota, and a personal mail folder with a
generous quota.  I encourage users to keep their INBOX as a working set,
and archive the rest using any method they prefer.

It forces users to process their email (or at least their INBOX)
and keeps packratting in check.  Super-big INBOX quotas seem to encourage
wasteful habits.  I've helped some users clean out their mailboxes and
was surprised at the amount of junk being kept for years and years.

Apart from a few users moaning about their meager INBOX, this policy
works out fairly well.

Joseph Tam jtam.h...@gmail.com


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Maarten Bezemer


On Mon, 17 Jan 2011, Steve wrote:

Spam does not bump the average mail size considerably. The average spam 
mail is way smaller than the average normal mail. The reason for this is 
very simple: spammers need to reach as many end users as possible, and 
they need to get those mails out as fast as possible.


Somewhat correct. Because a lot of spamfilter setups skip messages above a 
certain size, we've seen an increase in such big messages. These affect 
the average quite severely.


An average, however, is only just that: an average. There may not even be 
1 message that has exactly the average size...


When looking at the last two weeks' worth of spam that didn't come from obvious 
blacklisted sources, I see:
45 messages below 4KB (including quite some miserable failures that forgot 
to include a message body...)

127 messages above 8KB, of which only 14 above 20KB
940 messages between 4KB and 8KB

Yet, the _average_ was well above 8KB, due to a few 500KB+ messages.

So, mean, median, or whatever, it's just lies, damn lies, and statistics.
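
To make the skew concrete, here is a quick sketch (illustrative numbers
shaped like the counts above, not the actual data) of how a handful of
500KB+ outliers drag the mean up while the median stays in the typical
4-8KB bucket:

from statistics import mean, median

# 45 tiny, 940 mid-range, 113 moderately large, 14 over 20KB, 4 huge outliers
sizes_kb = [3] * 45 + [6] * 940 + [12] * 113 + [25] * 14 + [550] * 4
print(f"mean  : {mean(sizes_kb):.1f} KB")    # ~8.7 KB, pulled up by outliers
print(f"median: {median(sizes_kb):.1f} KB")  # 6.0 KB, the typical message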


--
Maarten


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Steve

 Original Message 
 Date: Mon, 17 Jan 2011 11:13:19 +0100 (CET)
 From: Maarten Bezemer mcbdove...@robuust.nl
 To: Dovecot Mailing List dovecot@dovecot.org
 Subject: Re: [Dovecot] SSD drives are really fast running Dovecot

 
 On Mon, 17 Jan 2011, Steve wrote:
 
  Spam does not bump the average mail size considerably. The average spam 
  mail is way smaller than the average normal mail. The reason for this is 
  very simple: spammers need to reach as many end users as possible, and 
  they need to get those mails out as fast as possible.
 
 Somewhat correct. Because a lot of spamfilter setups skip messages above a 
 certain size, we've seen an increase in such big messages. These affect 
 the average quite severely.
 
 An average, however, is only just that: an average. There may not even be 
 1 message that has exactly the average size...
 
 When looking at the last two weeks' worth of spam that didn't come from obvious
 blacklisted sources, I see:
 45 messages below 4KB (including quite some miserable failures that forgot
 to include a message body...)
 127 messages above 8KB, of which only 14 above 20KB
 940 messages between 4KB and 8KB
 
 Yet, the _average_ was well above 8KB, due to a few 500KB+ messages.
 
You get 500KB+ sized spam messages? That is not usual. I have not done any 
computation on my part, but I remember seeing last year (or so) a study showing 
that spam messages are usually below 64KB.

Anyway... why is it so ultra-important how big spam messages are?


 So, mean, median, or whatever, it's just lies, damn lies, and statistics.
 
 
 -- 
 Maarten



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Giles Coochey

On 17/01/2011 13:41, Steve wrote:

You get 500KB+ sized spam messages? That is not usual. I have not done any 
computation on my part, but I remember seeing last year (or so) a study showing 
that spam messages are usually below 64KB.

That can depend on what you classify as SPAM. Many 'newsletters' which 
you've been 'subscribed to' by negative-option web-forms are considered 
SPAM by some, and those may contain PDF attachments of 500KB+


--
Best Regards,

Giles Coochey
NetSecSpec Ltd
NL T-Systems Mobile: +31 681 265 086
NL Mobile: +31 626 508 131
Gib Mobile: +350 5401 6693
Email/MSN/Live Messenger: gi...@coochey.net
Skype: gilescoochey







Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Charles Marcus
On 2011-01-17 4:49 AM, Joseph Tam wrote:
 It forces users to process their email (or at least their INBOX)
 and keeps packratting in check.  Super-big INBOX quotas seem to encourage
 wasteful habits.  I've helped some users clean out their mailboxes and
 was surprised at the amount of junk being kept for years and years.
 
 Apart from a few users moaning about their meager INBOX, this policy
 works out fairly well.

Then you have the power users who simply write a message filter to
immediately move all incoming mail to a personal folder, thus bypassing
this 'feature'...

Packrats, it seems, always find a way... ;)

-- 

Best regards,

Charles


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Steve

 Original Message 
 Date: Mon, 17 Jan 2011 13:45:51 +0100
 From: Giles Coochey gi...@coochey.net
 To: dovecot@dovecot.org
 Subject: Re: [Dovecot] SSD drives are really fast running Dovecot

 On 17/01/2011 13:41, Steve wrote:
  You get 500KB+ sized spam messages? That is not usual. I have not done
 any computation on my part, but I remember seeing last year (or so) a study
 showing that spam messages are usually below 64KB.
 
 That can depend on what you classify as SPAM. Many 'newsletters' which 
 you've been 'subscribed to' by negative-option web-forms are considered 
 SPAM by some, and those may contain PDF attachments of 500KB+
 
Well, I wrote about what's usual, and those newsletters that you tag as spam 
but have subscribed to are definitely not the norm.


 -- 
 Best Regards,
 
 Giles Coochey
 NetSecSpec Ltd
 NL T-Systems Mobile: +31 681 265 086
 NL Mobile: +31 626 508 131
 Gib Mobile: +350 5401 6693
 Email/MSN/Live Messenger: gi...@coochey.net
 Skype: gilescoochey
 
 
 



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Giles Coochey

On 17/01/2011 14:18, Steve wrote:



On 17/01/2011 13:41, Steve wrote:

You get 500KB+ sized spam messages? That is not usual. I have not done
any computation on my part, but I remember seeing last year (or so) a study
showing that spam messages are usually below 64KB.

That can depend on what you classify as SPAM. Many 'newsletters' which
you've been 'subscribed to' by negative-option web-forms are considered
SPAM by some, and those may contain PDF attachments of 500KB+


Well, I wrote about what's usual, and those newsletters that you tag as spam 
but have subscribed to are definitely not the norm.


I think that was Maarten's point: while they are not the norm, they are off the 
scale enough to severely influence the 'average'. They account for only 0.2% of 
the events in the sample, yet are far above what is considered the 'norm'.

--
Best Regards,

Giles Coochey
NetSecSpec Ltd
NL T-Systems Mobile: +31 681 265 086
NL Mobile: +31 626 508 131
Gib Mobile: +350 5401 6693
Email/MSN/Live Messenger: gi...@coochey.net
Skype: gilescoochey







Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Maarten Bezemer


On Mon, 17 Jan 2011, Steve wrote:


From: Giles Coochey gi...@coochey.net

That can depend on what you classify as SPAM. Many 'newsletters' which
you've been 'subscribed to' by negative-option web-forms are considered
SPAM by some, and those may contain PDF attachments of 500KB+


Well, I wrote about what's usual, and those newsletters that you tag as spam 
but have subscribed to are definitely not the norm.


I didn't count newsletters I subscribed to... I always use traceable 
addresses for those. In this case, it was JPG spam with large pics. Some 
claimed to be an LED lighting newsletter, others were disguised as new year's 
greetings, but the content showed something not quite related to LEDs or 
happy-new-year stuff :-P
All these big spams are addressed to bogus addresses, and/or standard 
addresses like info@domain, usually with info@some-other-domain in the 
From: header.



But these are my last 2 cents for this thread as it has been derailing for 
quite some time now ;-)



--
Maarten


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Charles Marcus
On 2011-01-15 9:30 AM, Charles Marcus wrote:
 Then, enforce a smallish per user quota (how much would depend on your
 particular environment, but I'm thinking something like 250 or maybe
 500MB, since our users do get a lot of large attachments in the course
 of doing business) on their INBOX -  Sent, Drafts and Templates folders
 too, but that's a question on my list of 'how to do' - how to easily
 place these 'special' folders on the 'fast' namespace, and all user
 created folders in the 'slow' namespace. It would be really nice if
 there were some kind of native way that dovecot could 'assign' the
 'special' folders to the same namespace as the INBOX, and all other user
 created folders to another...

Timo - any chance you could comment on the best way to accomplish this -
or if it is even possible right now? I'm hoping to start testing this in
the next few weeks...

Thanks,

-- 

Best regards,

Charles


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Ed W

On 17/01/2011 02:20, Stan Hoeppner wrote:

Ed W put forth on 1/16/2011 4:11 PM:

Using XFS with delayed logging mount option (requires kernel 2.6.36 or later).

XFS has natively used delayed allocation for quite some time, coalescing
multiple pending writes before pushing them into the buffer cache.  This not
only decreases physical IOPS, but it also decreases filesystem fragmentation by
packing more files into each extent.  Decreased fragmentation means fewer disk
seeks required per file read, which also decreases physical IOPS.  This also
greatly reduces the wasted space typical of small file storage.  Works very well
with maildir, but also with the other mail storage formats.

What happens if you pull out the wrong cable in the rack, kernel lockup/oops,
power failure, hot swap disk pulled, or something else which causes an
unexpected loss of a few seconds of written data?

Read the XFS FAQ.  These questions have been answered hundreds of times since
XFS was released in Irix in 1994.  I'm not your personal XFS tutor.


Why the hostile reply?

The question was deeper than your response?



Surely your IOPs are hard limited by the number of fsyncs (and size of any
battery backed ram)?

Depends on how your applications are written and how often they call fsync.  Do
you mean BBWC?  WRT delayed logging BBWC is mostly irrelevant.  Keep in mind
that for delayed logging to have a lot of metadata writes in memory someone, or
many someones, must be doing something like an 'rm -rf' or equivalent on a large
dir with many thousands of files.  Even in this case, the processing is _very_ 
fast.


You have completely missed my point.

Your data isn't safe until it hits the disk.  There are plenty of ways 
to spool data to ram rather than committing it, but they are all 
vulnerable to data loss until the data is written to disk.


You wrote "filesystem metadata write operations are pushed almost 
entirely into RAM", but if the application requests an fsync then you 
still have to write it to disk?  As such you are again limited by disk 
IO, which itself is limited by the performance of the device (and 
temporarily accelerated by any persistent write cache).  Hence my point 
that your IOPs are generally limited by the number of fsyncs and any 
persistent write cache?


As I write this email I'm struggling with getting a server running again 
that has just been rudely powered down due to a UPS failing (power was 
fine, UPS failed...).  This isn't such a rare event (IMHO) and hence I 
think we do need to assume that at some point every machine will suffer 
a rude and unexpected event which loses all in-progress write cache.  I 
have no complaints at XFS in general, but I think it's important that 
filesystem designers in general have given some thought to this event and 
recovering from it?



Please try not to be so hostile in your email construction - we aren't 
all idiots here, and even if we were, your writing style is not 
conducive to us wanting to learn from your apparent wealth of experience?


Regards

Ed W



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread David Woodhouse
On Thu, 2011-01-13 at 20:19 +0100, Steve wrote:
 
 I would not use MLC in a server environment. SLC has much better
 program/erase cycles per cell.

I wouldn't be overly worried about the underlying medium.

I'm more worried about the translation layer they use on top of it, to
make it pretend to be spinning rust. It is essentially a file system, on
top of which you are expected to layer another file system. Not
particularly efficient, but at least TRIM addresses one of the biggest
inefficiencies of that gratuitous extra layering.

The inefficiency is one thing, but it's the reliability that worries me.
It's generally accepted that it takes at least 5 years for a file system
implementation to truly reach maturity. And that's for open source code
that you can debug, on a medium that you can access directly to do
diagnosis and data recovery.

But what we're talking about here is a file system implemented inside a
black box where you can't do any of that. And what's more, they keep
changing it. Even if you manage to find some device that passes your
testing, you may find that the next batch of the *same* device (from
your point of view) actually contains completely different software
*and* hardware if you take it apart.

These translation layers are almost always a complete pile of crap.
Especially in the face of power failures, since they so often completely
fail to implement basic data integrity features (the same kind of
journalling features that also have to be implemented in the 'real' file
system on top of this fake disk).

The best way to use flash is to have a file system that's *designed* for
use on flash. The only problem with that is that it wouldn't work with
DOS; you can't provide an INT 13h DISK BIOS handler to use it...

-- 
dwmw2



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Timo Sirainen
On Mon, 2011-01-17 at 09:07 -0500, Charles Marcus wrote:
 On 2011-01-15 9:30 AM, Charles Marcus wrote:
  Then, enforce a smallish per user quota (how much would depend on your
  particular environment, but I'm thinking something like 250 or maybe
  500MB, since our users do get a lot of large attachments in the course
  of doing business) on their INBOX -  Sent, Drafts and Templates folders
  too, but that's a question on my list of 'how to do' - how to easily
  place these 'special' folders on the 'fast' namespace, and all user
  created folders in the 'slow' namespace. It would be really nice if
  there were some kind of native way that dovecot could 'assign' the
  'special' folders to the same namespace as the INBOX, and all other user
  created folders to another...
 
 Timo - any chance you could comment on the best way to accomplish this -
 or if it is even possible right now? I'm hoping to start testing this in
 the next few weeks...

Well, you can have per-namespace quotas. But that of course means that
the other namespace must have a prefix, so users would have to have
something like:

 - INBOX
 - Drafts
 - Sent
 - bignamespace
 - work
 - etc.

Which wouldn't be very pretty. So if you then wanted to arbitrarily move
mailboxes across different quota roots... I'm not really even sure what
would be a good way to configure that.

For INBOX only, a separate namespace should be possible to implement
without trouble, though.




Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Charles Marcus
On 2011-01-17 2:47 PM, Timo Sirainen wrote:
 On Mon, 2011-01-17 at 09:07 -0500, Charles Marcus wrote:
 On 2011-01-15 9:30 AM, Charles Marcus wrote:
 Then, enforce a smallish per user quota (how much would depend on your
 particular environment, but I'm thinking something like 250 or maybe
 500MB, since our users do get a lot of large attachments in the course
 of doing business) on their INBOX -  Sent, Drafts and Templates folders
 too, but that's a question on my list of 'how to do' - how to easily
 place these 'special' folders on the 'fast' namespace, and all user
 created folders in the 'slow' namespace. It would be really nice if
 there were some kind of native way that dovecot could 'assign' the
 'special' folders to the same namespace as the INBOX, and all other user
 created folders to another...

 Timo - any chance you could comment on the best way to accomplish this -
 or if it is even possible right now? I'm hoping to start testing this in
 the next few weeks...
 
 Well, you can have per-namespace quotas. But that of course means that
 the other namespace must have a prefix, so users would have to have
 something like:
 
  - INBOX
  - Drafts
  - Sent
  - bignamespace
  - work
  - etc.
 
 Which wouldn't be very pretty. So if you then wanted to arbitrarily move
 mailboxes across different quota roots... I'm not really even sure what
 would be a good way to configure that.
 
 For INBOX only, a separate namespace should be possible to implement
 without trouble, though.

Ok, thanks... hmmm... have to do some more thinking on this one...

-- 

Best regards,

Charles


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread David Woodhouse
On Sat, 2011-01-15 at 10:41 +, Ed W wrote:
 
  One of the systems to fail was a firewall running off SSD.
 
 SSD or CF?

That doesn't make a lot of difference. They're all broadly similar.
There are better devices and worse devices, but they're mostly crap.

And as I said earlier, even if you think you've worked out which is
which, it may change from batch to batch of what is allegedly the *same*
product.

 It would appear it's also possible to damage some flash memory by 
 powering off at the wrong moment? 

Almost all of them will fail hard if you do any serious power-fail
testing on them. It's not a hardware failure; it's just that their
*internal* file system is corrupt and needs a fsck (or just wiping and
starting again). But since you can't *access* the underlying medium, all
you can do is cry and buy a new one.

The fun thing is that their internal garbage collection could be
triggered by a *read* from the host computer, or could even happen
purely due to a timeout of some kind. So there is *never* a time when
you know it's safe to power off because "I haven't written to it for 5
minutes".

Yes, it's perfectly possible to design journalling file systems that
*are* resilient to power failure. But the file systems inside these
devices are generally written by the same crack-smoking hobos that write
PC BIOSes; you don't expect quality software here.

By putting a logic analyser on some of these devices to watch what
they're *actually* doing on the flash when they garbage-collect, we've
found some really nasty algorithms. When garbage-collecting, one of them
would read from the 'victim' eraseblock into RAM, then erase the victim
block while the data were still only held in RAM — so that a power
failure at that moment would lose it. And then, just to make sure its
race window was nice and wide, it would then pick a *second* victim
block and copy data from there into the freshly-erased block, before
erasing that second block and *finally* writing the data from RAM back
to it. It's just scary :)
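
Reconstructed from that description (a toy sketch of mine, not any real
firmware's code; the ToyFlash class is hypothetical), the sequence looks
roughly like this, with comments marking the power-failure windows:

class ToyFlash:
    """Hypothetical in-memory stand-in for a raw NAND device."""
    def __init__(self, blocks):
        self.blocks = dict(blocks)
    def read_block(self, n):
        return self.blocks[n]
    def erase_block(self, n):
        self.blocks[n] = None
    def write_block(self, n, data):
        self.blocks[n] = data

def bad_garbage_collect(flash, victim_a, victim_b):
    ram = flash.read_block(victim_a)    # live data now held only in RAM...
    flash.erase_block(victim_a)         # ...and the on-flash copy is destroyed
    # power failure here => victim_a's data is gone for good
    ram_b = flash.read_block(victim_b)
    flash.write_block(victim_a, ram_b)  # second victim copied into the hole
    flash.erase_block(victim_b)
    # power failure here => the first victim's data (still only in RAM) is gone
    flash.write_block(victim_b, ram)    # written back at last, far too late

flash = ToyFlash({0: b"mail-A", 1: b"mail-B"})
bad_garbage_collect(flash, 0, 1)
print(flash.blocks)  # swapped cleanly -- but only if power stayed on throughout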

-- 
dwmw2



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-17 Thread Frank Cusack

On 1/16/11 2:10 PM -0600 Stan Hoeppner wrote:

Using XFS with delayed logging mount option (requires kernel 2.6.36 or
later).

...

Using the delayed logging feature, filesystem metadata write operations
are pushed almost entirely into RAM.  Not only does this _dramatically_
decrease physical metadata write IOPS but it also increases metadata
write performance by an order of magnitude.  Really shines with maildir,


ext3 has the same feature.  It's fantastic for write performance,
and especially for NFS, but for obvious reasons horrible for reliability.
I'm sure XFS fixes the consistency problems of ext3, ie on power failure
your fs is still consistent, but clearly this strategy is not good
for a mail store, ie where you also care about not losing data.

I personally like ZFS with a SLOG for the sync writes.  Plus you get
the extra guarantees of zfs, ie it guarantees that your disk isn't
lying to you.


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-16 Thread Robert Schetterer
On 16.01.2011 06:39, Noel Butler wrote:
 LOL, this is just so funny, watching the "no no no, I'm right, you're
 wrong". Give up, Stanley; those on many lists are aware of your trolling.
 Nobody cares about your lil SOHO world; this list contains many
 different-sized orgs, and like someone else mentioned, the 4K email size
 is SO 1994, but that about sums you up anyway.
 
 
 
 On Sat, 2011-01-15 at 23:19 -0600, Stan Hoeppner wrote:
 
 Philipp Haselwarter put forth on 1/15/2011 8:32 PM:

 | More than 97% of all e-mails sent over the net are unwanted, according
 | to a Microsoft security report.[39]
 | 
 | MAAWG estimates that 85% of incoming mail is abusive email, as of the
 | second half of 2007. The sample size for the MAAWG's study was over 100
 | million mailboxes.[40][41][42]
 | 
 | Spamhaus estimates that 90% of incoming e-mail traffic is spam in North
 | America, Europe or Australasia.[43] By June 2008 96.5% of e-mail
 | received by businesses was spam.[18][unreliable source?]

 I just have a tiny set of 4k spam mails, but they have an avg size of
 39KB, ie well above 4KB.

 This discussion has been in the context of _storing_ user email.  The 
 assumption
 is that an OP is smart/talented enough to get his spam filters/appliances
 killing 99% before it reaches intermediate storage or mailboxes.  Thus, in 
 the
 context of this discussion, the average size of a spam message is irrelevant,
 because we're talking about what goes into the mail store.

 If you're storing significantly more than 1% of spam you need to get that 
 under
 control before doing any kind of meaningful analysis of mail storage needs.

 
 
 

The simple truth is SSD drives are fast, and they will/might be the future,
but mail people are conservative (think of the hell of losing mail); at
present SSD drive tech is not ready for serving big mail stores (my opinion).
Tech moves fast, so at some point we might all use only SSD drives, or not;
we will see. After all, it's not really a Dovecot theme, and it shouldn't
lead to endless flames on this list.


-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-16 Thread Cor Bosman
 This discussion has been in the context of _storing_ user email.  The 
 assumption
 is that an OP is smart/talented enough to get his spam filters/appliances
 killing 99% before it reaches intermediate storage or mailboxes.  Thus, in the
 context of this discussion, the average size of a spam message is irrelevant,
 because we're talking about what goes into the mail store.

The fact is, we all live in different realities, so we're all arguing about 
apples and oranges. If you're managing a SOHO, small company, large company, 
university, or in our case, an ISP, the requirements are all different.  We 
have about a million mailboxes, about 20K active at the same time, and people 
pay for it. 

Take for example Stan's spam quote above. In the real world of an ISP, killing 
99% of all spam before it hits the storage is unthinkable. We only block spam 
that is guaranteed to be unwanted, mostly based on technical facts that can't 
ever happen in normal email. But email that our scanning system flags as 
probable spam, is just that, probable spam. We can not just throw that away, 
because in the real world, there are always, and I mean always, false 
positives. It is unthinkable to throw false positives away. So we have to put 
these emails in a spam folder in case the user wants to look at it.  We block 
about 40% of all spam on technical grounds, our total spam percentage is 90%, 
so still about  80% of all customer email reaching the storage is spam. 

But in other environments throwing away all probable spam may be perfectly 
fine. For my SOHO I'd have no problem throwing probable spam away. I never look 
in my spam folder anyway, so I can't be missing much. 

The same goes for SSD. We use SSD drives extensively in our company. Currently 
mostly in database servers, but our experiences have been good enough that 
we're slowly starting to add them to more systems, even as boot drives. But 
we're not using them yet in email storage. Like Brad, we're using Netapp filers 
because as far as I know they're one of the few commercially available HA 
filesystem companies.  We've looked at EMC and Sun as well, but haven't found a 
reason to move away from Netapp. In 12 years of Netapp we've only had 1 major 
outage that lasted half a day (and made the front page of national 
newspapers).  So, understand that bit: major outages make it to national 
newspapers for us. HA, failover, etc. are kind of important to us.  

So why not build something ourselves and use SSD? I suppose we could, but it's 
not as easy as it sounds for us (your mileage may vary).  It would take 
significant amounts of engineering time, testing, migrating, etc.  And the 
benefits are uncertain. We don't know if an open source HA alternative can give 
us another 12 years of virtually faultless operation. It may. It may not. Email 
is not something to start gambling with. People get kind of upset when their 
email disappears. We know what we've got with Netapp. 

I did dabble in using SSD for indexes for a while, and it looked very 
promising. Certainly indexes are a prime target for SSD drives.  But when the 
director matured, we started using the director and the Netapp for indexes 
again.  I may still build my own NFS server and use SSD drives just for 
indexes, simply to offload IOPS from the Netapp. Indexes are a little less 
scary to experiment with. 

So, if you're in the position to try out SSD drives for indexes or even for 
storage, go for it. I'm sure it will perform much better than spinning drives. 

Cor



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-16 Thread Javier de Miguel Rodríguez

On 13/01/11 17:01, David Woodhouse wrote:

On Wed, 2011-01-12 at 09:53 -0800, Marc Perkel wrote:

I just replaced my drives for Dovecot using Maildir format with a pair
of Solid State Drives (SSD) in a raid 0 configuration. It's really
really fast. Kind of expensive but it's like getting 20x the speed for
20x the price. I think the big gain is in the 0 seek time.

You may find ramfs is even faster :)
ramfs (tmpfs in linux-land) is useful for indexes. If you lose the 
indexes, they will be created automatically the next time a user logs in.


We are now trying the zlib plugin to lower the number of IOPS to our 
maildir storage systems. We are using gzip (bzip2 increases the latency 
a lot). LZMA/xz seems interesting (high compression and rather good 
decompression speed), and lzo also seems interesting (blazing-fast 
compression AND decompression, though not much in compression savings)
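
As a rough way to compare the candidates on real mail (a sketch using
Python's standard-library codecs; lzo is omitted because it needs a
third-party module, and actual maildir numbers will differ):

import bz2, lzma, time, zlib

# A text-heavy stand-in for a typical mail body
sample = (b"Received: from mx.example.org\r\nSubject: hello\r\n\r\n"
          + b"The quick brown fox jumps over the lazy dog. " * 400)

for name, compress in (("zlib/gzip", zlib.compress),
                       ("bzip2", bz2.compress),
                       ("lzma/xz", lzma.compress)):
    t0 = time.perf_counter()
    out = compress(sample)
    ms = (time.perf_counter() - t0) * 1000
    print(f"{name:9s} {len(out):6d} bytes "
          f"({len(out) / len(sample):.1%} of original)  {ms:.2f} ms")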


What kind of tricks do you use to lower the number of IOPs of 
your dovecot servers?


Regards

Javier





I hope you have backups.





Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-16 Thread Timo Sirainen
On Sun, 2011-01-16 at 00:05 -0600, Stan Hoeppner wrote:
 Using O_DIRECT with mbox files, the IOPS
 performance can be even greater.  However, I don't know if this applies to
 Dovecot because AFAIK MMAP doesn't work well with O_DIRECT...
 ***Hey Timo, does/can Dovecot use Linux O_DIRECT for writing the mail files?

mmap doesn't matter, because mbox files aren't read with mmap. But I
doubt it's a good idea to use O_DIRECT for mbox files, because even if
it gives higher iops, you're using more iops because you keep re-reading
the same data from disk since it's not cached to memory.

As for O_DIRECT writes.. I don't know if it's such a good idea either.
If client is connected, it's often going to read the mail soon after it
was written, so it's again a good idea that it stays in cache.

I once wrote a patch to free message contents from OS cache once the
message was read entirely, because it probably wouldn't be read again.
No one ever reported if it gave any better or worse performance.
http://dovecot.org/patches/1.1/fadvise.diff
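
The idea behind that patch, as a minimal sketch in Python rather than the
patch's actual C (POSIX only; the helper name is mine):

import os

def read_message_once(path):
    """Read a mail file fully, then hint the kernel to drop its cached pages."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while chunk := os.read(fd, 65536):
            chunks.append(chunk)
        # offset=0, length=0 means the whole file: this message probably
        # won't be read again, so let the page cache evict it early.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return b"".join(chunks)
    finally:
        os.close(fd)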





Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-16 Thread Stan Hoeppner
Javier de Miguel Rodríguez put forth on 1/16/2011 12:00 PM:

 What kind of tricks do you use to lower the number of IOPs of your 
 dovecot
 servers?

Using hardware SAN RAID controllers with 'large' (2GB) write cache.  The large
write cache allows for efficient use of large queue depths.  A deeper queue
allows the drives to order reads/writes most efficiently decreasing head seek
movement.  This doesn't necessarily decrease IO per se, but it makes the drives
more efficient, allowing for more total physical drive IOPS.

Using XFS with delayed logging mount option (requires kernel 2.6.36 or later).

XFS has natively used delayed allocation for quite some time, coalescing
multiple pending writes before pushing them into the buffer cache.  This not
only decreases physical IOPS, but it also decreases filesystem fragmentation by
packing more files into each extent.  Decreased fragmentation means fewer disk
seeks required per file read, which also decreases physical IOPS.  This also
greatly reduces the wasted space typical of small file storage.  Works very well
with maildir, but also with the other mail storage formats.

Using the delayed logging feature, filesystem metadata write operations are
pushed almost entirely into RAM.  Not only does this _dramatically_ decrease
physical metadata write IOPS but it also increases metadata write performance by
an order of magnitude.  Really shines with maildir, obviously, but would also
help the s/mdbox formats since they make use of multiple files.  Delaylog
doesn't help mbox at all, and it doesn't do anything for index file performance.
 The caveat here is _load_.  You won't get much benefit on a mostly idle server.
 The benefits of delayed logging increase as the filesystem metadata write load
increases.  Busy servers benefit the most.

-- 
Stan


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-16 Thread Stan Hoeppner
Timo Sirainen put forth on 1/16/2011 12:48 PM:
 On Sun, 2011-01-16 at 00:05 -0600, Stan Hoeppner wrote:
 Using O_DIRECT with mbox files, the IOPS
 performance can be even greater.  However, I don't know if this applies to
 Dovecot because AFAIK MMAP doesn't work well with O_DIRECT...
 ***Hey Timo, does/can Dovecot use Linux O_DIRECT for writing the mail files?
 
 mmap doesn't matter, because mbox files aren't read with mmap. But I
 doubt it's a good idea to use O_DIRECT for mbox files, because even if
 it gives higher iops, you're using more iops because you keep re-reading
 the same data from disk since it's not cached in memory.
 
 As for O_DIRECT writes.. I don't know if it's such a good idea either.
 If a client is connected, it's often going to read the mail soon after it
 was written, so it's again a good idea that it stays in cache.
 
 I once wrote a patch to free message contents from OS cache once the
 message was read entirely, because it probably wouldn't be read again.
 No one ever reported if it gave any better or worse performance.
 http://dovecot.org/patches/1.1/fadvise.diff

I'd gladly test it but I don't have the resources currently, and frankly, at
this time, the prerequisite knowledge of building from source.

-- 
Stan




Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-16 Thread Ed W



Using XFS with delayed logging mount option (requires kernel 2.6.36 or later).

XFS has natively used delayed allocation for quite some time, coalescing
multiple pending writes before pushing them into the buffer cache.  This not
only decreases physical IOPS, but it also decreases filesystem fragmentation by
packing more files into each extent.  Decreased fragmentation means fewer disk
seeks required per file read, which also decreases physical IOPS.  This also
greatly reduces the wasted space typical of small file storage.  Works very well
with maildir, but also with the other mail storage formats.


What happens if you pull out the wrong cable in the rack, kernel 
lockup/oops, power failure, hot swap disk pulled, or something else 
which causes an unexpected loss of a few seconds of written data?


Surely your IOPs are hard limited by the number of fsyncs (and size of 
any battery backed ram)?


Ed W


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-16 Thread Cor Bosman
Btw, our average mail size last we checked was 30KB. That's a pretty good average,
as we're an ISP with a very wide user base.  I think a 4KB average is not a
normal mail load.

Cor



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-16 Thread Stan Hoeppner
Ed W put forth on 1/16/2011 4:11 PM:
 
 Using XFS with delayed logging mount option (requires kernel 2.6.36 or 
 later).

 XFS has natively used delayed allocation for quite some time, coalescing
 multiple pending writes before pushing them into the buffer cache.  This not
 only decreases physical IOPS, but it also decreases filesystem fragmentation 
 by
 packing more files into each extent.  Decreased fragmentation means fewer 
 disk
 seeks required per file read, which also decreases physical IOPS.  This also
 greatly reduces the wasted space typical of small file storage.  Works very 
 well
 with maildir, but also with the other mail storage formats.
 
 What happens if you pull out the wrong cable in the rack, kernel lockup/oops,
 power failure, hot swap disk pulled, or something else which causes an
 unexpected loss of a few seconds of written data?

Read the XFS FAQ.  These questions have been answered hundreds of times since
XFS was released in Irix in 1994.  I'm not your personal XFS tutor.

 Surely your IOPs are hard limited by the number of fsyncs (and size of any
 battery backed ram)?

Depends on how your applications are written and how often they call fsync.  Do
you mean BBWC?  WRT delayed logging BBWC is mostly irrelevant.  Keep in mind
that for delayed logging to have a lot of metadata writes in memory someone, or
many someones, must be doing something like an 'rm -rf' or equivalent on a large
dir with many thousands of files.  Even in this case, the processing is _very_ 
fast.

If your assumption is that your system is unstable, or you assume you will do
stupid things to break your system, then don't use a high performance
filesystem.  This behavior is not limited to XFS.

-- 
Stan


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-16 Thread Stan Hoeppner
Cor Bosman put forth on 1/16/2011 5:34 PM:
 Btw, our average mailsize last we checked was 30KB. Thats a pretty good 
 average as we're an ISP with a very wide user base.  I think 4KB average is 
 not a normal mail load.

As another OP pointed out, some ISPs apparently have to deliver a lot of spam to
mailboxen to avoid FPs, bumping up that average mail size considerably.  Do you
accept and deliver a lot of spam to user mailboxen?

-- 
Stan


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-16 Thread Noel Butler
On Sun, 2011-01-16 at 20:33 -0600, Stan Hoeppner wrote:

 Cor Bosman put forth on 1/16/2011 5:34 PM:
  Btw, our average mailsize last we checked was 30KB. Thats a pretty good 
  average as we're an ISP with a very wide user base.  I think 4KB average is 
  not a normal mail load.
 
 As another OP pointed out, some ISPs apparently have to deliver a lot of spam 
 to
 mailboxen to avoid FPs, bumping up that average mail size considerably.  Do 
 you
 accept and deliver a lot of spam to user mailboxen?
 


Still assuming that 30K average is spam, huh Stanley? Accept the fact you
screwed up; your data is from 1994, and in fact I question its accuracy even
back then.

Keep peddling, it's funny to watch. Thankfully it's in the archives and
Google for any of your new prospective employers.






Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-16 Thread Timo Sirainen
On 17.1.2011, at 5.16, Noel Butler wrote:

 keep peddling, its funny to watch, thankfully its in the archives and
 google for any of your new prospective employers.

You should be more worried about prospective employers finding your own mails. 
I'll help by moderating your mails before they reach the list.



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Andrzej Adam Filip
Stan Hoeppner s...@hardwarefreak.com wrote:
 [...] The average size of an email worldwide today is less than 4KB,
 less than one typical filesystem block. [...]

Do not confuse the unix culture of mostly plain-text-only email messages
with the MS Junk culture of overblown formatting, with background images and
company logos as a few image files in every (internal) email.

-- 
[plen: Andrew] Andrzej Adam Filip : a...@onet.eu
Let thy maid servant be faithful, strong, and homely.
  -- Benjamin Franklin


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Ed W



One of the systems to fail was a firewall running off SSD.


SSD or CF?

It would appear it's also possible to damage some flash memory by
powering off at the wrong moment?  I had a router running on a nearly
new SLC flash card and it kept suffering errors every 24 hours - perhaps
it was filesystem corruption, since it was kind of fixed when I
rebooted.  Then after a few more days it died completely: briefly I
could repartition it, and then an hour later I could no longer even get
it detected by the OS, at which point it appeared absolutely, completely dead.


So that's a new 4GB SLC card, using around 500MB of it and a light 
writeable filesystem running pfsense (perhaps a few writes per minute) 
and it died inside a month...  I don't have enough data to see if it 
died from wear or if I was just unlucky...


This is a cheap (ish) CF card though, not an SSD drive

Ed W


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Charles Marcus
It would be nice if some of you could stop with the personal attacks.

While I agree that assuming that all users only receive 4K emails is not
realistic in most environments, neither is assuming a requirement for all
of the super-duper triple-redundant hot fail-over for a mailstore with
no quota enforcement.

On 1/14/2011 11:16 PM, Stan Hoeppner wrote:
 But Joe User _will_ notice a difference if this server with the RAID 
 10 mentioned above is supporting 5000 concurrent users, not just Joe.
 Responses will lag. With the SSD you can support 10,000 concurrent
 users (assuming the rest of the hardware is up to that task and you
 have enough RAM) and responses for all of them will be nearly
 instantaneous. This is the difference SSD makes, and why it's worth
 the cost in many situations. However, doing so will require an email
 retention policy that doesn't allow unlimited storage--unless you
 can afford that much SSD capacity.

One thing we are looking at here (small 50+ userbase) is kind of a 'best
of both worlds' setup - using SSD's (haven't decided yet to trust a bare
striped set or go with a 4 drive RAID10 - probably the latter so I can
sleep at night) for the main OS and a limited amount of storage space
per user (maildir) for active/recent email, then use another namespace
with a much higher quota - I'm thinking about 10GB per user should do in
our environment - for 'slow' storage (cheap mechanical RAID10 setup) -
ie, emails that are only accessed on occasion (mdbox).

Then, enforce a smallish per-user quota (how much would depend on your
particular environment, but I'm thinking something like 250 or maybe
500MB, since our users do get a lot of large attachments in the course
of doing business) on their INBOX - and on Sent, Drafts and Templates
folders too, but that's a question on my 'how to do' list - how to easily
place these 'special' folders in the 'fast' namespace, and all user-created
folders in the 'slow' namespace. It would be really nice if there were
some kind of native way that dovecot could 'assign' the 'special' folders
to the same namespace as the INBOX, and all other user-created folders
to another...
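
[Editor's note: the special-folders-follow-INBOX part is the open
question here, but the two-namespace fast/slow split itself is ordinary
configuration. A minimal Dovecot 2.0-style sketch, with hypothetical
paths and storage formats:]

  namespace inbox {
    inbox = yes
    prefix =
    separator = /
    # 'fast' storage: SSD-backed maildir (path is an example)
    location = maildir:/ssd/mail/%u
  }
  namespace archive {
    prefix = Archive/
    separator = /
    # 'slow' storage: mechanical RAID10, mdbox format (path is an example)
    location = mdbox:/slow/mail/%u/archive
  }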

Doing this will also help train users in proper email management -
treating their INBOX just like they would a physical INBOX tray on their
desk. They wouldn't just let paper pile up there, why do so in their
INBOX (because they 'can')? Ie, it should be something they should
always strive to keep totally EMPTY. Of course this practically never
happens, but the point is, they need to learn to make a decision once
they are finished with it, and most importantly, take said action -
either delete it, or file it.

-- 

Best regards,

Charles


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Rick Romero

Quoting Stan Hoeppner s...@hardwarefreak.com:

Rick Romero put forth on 1/14/2011 8:29 PM:

  
   And that's assuming a platter squeezing in 1TB of data at
7200RPMs doesn't
   get a comparable performance improvement to a higher rotational
speed on a
   lower volume platter...
  
   Size and density are irrelevant.  Higher density will allow
greater streaming
   throughput at the same spindle speed, _however_ this does
nothing for seek
   performance.  Streaming performance is meaningless for
transaction servers.
   IOPS performance is critical for transaction servers.  Seek
   performance equals
   IOPS performance.  The _only_ way to increase mechanical disk IOPS is to
   increase the spindle speed or the speed of the head
actuator.  If you've
   watched mechanical drive evolution for the past 20 years you've seen that
   actuator speed hasn't increased due to the physical properties
of voice coil
   drive actuators.
  
   Hell for the price of a single 250gb SSD drive,
   you can RAID 10 TEN 7200 RPM 500GB SATAs.
  
   I think your pricing ratio is a bit off but we'll go with it.  You'd
   get 50,000
   4KB random IOPS from the SSD and only 750 IOPS from the RAID 10.  The
   SSD could
   handle 67 times as many emails per second for 10 times the cost.  Not
   a bad trade.
  
   So while, yes, my 10 drive SATA RAID 10 ONLY performs 166MB/sec with a
   'simplistic' dd test, In reality I just don't think Joe User is going to
   notice the difference between that and the superior performance of a
   single SSD drive when he POPs his 10 3k emails.
  
   But Joe User _will_ notice a difference if this server with the RAID 10
   mentioned above is supporting 5000 concurrent users, not just
Joe.  Responses
   will lag.  With the SSD you can support 10,000 concurrent users
(assuming the
   rest of the hardware is up to that task and you have enough RAM) and
   responses
   for all of them will be nearly instantaneous.  This is the difference
   SSD makes,
   and why it's worth the cost in many situations.  However, doing so
   will require
   an email retention policy that doesn't allow unlimited
   storage--unless you can
   afford that much SSD capacity.
  
   You can get 240,000 4k random IOPS and 1.9TB of capacity from two of
   these in a
   software RAID0 for $6,400 USD:
   http://www.newegg.com/Product/Product.aspx?Item=N82E16820227665
  
   That's enough transactional IOPS throughput to support well over 50,000
   concurrent IMAP users, probably far more.  Of course this would
   require a server
   likely on the order of at least a single socket G34 AMD 12 core
Magny Cours
   system w/2GHz cores, 128GB of RAM, and two free PCIe X4/X8 slots
for the SSD
   cards, based on a board such as this SuperMicro:
   http://www.newegg.com/Product/Product.aspx?Item=N82E16813182240
   (Actually this is the perfect board for running two of these
   RevoDrive X2 cards)


I use pricewatch - so, yes, we may be talking refurb drives, but this is
not an issue when you're saving enough money to just buy a few more of
items you're already buying.

Also, if your filesystem is using 4k clusters, aren't you only using 1
random IO for a 4k email?  It just sounds to me like, if you plan
'smarter', anyone can avoid the excessive costs of SSD and get 'end-user-
similar' performance with commodity hardware.

Rick


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Sven Hartge
Andrzej Adam Filip a...@onet.eu wrote:
 Stan Hoeppner s...@hardwarefreak.com wrote:

 [...] The average size of an email worldwide today is less than 4KB,
 less than one typical filesystem block. [...]

 Do not confuse unix culture of mostly plain text only email messages
 with MS Junk culture of overblown formatting with background images
 company logos as a few image files in every (internal) email.

I just did a rough analysis of the mail spool of my university (6,000
users, students and faculty staff, about 10 million mails) and the
average mail size was at about 96KiB. Last year, this average was at
77KiB and in 2009 we were at 62KiB.

Mails with an average size of 4KiB would then have been from a time when
MIME was not yet invented, I believe. Somewhere in 1994.

Grüße,
Sven.

-- 
Sig lost. Core dumped.



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Bradley Giesbrecht


On Jan 15, 2011, at 6:30 AM, Charles Marcus wrote:

One thing we are looking at here (small 50+ userbase) is kind of a  
'best
of both worlds' setup - using SSD's (haven't decided yet to trust a  
bare

striped set or go with a 4 drive RAID10 - probably the latter so I can
sleep at night) for the main OS and a limited amount of storage space
per user (maildir) for active/recent email, then use another namespace
with a much higher quota - I'm thinking about 10GB per user should  
do in

our environment - for 'slow' storage (cheap mechanical RAID10 setup) -
ie, emails that are only accessed on occasion (mdbox).

Then, enforce a smallish per user quota (how much would depend on your
particular environment, but I'm thinking something like 250 or maybe
500MB, since our users do get a lot of large attachments in the course
of doing business) on their INBOX -  Sent, Drafts and Templates  
folders

too, but that's a question on my list of 'how to do' - how to easily
place these 'special' folders on the 'fast' namespace, and all user
created folders in the 'slow' namespace. It would be really nice if
there were some kind of native way that dovecot could 'assign' the
'special' folders to the same namespace as the INBOX, and all other  
user

created folders to another...

Doing this will also help train users in proper email management -
treating their INBOX just like they would a physical INBOX tray on  
their

desk. They wouldn't just let paper pile up there, why do so in their
INBOX (because they 'can')? Ie, it should be something they should
always strive to keep totally EMPTY. Of course this practically never
happens, but the point is, they need to learn to make a decision once
they are finished with it, and most importantly, take said action -
either delete it, or file it.


Sounds like a great idea. I work with media companies where quotas can  
be challenging.


--
Brad


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Andrzej Adam Filip
Sven Hartge s...@svenhartge.de wrote:
 Andrzej Adam Filip a...@onet.eu wrote:
 Stan Hoeppner s...@hardwarefreak.com wrote:

 [...] The average size of an email worldwide today is less than 4KB,
 less than one typical filesystem block. [...]

 Do not confuse unix culture of mostly plain text only email messages
 with MS Junk culture of overblown formatting with background images
 company logos as a few image files in every (internal) email.

 I just did a rough analysis of the mail spool of my university (6.000
 users, students and faculty staff, about 10 million mails) and the
 average mail size was at about 96KiB. Last year, this average was at
 77KiB and in 2009 we were at 62KiB.

 Mails the average size of 4KiB would then have been at a time when
 MIME was not yet invented, I believe. Somewhere in 1994.

 Grüße,
 Sven.

I assume that in bigger organizations most mail stored in IMAP storage
is internal. I also assume that the size of typical mail in unix/linux
culture and in MS culture differs. That may explain the quite different
experiences.

Could you elaborate on the penetration of MS software/culture (especially
MS Exchange) at your university?

BTW I have seen a few (smaller) organizations with most (internal) mails
below 4KB, but with the remaining *huge* mails capable of very
significantly skewing the average. It makes me doubt the value of a
*bare* average email size.

P.S. Anyway, many organizations are legally obliged to archive all emails.

-- 
[plen: Andrew] Andrzej Adam Filip : a...@onet.eu
Oh wearisome condition of humanity!
Born under one law, to another bound.
  -- Fulke Greville, Lord Brooke


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Stan Hoeppner
Andrzej Adam Filip put forth on 1/15/2011 4:02 AM:
 Stan Hoeppner s...@hardwarefreak.com wrote:
 [...] The average size of an email worldwide today is less than 4KB,
 less than one typical filesystem block. [...]
 
 Do not confuse unix culture of mostly plain text only email messages
 with MS Junk culture of overblown formatting with background images
 company logos as a few image files in every (internal) email.

average size of an email worldwide

The bulk of all email is personal, not corporate:  think Gmail, Hotmail, Yahoo,
the 50k ISPs worldwide, etc.  Average all of that together with the corporate
mail (small percentage), and you're well under 4KB per message, especially
considering the amount of SMS gatewaying going on with smart phones today.  Most
of those are one liners with the smtp header being 4 times the size of the body,
with total message size being under 1KB.

-- 
Stan


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Philipp Haselwarter
SH == Stan Hoeppner s...@hardwarefreak.com writes:

SH Andrzej Adam Filip put forth on 1/15/2011 4:02 AM:
 Stan Hoeppner s...@hardwarefreak.com wrote:
 [...] The average size of an email worldwide today is less than 4KB,
 less than one typical filesystem block. [...]
 
 Do not confuse unix culture of mostly plain text only email
 messages with MS Junk culture of overblown formatting with
 background images company logos as a few image files in every
 (internal) email.

SH average size of an email worldwide

SH The bulk of all email is personal, not corporate: [...]

,
| More than 97% of all e-mails sent over the net are unwanted, according
| to a Microsoft security report.[39]
| 
| MAAWG estimates that 85% of incoming mail is abusive email, as of the
| second half of 2007. The sample size for the MAAWG's study was over 100
| million mailboxes.[40][41][42]
| 
| Spamhaus estimates that 90% of incoming e-mail traffic is spam in North
| America, Europe or Australasia.[43] By June 2008 96.5% of e-mail
| received by businesses was spam.[18][unreliable source?]
`
http://en.wikipedia.org/wiki/E-mail_spam#As_a_percentage_of_the_total_volume_of_e-mail

---8<---[snipped 3 lines]---8<---
SH under 4KB per message, especially considering the amount of SMS
SH gatewaying going on with smart phones today.  Most of those are one
SH liners with the smtp header being 4 times the size of the body, with
SH total message size being under 1KB.

SH -- Stan

I just have a tiny set of ~4,000 spam mails, but they have an avg size of
39KB, i.e. well above 4KB.


-- 
Philipp Haselwarter



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Stan Hoeppner
Brandon Davidson put forth on 1/14/2011 10:59 PM:

 You obviously don't live in the same world I do. Have you ever been part of

Not currently, no, thankfully.

 a grant approval process and seen what kinds of files are exchanged, and

I've never worked in the public sector, only private, so I've not dealt with the
grant process, but I'm not totally ignorant of them either.  I've assisted a
couple of colleagues in the past with grant proposals.  And yes, they can, and
often do, suck.

 with what frequency? Complied with retention and archival policies? Dealt
 with folks who won't (or can't) delete an message once they've received it?

I have, unfortunately, had to deal with regulatory compliance and some of the
less than sane communications retention policies.

 Blithely applying some inexplicable figure you've pulled out of
 who-knows-where and extrapolating from that hardly constitutes prudent
 planning. 

Statistics are guidelines.  As I'm not planning anything in this thread, I don't
see how such non-existent planning could be categorized as prudent or not.  What
I did do is simply make the case that 252TB seems bleeping outrageously high for
5k users, whether that entails email alone or every other kind of storage those
5k users need.  If my math is correct, that's about 50GB/user, including your
snapshot LUNs, etc.

 We based our requirement on real numbers observed in our
 environment, expected growth, and our budget cycle. 

You forgot to mention the 35-45% (mentioned below) gross storage loss due to
inefficiencies in your chosen hardware vendor's platform/architecture.  Over a
third to almost half of the drive cost is entangled here, is it not?

 How do you plan? More
 blind averaging?

Ouija board.

 You're close, if a bit high with one of your guesses. Netapp is good to
 Education. 

Vendors with the largest profit margins (read: over priced) built into their
products are those most willing and able to give big discounts to select 
customers.

 Not that it matters - you know very little about the financial
 state of my institution or how capital expenditures work within my
 department's funding model.

That's true.  I know nothing about the financial state of your institution.  I
didn't claim to.  I simply know Oregon was/is facing a $3.8B deficit.  Your
institution is part of the state government budget.  Thus, your institution's
spending is part of that budget/deficit.  That's simply fact.  No?

 I suppose I shouldn't be surprised though, you seem to be very skilled at
 taking a little bit of information and making a convincing-sounding argument
 about it... regardless of how much you actually know.

I know this:  252TB is bleeping ridiculously large for 5K seats at _any_
university, public or private, regardless of how much is wasted for data
management.  Also, 35%-45% consumption of raw capacity for any internal array
functions/management is bleeping ridiculous.  Is that your definition of
enterprise?  Massive required waste of raw capacity?

 I work for central IS, so this is the first stage of a consolidated service
 offering that we anticipate may encompass all of our staff and faculty. We
 bought what we could with what we had, anticipating that usage will grow
 over time as individual units migrate off their existing infrastructure.
 Again, you're guessing and casting aspersions.

Guessing?  Originally you stated that 252TB for email only, or specifically
Exchange.  You said nothing of a mass storage consolidation project:

Brad Davidson put forth on 1/14/2011 6:25 PM:

 We just bought 252TB of raw disk for about 5k users. Given, this is
 going in to Exchange on Netapp


Casting aspersions?  aspersion:

a : a false or misleading charge meant to harm someone's reputation <cast
aspersions on her integrity>

What false or misleading charge did I make with the intention to harm your
reputation Brad?  I've merely made a technical argument for sane email retention
policies, and against the need for 252TB for 5K users' email.  I don't recall
casting any aspersions.

 This is enterprise storage; I'm not sure that you know what this actually
 means either.  With Netapp you generally lose on the order of 35-45% due to
 right-sizing, RAID, spares, and aggregate/volume/snapshot reserves. What's
 left will be carved up into LUNs and presented to the hosts.

If your definition of enterprise storage is losing 35-45% of raw capacity for
housekeeping chores, then I'll stick with my definition, and with Nexsan for my
enterprise storage needs.

You didn't mention deduplication once yet.  With Nexsan's DeDupe SG I cut my
regulatory dictated on disk storage requirements in half, and Nexsan disk costs
half as much as NetApp, for the same SATA disks.  With Nexsan, my overall
storage costs are less than half of a NetApp, for basically the same capability.
 The only downside is that I can't get all of the functionality in a single
head controller--however in many ways this is actually an advantage.  My total
costs are 

Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Stan Hoeppner
Sven Hartge put forth on 1/15/2011 9:29 AM:
 Andrzej Adam Filip a...@onet.eu wrote:
 Stan Hoeppner s...@hardwarefreak.com wrote:
 
 [...] The average size of an email worldwide today is less than 4KB,
 less than one typical filesystem block. [...]
 
 Do not confuse unix culture of mostly plain text only email messages
 with MS Junk culture of overblown formatting with background images
 company logos as a few image files in every (internal) email.
 
 I just did a rough analysis of the mail spool of my university (6.000
 users, students and faculty staff, about 10 million mails) and the
 average mail size was at about 96KiB. Last year, this average was at
 77KiB and in 2009 we were at 62KiB.
 
 Mails the average size of 4KiB would then have been at a time when
 MIME was not yet invented, I believe. Somewhere in 1994.

No.  You're doing a statistical mean.  You need to be doing median.  The reason
should be obvious.

-- 
Stan


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Stan Hoeppner
Stan Hoeppner put forth on 1/15/2011 11:03 PM:
 Sven Hartge put forth on 1/15/2011 9:29 AM:

 Mails the average size of 4KiB would then have been at a time when
 MIME was not yet invented, I believe. Somewhere in 1994.
 
 No.  You're doing a statistical mean.  You need to be doing median.  The 
 reason
 should be obvious.

Correcting myself here.  You are right, this should be a mean calculation.  And
the reason is obvious. ;)

-- 
Stan



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Stan Hoeppner
Philipp Haselwarter put forth on 1/15/2011 8:32 PM:

 ,
 | More than 97% of all e-mails sent over the net are unwanted, according
 | to a Microsoft security report.[39]
 | 
 | MAAWG estimates that 85% of incoming mail is abusive email, as of the
 | second half of 2007. The sample size for the MAAWG's study was over 100
 | million mailboxes.[40][41][42]
 | 
 | Spamhaus estimates that 90% of incoming e-mail traffic is spam in North
 | America, Europe or Australasia.[43] By June 2008 96.5% of e-mail
 | received by businesses was spam.[18][unreliable source?]
 `

 I just have a tiny set of 4k spam mails, but they have an avg size of
 39KB, ie well above 4KB.

This discussion has been in the context of _storing_ user email.  The assumption
is that an OP is smart/talented enough to get his spam filters/appliances
killing 99% before it reaches intermediate storage or mailboxes.  Thus, in the
context of this discussion, the average size of a spam message is irrelevant,
because we're talking about what goes into the mail store.

If you're storing significantly more than 1% of spam you need to get that under
control before doing any kind of meaningful analysis of mail storage needs.

-- 
Stan


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-15 Thread Noel Butler
LOL this is just so funny, watching the no no no I'm right you're
wrong. Give up Stanley, those on many lists are aware of your trolling;
nobody cares about your lil SOHO world. This list contains many
different sized orgs, and like someone else mentioned, the 4K email size
is SO 1994, but that about sums you up anyway.



On Sat, 2011-01-15 at 23:19 -0600, Stan Hoeppner wrote:

 Philipp Haselwarter put forth on 1/15/2011 8:32 PM:
 
  ,
  | More than 97% of all e-mails sent over the net are unwanted, according
  | to a Microsoft security report.[39]
  | 
  | MAAWG estimates that 85% of incoming mail is abusive email, as of the
  | second half of 2007. The sample size for the MAAWG's study was over 100
  | million mailboxes.[40][41][42]
  | 
  | Spamhaus estimates that 90% of incoming e-mail traffic is spam in North
  | America, Europe or Australasia.[43] By June 2008 96.5% of e-mail
  | received by businesses was spam.[18][unreliable source?]
  `
 
  I just have a tiny set of 4k spam mails, but they have an avg size of
  39KB, ie well above 4KB.
 
 This discussion has been in the context of _storing_ user email.  The 
 assumption
 is that an OP is smart/talented enough to get his spam filters/appliances
 killing 99% before it reaches intermediate storage or mailboxes.  Thus, in the
 context of this discussion, the average size of a spam message is irrelevant,
 because we're talking about what goes into the mail store.
 
 If you're storing significantly more than 1% of spam you need to get that 
 under
 control before doing any kind of meaningful analysis of mail storage needs.
 






Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-14 Thread Robert Brockway

On Thu, 13 Jan 2011, Timo Sirainen wrote:

How do they fail? Supposedly once a cell has reached its erase-limit it 
should become read-only. Maybe the failures had nothing to do with 
wearing?


Hi Timo.  I start seeing I/O errors on read or write.  I must admit I
don't have definitive proof that it is a wear-levelling problem but there
isn't any other obvious cause either - i.e. no known heat or other problems
have occurred on the systems.


It's left me a bit gun-shy of deploying SSD more widely.  I would 
certainly love to be able to trust it as an alternative to disk.


Yes cells certainly should go read-only when they run out of write-cycles, 
which would be a far less painful way for them to fail :)


One of the systems to fail was a firewall running off SSD.  A Linux based 
firewall can lose entire filesystems and keep running[1] so I first 
noticed the problem when the backups started to fail.


[1] Although you can't change the firewall ruleset without userspace 
tools.


Cheers,

Rpb

--
Email: rob...@timetraveller.org Linux counter ID #16440
IRC: Solver (OFTC & Freenode)
Web: http://www.practicalsysadmin.com
Contributing member of Software in the Public Interest (http://spi-inc.org/)
Open Source: The revolution that silently changed the world


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-14 Thread David Jonas
On 1/12/11, 11:46 PM, Stan Hoeppner wrote:
 David Jonas put forth on 1/12/2011 6:37 PM:
 
 I've been considering getting a pair of SSDs in raid1 for just the
 dovecot indexes. The hope would be to minimize the impact of pop3 users
 hammering the server. Proposed design is something like 2 drives (ssd or
 platter) for OS and logs, 2 ssds for indexes (soft raid1), 12 sata or
 sas drives in RAID5 or 6 (hw raid, probably 3ware) for maildirs. The
 indexes and mailboxes would be mirrored with drbd. Seems like the best
 of both worlds -- fast and lots of storage.
 
 Let me get this straight.  You're moving indexes to locally attached SSD for
 greater performance, and yet, you're going to mirror the indexes and store 
 data
 between two such cluster hosts over a low bandwidth, high latency GigE network
 connection?  If this is a relatively low volume environment this might work.
 But, if the volume is high enough that you're considering SSD for performance,
 I'd say using DRBD here might not be a great idea.

First, thanks for taking the time to respond! I appreciate the good
information.

Currently running DRBD for high availability over directly attached
bonded GigE with jumbo frames. Works quite well. Though indexes and
maildirs are on the same partition.

The reason for mirroring the indexes is just for HA failover. I can only
imagine the hit of rebuilding indexes for every connection after failover.

 Anyone have any improvements on the design? Suggestions?
 
 Yes.  Go with a cluster filesystem such as OCFS or GFS2 and an inexpensive SAN
 storage unit that supports mixed SSD and spinning storage such as the Nexsan
 SATABoy with 2GB cache:  http://www.nexsan.com/sataboy.php
 
 Get the single FC controller model, two Qlogic 4Gbit FC PCIe HBAs, one for 
 each
 cluster server.  Attach the two servers to the two FC ports on the SATABoy
 controller.  Unmask each LUN to both servers.  This enabling the cluster 
 filesystem.
 
 Depending on the space requirements of your indexes, put 2 or 4 SSDs in a 
 RAID0
 stripe.  RAID1 simply DECREASES the overall life of SSDs.  SSDs don't have the
 failure modes of mechanical drives thus RAID'ing them is not necessary.  You
 don't duplex your internal PCIe RAID cards do you?  Same failure modes as 
 SSDs.

Interesting. I hadn't thought about it that way. We haven't had an SSD
fail yet so I have no experience there yet. And I've been curious to try
GFS2.

 Occupy the remaining 10 or 12 disk bays with 500GB SATA drives.  Configure 
 them
 as RAID10.  RAID5/6 aren't suitable to substantial random write workloads such
 as mail and database.  Additionally, rebuild times for parity RAID schemes 
 (5/6)
 are up in the many hours, or even days category, and degraded performance of 
 5/6
 is horrible.  RAID10 rebuild times are a couple of hours and RAID10 suffers 
 zero
 performance loss when a drive is down.  Additionally, RAID10 can lose HALF the
 drives in the array as long as no two are both drives in a mirror pair.  Thus,
 with a RAID10 of 10 disks, you could potentially lose 5 drives with no loss in
 performance.  The probability of this is rare, but it demonstrates the point.
 With a 10 disk RAID 10 of 7.2k SATA drives, you'll have ~800 random read/write
 IOPS performance.  That' may seem low, but that's an actual filesystem figure.
 The physical IOPS figure is double that, 1600.  Since you'll have your indexes
 on 4 SSDs, and the indexes are where the bulk of IMAP IOPS take place (flags),
 you'll have over 50,000 random read/write IOPS.

Raid10 is our normal go to, but giving up half the storage in this case
seemed unnecessary. I was looking at SAS drives and it was getting
pricy. I'll work SATA into my considerations.

 Having both SSD and spinning drives in the same SAN controller eliminates the
 high latency low bandwidth link you were going to use with drbd.  It also
 eliminates buying twice as many SSDs, PCIe RAID cards, and disks, one set for
 each cluster server.  Total cost may end up being similar between the drbd and
 SAN based solutions, but you have significant advantages with the SAN solution
 beyond those already mentioned, such as using an inexpensive FC switch and
 attaching a D2D or tape backup host, installing the cluster filesystem 
 software
 on it, and directly backing up the IMAP store while the cluster is online and
 running, or snapshooting it after doing a freeze at the VFS layer.

As long as the SATAboy is reliable I can see it. Probably would be
easier to sell to the higher ups too. They won't feel like they're
buying everything twice.





Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-14 Thread Stan Hoeppner
David Jonas put forth on 1/14/2011 2:08 PM:

 Raid10 is our normal go to, but giving up half the storage in this case
 seemed unnecessary. I was looking at SAS drives and it was getting
 pricy. I'll work SATA into my considerations.

That's because you're using the wrong equation for determining your disk storage
needs.  I posted a new equation on one of the lists a week or two ago.
Performance and reliability are far more important now than total space.  And
today performance means transactional write IOPS not streaming reads.  In
today's world, specifically for transaction oriented applications (db and mail)
smaller faster more expensive disks are less expensive in total ROI that big fat
slow drives.  The reason is that few if any organizations actually need 28TB (14
2TB Cavier Green drives--popular with idiots today) of mail storage in a single
mail store.  That's 50 years worth of mail storage for a 50,000 employee
company, assuming your employees aren't allowed porn/video attachments, which
which most aren't.

 Having both SSD and spinning drives in the same SAN controller eliminates the
 high latency low bandwidth link you were going to use with drbd.  It also
 eliminates buying twice as many SSDs, PCIe RAID cards, and disks, one set for
 each cluster server.  Total cost may end up being similar between the drbd 
 and
 SAN based solutions, but you have significant advantages with the SAN 
 solution
 beyond those already mentioned, such as using an inexpensive FC switch and
 attaching a D2D or tape backup host, installing the cluster filesystem 
 software
 on it, and directly backing up the IMAP store while the cluster is online and
 running, or snapshooting it after doing a freeze at the VFS layer.
 
 As long as the SATAboy is reliable I can see it. Probably would be
 easier to sell to the higher ups too. They won't feel like they're
 buying everything twice.

Hit their website and look at their customer list and industry awards.  They've
won them all pretty much.  Simple, reliable, inexpensive SAN storage arrays.  No
advanced features such as inbuilt snapshots and the like.  Performance isn't the
fastest on the market but it's far more than adequate.  The performance per
dollar ratio is very high.  I've installed and used a SATABlade and SATABoy
myself and they're extremely reliable, and plenty fast.  Those were spinning
models.  I've not used SSDs in their chassis yet.

http://www.nexsan.com

You configure the controller and drives via a web interface over an ethernet
port.  There's a lot to love in the way Nexsan builds these things.  At least,
if you're a HardwareFreak like me.

-- 
Stan


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-14 Thread Noel Butler
On Fri, 2011-01-14 at 17:29 -0600, Stan Hoeppner wrote:

 slow drives.  The reason is that few if any organizations actually need 28TB 
 (14
 2TB Cavier Green drives--popular with idiots today) of mail storage in a 
 single
 mail store.  That's 50 years worth of mail storage for a 50,000 employee
 company, assuming your employees aren't allowed porn/video attachments, which
 which most aren't.
 

WTF?  28TB of mail storage for some is rather small. Good to see your
still posting without a clue Stanley.
Remember there is a bigger world out there from your tiny SOHO






Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-14 Thread Brad Davidson
 The reason is that few if any organizations actually need
 28TB (14
 2TB Cavier Green drives--popular with idiots today) of mail storage
in a
 single
 mail store.  That's 50 years worth of mail storage for a 50,000
employee
 company, assuming your employees aren't allowed porn/video
attachments,
 which
 which most aren't.

 
 WTF?  28TB of mail storage for some is rather small. Good to see your
 still posting without a clue Stanley.
 Remember there is a bigger world out there from your tiny SOHO

I'm with you Noel.

We just bought 252TB of raw disk for about 5k users. Given, this is
going in to Exchange on Netapp with multi-site database replication, so
this cooks down to about 53TB of usable space with room for recovery
databases, defragmentation, archives, etc, but still... 28TB is not much
anymore.

Of course, Exchange has also gone in a different direction than folks
have been indicating. 2010 has some pretty high memory requirements, but
the actual IOPS demands are quite low compared to earlier versions.
We're using 1TB 7200RPM SATA drives, and at the number of spindles we've
got, combined with the cache in the controllers, expect to have quite a
good bit of excess IOPS.

Even on the Dovecot side though - if you use the Director to group your
users properly, and equip the systems with enough memory, disk should
not be a bottleneck if you do anything reasonably intelligent. We
support 12k concurrent IMAP users at ~.75 IOPS/user/sec. POP3, SMTP, and
shell access on top of that is negligible.
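
[Editor's note: for scale, that works out to roughly 12,000 x 0.75 =
9,000 aggregate IOPS for the IMAP population alone.]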

I'm also surprised by the number of people trying to use DRBD to make
local disk look like a SAN so they can turn around and put a cluster
filesystem on it - with all those complex moving parts, how do you
diagnose poor performance? Who is going to be able to support it if you
get hit by a bus? Seems like folks would be better off building or
buying a sturdy NFS server. Heck, even at larger budgets, if you're
probably just going to end up with something that's essentially a
clustered NFS server with a SAN behind it.

-Brad


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-14 Thread Rick Romero

Quoting Stan Hoeppner s...@hardwarefreak.com:

David Jonas put forth on 1/14/2011 2:08 PM:

  
   Raid10 is our normal go to, but giving up half the storage in this case
   seemed unnecessary. I was looking at SAS drives and it was getting
   pricy. I'll work SATA into my considerations.
  
   That's because you're using the wrong equation for determining your
   disk storage
   needs.  I posted a new equation on one of the lists a week or two ago.
   Performance and reliability are far more important now than
total space.  And
   today performance means transactional write IOPS not streaming reads.  In
   today's world, specifically for transaction oriented applications (db
   and mail)
   smaller faster more expensive disks are less expensive in total ROI
   that big fat
   slow drives.  The reason is that few if any organizations actually
   need 28TB (14
   2TB Cavier Green drives--popular with idiots today) of mail storage
   in a single
   mail store.  That's 50 years worth of mail storage for a 50,000 employee
   company, assuming your employees aren't allowed porn/video
attachments, which
   which most aren't.

And that's assuming a platter squeezing in 1TB of data at 7200RPM doesn't
get a comparable performance improvement to a higher rotational speed on a
lower-capacity platter...  Hell, for the price of a single 250GB SSD drive,
you can RAID 10 TEN 7200 RPM 500GB SATAs.  

So while, yes, my 10 drive SATA RAID 10 ONLY performs 166MB/sec with a
'simplistic' dd test, In reality I just don't think Joe User is going to
notice the difference between that and the superior performance of a
single SSD drive when he POPs his 10 3k emails.

Rick


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-14 Thread Stan Hoeppner
Brad Davidson put forth on 1/14/2011 6:25 PM:

 We just bought 252TB of raw disk for about 5k users. Given, this is
 going in to Exchange on Netapp with multi-site database replication, so
 this cooks down to about 53TB of usable space with room for recovery
 databases, defragmentation, archives, etc, but still... 28TB is not much
 anymore.

The average size of an email worldwide today is less than 4KB, less than one
typical filesystem block.

28TB / 4KB = 28,000,000,000,000 bytes / 4096 bytes = 6,835,937,500 =
6.8 billion emails / 5,000 users =
1,367,188 emails per user

6.8 billion emails is not much anymore for a 5,000 seat org?

You work for the University of Oregon, correct?

From:  http://sunshinereview.org/index.php/Oregon_state_budget

Oregon's budget for FY2009-11 totals $61 billion.[1] The state faced a $3.8
billion biennium FY 2010-11 budget deficit, relying heavily on new taxes and
federal stimulus money to close the gap in the final budget signed by Gov. Ted
Kulongoski and passed by the Oregon Legislature.[2][3] In Aug. 2010, however,
the state budget deficit increased and could top $1 billion.[4] As a result, the
governor ordered 9% budget cuts.[5]

How much did that 252TB NetApp cost the university?  $300k?  $700k?  Just a drop
in the bucket right?  Do you think that was a smart purchasing decision, given
your state's $3.8 Billion deficit?  Ergo, do you think having an unlimited
email storage policy is a smart decision, based on your $3.8 Billion deficit?
Your fellow taxpayers would probably suggest you need to rein in your email
storage policy.  Wouldn't you agree?

This is why people don't listen to Noel (I've had him kill filed for a year--but
not to the extreme of body filtering him).  They probably won't put much stock
in what you say either Brad.  Why?

You two don't live in reality.  Either that, or the reality you live in is
_VERY_ different from the rest of the sane world.  Policies like U of O's
unlimited email drive multi hundred thousand to million dollar systems and
storage purchases, driving the state budget further into the red, and demanding
more income tax from citizens to pay for it, since Oregon has no sales tax.

28TB is not much anymore.  Tell your fellow taxpayers what that 252TB cost
them and I guarantee they'll think 28TB is overkill, especially after you tell
them 28TB would store more than 1.3 million emails per each of those 5k
students, faculty, staff, etc.

For comparison, as of Feb 2009, the entire digital online content of the Library
of Congress was only 74TB.  And you just purchased 252TB just for email for a
5,000 head count subsection of a small state university's population?

http://blogs.loc.gov/loc/2009/02/how-big-is-the-library-of-congress/

Sane email retention policies would allow any 5k seat organization to get 10+
years of life out of 28TB (assuming the hardware lived that long).  Most could
do it with 1TB, which at 4KB/message works out to ~244 million emails, or about
49k emails per user mailbox.  That assumes 1TB net, not raw; subtract 20% for
net and you're still at ~39k emails per user mailbox.  Still overkill...

-- 
Stan


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-14 Thread Stan Hoeppner
Rick Romero put forth on 1/14/2011 8:29 PM:

 And that's assuming a platter squeezing in 1TB of data at 7200RPMs doesn't
 get a comparable performance improvement to a higher rotational speed on a
 lower volume platter...  

Size and density are irrelevant.  Higher density will allow greater streaming
throughput at the same spindle speed, _however_ this does nothing for seek
performance.  Streaming performance is meaningless for transaction servers.
IOPS performance is critical for transaction servers.  Seek performance equals
IOPS performance.  The _only_ way to increase mechanical disk IOPS is to
increase the spindle speed or the speed of the head actuator.  If you've
watched mechanical drive evolution for the past 20 years you've seen that
actuator speed hasn't increased due to the physical properties of voice coil
drive actuators.
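
[Editor's note: rough numbers behind the seek/IOPS argument above,
assuming a typical 7.2k RPM SATA drive with ~8.5ms average seek time:]

  rotational latency  = (60s / 7200rpm) / 2   = ~4.2ms
  time per random op  = 8.5ms seek + 4.2ms    = ~12.7ms
  single-drive IOPS   = 1000ms / 12.7ms       = ~79 IOPS
  10-drive RAID10     = ~790 random read IOPS; random writes hit both
                        halves of a mirror pair, so roughly half that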

 Hell for the price of a single 250gb SSD drive,
 you can RAID 10 TEN 7200 RPM 500GB SATAs.  

I think your pricing ratio is a bit off but we'll go with it.  You'd get 50,000
4KB random IOPS from the SSD and only 750 IOPS from the RAID 10.  The SSD could
handle 67 times as many emails per second for 10 times the cost.  Not a bad 
trade.

 So while, yes, my 10 drive SATA RAID 10 ONLY performs 166MB/sec with a
 'simplistic' dd test, In reality I just don't think Joe User is going to
 notice the difference between that and the superior performance of a
 single SSD drive when he POPs his 10 3k emails.

But Joe User _will_ notice a difference if this server with the RAID 10
mentioned above is supporting 5000 concurrent users, not just Joe.  Responses
will lag.  With the SSD you can support 10,000 concurrent users (assuming the
rest of the hardware is up to that task and you have enough RAM) and responses
for all of them will be nearly instantaneous.  This is the difference SSD makes,
and why it's worth the cost in many situations.  However, doing so will require
an email retention policy that doesn't allow unlimited storage--unless you can
afford that much SSD capacity.

You can get 240,000 4k random IOPS and 1.9TB of capacity from two of these in a
software RAID0 for $6,400 USD:
http://www.newegg.com/Product/Product.aspx?Item=N82E16820227665

That's enough transactional IOPS throughput to support well over 50,000
concurrent IMAP users, probably far more.  Of course this would require a server
likely on the order of at least a single socket G34 AMD 12 core Magny Cours
system w/2GHz cores, 128GB of RAM, and two free PCIe X4/X8 slots for the SSD
cards, based on a board such as this SuperMicro:
http://www.newegg.com/Product/Product.aspx?Item=N82E16813182240
(Actually this is the perfect board for running two of these RevoDrive X2 cards)

-- 
Stan


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-14 Thread Brandon Davidson
Stan,

On 1/14/11 7:09 PM, Stan Hoeppner s...@hardwarefreak.com wrote:
 
 The average size of an email worldwide today is less than 4KB, less than one
 typical filesystem block.
 
 28TB / 4KB = 28,000,000,000,000 bytes / 4096 bytes = 6,835,937,500 =
 6.8 billion emails / 5,000 users =
 1,367,188 emails per user
 
 6.8 billion emails is not much anymore for a 5,000 seat org?

You obviously don't live in the same world I do. Have you ever been part of
a grant approval process and seen what kinds of files are exchanged, and
with what frequency? Complied with retention and archival policies? Dealt
with folks who won't (or can't) delete an message once they've received it?

Blithely applying some inexplicable figure you've pulled out of
who-knows-where and extrapolating from that hardly constitutes prudent
planning. We based our requirement on real numbers observed in our
environment, expected growth, and our budget cycle. How do you plan? More
blind averaging?

 How much did that 252TB NetApp cost the university?  $300k?  $700k?  Just a
 drop
 in the bucket right?  Do you think that was a smart purchasing decision, given
 your state's $3.8 Billion deficit?

You're close, if a bit high with one of your guesses. Netapp is good to
Education. Not that it matters - you know very little about the financial
state of my institution or how capital expenditures work within my
department's funding model.

I suppose I shouldn't be surprised though, you seem to be very skilled at
taking a little bit of information and making a convincing-sounding argument
about it... regardless of how much you actually know.

 For comparison, as of Feb 2009, the entire digital online content of the
 Library
 of Congress was only 74TB.  And you just purchased 252TB just for email for a
 5,000 head count subsection of a small state university's population?

I work for central IS, so this is the first stage of a consolidated service
offering that we anticipate may encompass all of our staff and faculty. We
bought what we could with what we had, anticipating that usage will grow
over time as individual units migrate off their existing infrastructure.
Again, you're guessing and casting aspersions.

This is enterprise storage; I'm not sure that you know what this actually
means either. With Netapp you generally lose on the order of 35-45% due to
right-sizing, RAID, spares, and aggregate/volume/snapshot reserves. What's
left will be carved up into LUNs and presented to the hosts.

1/3 of the available capacity is passive 3rd-site disaster-recovery. The
remaining 2 sites each host both an active and a passive copy of each mail
store; we design to be able to sustain a site outage without loss of
service. Each site has extra space for several years of growth, database
restores, and archival / records retention reserves.

That's how 16TB of active mail can end up requiring 252TB of raw disk. Doing
things right can be expensive, but it's usually cheaper in the long run than
doing it wrong. It's like looking into a whole other world for you, isn't
it? No Newegg parts here...

-Brad



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-14 Thread Brandon Davidson
On 1/14/11 8:59 PM, Brandon Davidson brand...@uoregon.edu wrote:
 I work for central IS, so this is the first stage of a consolidated service
 offering that we anticipate may encompass all of our staff and faculty. We
 bought what we could with what we had, anticipating that usage will grow
 over time as individual units migrate off their existing infrastructure.
 
 1/3 of the available capacity is passive 3rd-site disaster-recovery. The
 remaining 2 sites each host both an active and a passive copy of each mail
 store; we design to be able to sustain a site outage without loss of
 service. Each site has extra space for several years of growth, database
 restores, and archival / records retention reserves.

Oh, and you probably don't even want to think about what we did for our
Dovecot infrastructure. Clustered NFS servers with seamless failover,
snapshotting, and real-time block-level replication aren't cheap. The
students and faculty/staff not supported by an existing Exchange environment
aren't getting any less support, I'll say that much.

Folks trust us with their education, their livelihoods, and their personal
lives. I'd like to think that 'my fellow taxpayers' understand the
importance of what we do and appreciate the measures we take to ensure the
integrity and availability of their data.

-Brad



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-14 Thread Noel Butler
On Fri, 2011-01-14 at 21:09 -0600, Stan Hoeppner wrote:

 Brad Davidson put forth on 1/14/2011 6:25 PM:
 
  We just bought 252TB of raw disk for about 5k users. Given, this is
  going in to Exchange on Netapp with multi-site database replication, so
  this cooks down to about 53TB of usable space with room for recovery
  databases, defragmentation, archives, etc, but still... 28TB is not much
  anymore.
 
 The average size of an email worldwide today is less than 4KB, less than one
 typical filesystem block.
 


Standard in your eyes maybe; hell, 4KB would barely cover the headers in
some messages. I guess you also haven't heard about these things called
email attachments - please google it. We are not here to be your
educators, though christ knows someone needs to be.
PS: your small message alone was 6K.




 This is why people don't listen to Noel (I've had him kill filed for a 
 year--but
 not to the extreme of body filtering him).  They probably won't put much stock
 in what you say either Brad.  Why?
 


oh my, what will I do, loss of sleep coming up? I think not... perhaps
the fact truth hurts Stanley much more.


 You two don't live in reality.  Either that, or the reality you live in is


really? WOW cool, thanks. I guess I'll shut down all those smtp servers,
the databases and all of it. I mean, if it's all a figment of our
imagination, then all the power costs and costs of BTU cooling
requirements, the data costs - they will vanish when we wake up and come
back to Stan's reality? AWESOME!

*chuckles*  this is better than any sitcom :P



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-13 Thread Miha Vrhovnik
As a sanity check - I found some data from Mtron (one of the few SSD OEMs who
do quote endurance in a way that non-specialists can understand). In the data
sheet for their 32G product - which incidentally has 5 million cycles write
endurance - they quote the write endurance for the disk as greater than 85
years assuming 100G / day erase/write cycles - which involves overwriting the
disk 3 times a day.

That was written in 2007.  SSD flash cell life has increased substantially in
the 3-4 year period since.
Stan you are wrong on that...
Flash cell life decreases as the manufacturing process shrinks.

With the new generation of 25nm flash it's down to about 3,000 program/erase
cycles, so the drive manufacturers have to battle that by increasing ECC
length and using better wear leveling. Or, to quote Anand:
When I first started reviewing SSDs IMFT was shipping 50nm MLC NAND rated at 
10,000 program/erase cycles per cell. As I mentioned in a recent SSD article, 
the move to 3xnm cut that endurance rating in half. Current NAND shipping in 
SSDs can only last half as long, or approximately 5,000 program/erase cycles 
per cell. Things aren’t looking any better for 25nm. Although the first 25nm 
MLC test parts could only manage 1,000 P/E cycles, today 25nm MLC NAND is good 
for around 3,000 program/erase cycles per cell.

The reduction in P/E cycles is directly related to the physics of shrinking 
these NAND cells; the smaller they get, the faster they deteriorate with each 
write.
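
To put rough numbers on that, here's a back-of-the-envelope lifetime
estimate (just a sketch; the daily write volume and the 2x write
amplification factor are assumptions, not measurements):

  # rough NAND lifetime estimate -- all inputs are assumptions
  capacity_gb=256; pe_cycles=3000; host_writes_gb_day=100; write_amp=2
  echo "$capacity_gb $pe_cycles $host_writes_gb_day $write_amp" | \
    awk '{ printf "%.1f years\n", ($1 * $2) / ($3 * $4) / 365 }'
  # -> about 10.5 years, assuming wear leveling spreads writes evenly

So even at 3,000 P/E cycles a 256G drive can absorb a heavy write load for
years, but the margin shrinks quickly with smaller cells or heavier write
amplification.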

Please read the article on AnandTech [1]

1 - http://www.anandtech.com/show/4043/micron-announces-clearnand-25nm-with-ecc

Regards,
Miha


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-13 Thread David Woodhouse
On Wed, 2011-01-12 at 09:53 -0800, Marc Perkel wrote:
 I just replaced my drives for Dovecot using Maildir format with a pair 
 of Solid State Drives (SSD) in a raid 0 configuration. It's really 
 really fast. Kind of expensive but it's like getting 20x the speed for 
 20x the price. I think the big gain is in the 0 seek time. 

You may find ramfs is even faster :)

I hope you have backups.

-- 
dwmw2



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-13 Thread Robert Brockway

On Wed, 12 Jan 2011, Timo Sirainen wrote:


On 12.1.2011, at 21.15, Matt wrote:


I thought about doing this on my email server since its troubles are
mostly disk I/O saturation but I was concerned about reliability.
Have heard that after so many read/writes SSD will go bad.


There's no need to worry about that in any modern drives.


Hi Timo.  Wear levelling often isn't as good as is claimed on the box. 
Often wear levelling is only across subsets of the SSD not across the 
entire device.


I've seen several SSD drives fail in production after about 12 months of
use, and this in low-write environments (e.g., I log syslog to a remote
syslog server).   I'm refraining from deploying SSD for mail servers for
now, much as I would love to.


Rob

--
Email: rob...@timetraveller.org Linux counter ID #16440
IRC: Solver (OFTC  Freenode)
Web: http://www.practicalsysadmin.com
Contributing member of Software in the Public Interest (http://spi-inc.org/)
Open Source: The revolution that silently changed the world


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-13 Thread Timo Sirainen
On 13.1.2011, at 19.37, Robert Brockway wrote:

 Hi Timo.  Wear levelling often isn't as good as is claimed on the box. Often 
 wear levelling is only across subsets of the SSD not across the entire device.
 
 I've seen several SSD drives fail in production after about 12 months of use, 
 and this in low-write environments (e.g., I log syslog to a remote syslog 
 server).   I'm refraining from deploying SSD for mail servers for now, much 
 as I would love to.


How do they fail? Supposedly once a cell has reached its erase-limit it should 
become read-only. Maybe the failures had nothing to do with wearing?



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-13 Thread Steve

 -------- Original Message --------
 Date: Thu, 13 Jan 2011 10:17:20 +0100
 From: Miha Vrhovnik miha.vrhov...@cordia.si
 To: dovecot@dovecot.org
 Subject: Re: [Dovecot] SSD drives are really fast running Dovecot

 As a sanity check - I found some data from Mtron (one of the few SSD oems
 who do quote endurance in a way that non specialists can understand). In
 the data sheet for their 32G product - which incidentally has 5 million
 cycles write endurance - they quote the write endurance for the disk as
 greater than 85 years assuming 100G / day erase/write cycles - which
 involves overwriting the disk 3 times a day.
 
 That was written in 2007.  SSD flash cell life has increased substantially
 in the 3-4 year period since.
 Stan you are wrong on that...
 Flash cell life decreases as the manufacturing process shrinks.
 
 With the new generation of 25nm flash it's down to about 3,000
 program/erase cycles, so the drive manufacturers have to battle that by
 increasing ECC length and using better wear leveling. Or, to quote Anand:
 When I first started reviewing SSDs IMFT was shipping 50nm MLC NAND rated
 at 10,000 program/erase cycles per cell. As I mentioned in a recent SSD
 article, the move to 3xnm cut that endurance rating in half. Current NAND
 shipping in SSDs can only last half as long, or approximately 5,000
 program/erase cycles per cell. Things aren’t looking any better for 25nm.
 Although the first 25nm MLC test parts could only manage 1,000 P/E cycles,
 today 25nm MLC NAND is good for around 3,000 program/erase cycles per cell.
 
 The reduction in P/E cycles is directly related to the physics of
 shrinking these NAND cells; the smaller they get, the faster they
 deteriorate with each write.
 
I would not use MLC in a server environment. SLC has a much higher
program/erase endurance per cell.


 Please read the article on annandtech [1]
 
 1 -
 http://www.anandtech.com/show/4043/micron-announces-clearnand-25nm-with-ecc
 
 Regards,
 Miha   



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-13 Thread Eric Rostetter

Quoting David Jonas djo...@vitalwerks.com:


I've been considering getting a pair of SSDs in raid1 for just the
dovecot indexes.


While raid-1 is better than the raid-0 of the previous poster, do you
really want to slow down your fast SSDs with software raid-1 on top of them?


The hope would be to minimize the impact of pop3 users
hammering the server.



Proposed design is something like 2 drives (ssd or
platter) for OS and logs


The OS is mostly cached, so any old drive should work.  Logs are lots
of writes, so a dedicated drive might be nice (no raid).  Be sure to
tune the OS for the logs...
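
For instance, a minimal bit of log-volume tuning might look like this (a
sketch; the device and mount point are assumptions):

  # mount the dedicated log drive without atime updates to cut write traffic
  mount -o noatime,nodiratime /dev/sdb1 /var/log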


2 ssds for indexes (soft raid1),


I'd probably just go with one drive, maybe have a spare as a cold- or
hot-spare in case of failure.


12 sata or
sas drives in RAID5 or 6 (hw raid, probably 3ware) for maildirs. The


I'd say either raid-6 or raid-10 for this, depending on budget and size
needs.


indexes and mailboxes would be mirrored with drbd. Seems like the best
of both worlds -- fast and lots of storage.


drbd to where, at what level?  There was some other discussion about this
which basically said "Don't use drbd to mirror between VM guests",
which I agree with.  If you want to do this, use DRBD between VM servers
(physical hosts) and not between VM guests (virtual hosts).

I do use a VM cluster with DRBD between the physical hosts, but not for
mail services, and it works fine.  Doing DRBD inside the virtual hosts
though would not be good...
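
For reference, bringing a DRBD resource up between the two physical hosts
looks roughly like this (a sketch; the resource name r0 is hypothetical,
it assumes the resource is already defined in drbd.conf, and the exact
primary syntax varies between DRBD versions):

  # on both physical hosts: initialize metadata and attach the resource
  drbdadm create-md r0
  drbdadm up r0
  # then, on whichever host should start out as primary only:
  drbdadm primary --force r0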


Does anyone run a configuration like this? How does it work for you?


No.  I do 2 nodes, with DRBD between them, using GFS on them for both
the mbox files and indexes...  No virtualization at all...  No SSD drives
at all...


Anyone have any improvements on the design? Suggestions?


Only my advice about where to drbd if you are virtualizing, and what
raid levels to use... But these are just my opinions and your mileage
may vary...

--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!



[Dovecot] SSD drives are really fast running Dovecot

2011-01-12 Thread Marc Perkel
I just replaced my drives for Dovecot using Maildir format with a pair 
of Solid State Drives (SSD) in a raid 0 configuration. It's really 
really fast. Kind of expensive but it's like getting 20x the speed for 
20x the price. I think the big gain is in the 0 seek time.


Here's what I bought.

Crucial RealSSD C300 CTFDDAC256MAG-1G1 2.5" 256GB SATA III MLC Internal 
Solid State Drive (SSD) 
http://www.newegg.com/Product/Product.aspx?Item=N82E16820148349


   *   2.5"
   *   256GB
   *   SATA III

   * *Sequential Access - Read:* 355MB/sec (SATA 6Gb/s) 265MB/sec (SATA
 3Gb/s)
   * *Sequential Access - Write:* 215MB/sec (SATA 6Gb/s) 215MB/sec
 (SATA 3Gb/s)
   * *Power Consumption (Active):* 2.1W READ, 4.3W WRITE
   * *Power Consumption (Idle):* 0.094W

Running it on an Asus motherboard that supports SATA III - 6 core AMD 
CPU and 16 gigs of ram. Might be slightly off topic but this server screams!





Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-12 Thread Rick Romero

Quoting Marc Perkel m...@perkel.com:

I just replaced my drives for Dovecot using Maildir format with a

   pair of Solid State Drives (SSD) in a raid 0 configuration. It's
   really really fast. Kind of expensive but it's like getting 20x the
   speed for 20x the price. I think the big gain is in the 0 seek time.
  
   Here's what I bought.
  
   Crucial RealSSD C300 CTFDDAC256MAG-1G1 2.5 256GB SATA III MLC
   Internal Solid State Drive (SSD)
   http://www.newegg.com/Product/Product.aspx?Item=N82E16820148349
  
      *   2.5
      *   256GB
      *   SATA III
  
      * *Sequential Access - Read:* 355MB/sec (SATA 6Gb/s) 265MB/sec (SATA
        3Gb/s)
      * *Sequential Access - Write:* 215MB/sec (SATA 6Gb/s) 215MB/sec
        (SATA 3Gb/s)
      * *Power Consumption (Active):* 2.1W READ, 4.3W WRITE
      * *Power Consumption (Idle):* 0.094W
  
   Running it on an Asus motherboard that supports SATA III - 6 core AMD
   CPU and 16 gigs of ram. Might be slightly off topic but this server
   screams!

Hey Marc,

Just for testing purposes, what does a dd speed test give you?

http://it.toolbox.com/blogs/database-soup/testing-disk-speed-the-dd-test-31069
IMHO, the key part is exceeding the RAM size, but for a closer-to-Maildir
comparison, a decent file size that exceeds the drive cache is
good too.
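
Something along these lines, say (a sketch; the count is sized for a box
with 16 GB of RAM, so adjust it until the file comfortably exceeds yours):

  # ~32 GB sequential write, then flush caches
  time sh -c "dd if=/dev/zero of=ddfile bs=8k count=4000000 && sync"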

Rick


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-12 Thread Marc Perkel



On 1/12/2011 9:58 AM, Rick Romero wrote:

Quoting Marc Perkel m...@perkel.com:

I just replaced my drives for Dovecot using Maildir format with a

 pair of Solid State Drives (SSD) in a raid 0 configuration. It's
 really really fast. Kind of expensive but it's like getting 20x the
 speed for 20x the price. I think the big gain is in the 0 seek time.

 Here's what I bought.

 Crucial RealSSD C300 CTFDDAC256MAG-1G1 2.5 256GB SATA III MLC
 Internal Solid State Drive (SSD)
 http://www.newegg.com/Product/Product.aspx?Item=N82E16820148349

*   2.5
*   256GB
*   SATA III

* *Sequential Access - Read:* 355MB/sec (SATA 6Gb/s) 265MB/sec 
(SATA

  3Gb/s)
* *Sequential Access - Write:* 215MB/sec (SATA 6Gb/s) 215MB/sec
  (SATA 3Gb/s)
* *Power Consumption (Active):* 2.1W READ, 4.3W WRITE
* *Power Consumption (Idle):* 0.094W

 Running it on an Asus motherboard that supports SATA III - 6 core AMD
 CPU and 16 gigs of ram. Might be slightly off topic but this server
 screams!

Hey Marc,

Just for testing purposes, what does a dd speed test give you?

http://it.toolbox.com/blogs/database-soup/testing-disk-speed-the-dd-test-31069 


IMHO, the key part is exceeding the RAM size, but for a closer to
Maildir comparison, a decent file size that exceeds the drive cache is
good too..

Rick



Looks like a good test. Here are my results.

time sh -c "dd if=/dev/zero of=ddfile bs=8k count=2000000 && sync"

2000000+0 records in
2000000+0 records out
16384000000 bytes (16 GB) copied, 55.403 s, 296 MB/s

real    1m4.738s
user    0m0.336s
sys     0m20.199s



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-12 Thread Matt
 I just replaced my drives for Dovecot using Maildir format with a pair of
 Solid State Drives (SSD) in a raid 0 configuration. It's really really fast.
 Kind of expensive but it's like getting 20x the speed for 20x the price. I
 think the big gain is in the 0 seek time.


I thought about doing this on my email server since its troubles are
mostly disk I/O saturation but I was concerned about reliability.
Have heard that after so many read/writes SSD will go bad.  There are
an awful lot of read/writes on an email server.

I will be interested to hear how it stands up for you.


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-12 Thread Timo Sirainen
On 12.1.2011, at 21.15, Matt wrote:

 I thought about doing this on my email server since its troubles are
 mostly disk I/O saturation but I was concerned about reliability.
 Have heard that after so many read/writes SSD will go bad.

There's no need to worry about that in any modern drives.



Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-12 Thread Stan Hoeppner
Marc Perkel put forth on 1/12/2011 12:18 PM:

 time sh -c "dd if=/dev/zero of=ddfile bs=8k count=2000000 && sync"
 
 2000000+0 records in
 2000000+0 records out
 16384000000 bytes (16 GB) copied, 55.403 s, 296 MB/s
 
 real    1m4.738s
 user    0m0.336s
 sys     0m20.199s

That's a horrible test case for a mail server, especially one using maildir
storage.  Streaming read/write bandwidth results are meaningless for mail I/O.  You
need a random I/O test such as bonnie++ or iozone to see your IOPS.  We already
know it'll be off the chart compared to mechanical drives though.  Would still
be neat to see the numbers.  There's also the most realistic test, ironically,
given the list on which you asked this question:

http://www.imapwiki.org/Benchmarking

:)
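
If you do try bonnie++, something like this should exercise random I/O (a
sketch; the directory, size, and file count are assumptions -- the 32768 MB
size is picked to exceed your 16 GB of RAM):

  # -s is the test file size in MB; -n scales the small-file create/stat/
  # delete passes, which are the closest analogue to maildir traffic
  bonnie++ -d /mnt/ssd/test -s 32768 -n 128 -u nobody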

-- 
Stan


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-12 Thread David Jonas
On 1/12/11 9:53 AM, Marc Perkel wrote:
 I just replaced my drives for Dovecot using Maildir format with a pair
 of Solid State Drives (SSD) in a raid 0 configuration. It's really
 really fast. Kind of expensive but it's like getting 20x the speed for
 20x the price. I think the big gain is in the 0 seek time.

I've been considering getting a pair of SSDs in raid1 for just the
dovecot indexes. The hope would be to minimize the impact of pop3 users
hammering the server. Proposed design is something like 2 drives (ssd or
platter) for OS and logs, 2 ssds for indexes (soft raid1), 12 sata or
sas drives in RAID5 or 6 (hw raid, probably 3ware) for maildirs. The
indexes and mailboxes would be mirrored with drbd. Seems like the best
of both worlds -- fast and lots of storage.

Does anyone run a configuration like this? How does it work for you?

Anyone have any improvements on the design? Suggestions?


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-12 Thread Stan Hoeppner
Matt put forth on 1/12/2011 1:15 PM:

 I thought about doing this on my email server since its troubles are
 mostly disk I/O saturation but I was concerned about reliability.
 Have heard that after so many read/writes SSD will go bad.  There are
 an awful lot of read/writes on an email server.

From:  http://www.storagesearch.com/ssdmyths-endurance.html

As a sanity check - I found some data from Mtron (one of the few SSD oems who
do quote endurance in a way that non specialists can understand). In the data
sheet for their 32G product - which incidentally has 5 million cycles write
endurance - they quote the write endurance for the disk as greater than 85
years assuming 100G / day erase/write cycles - which involves overwriting the
disk 3 times a day.

That was written in 2007.  SSD flash cell life has increased substantially in
the 3-4 year period since.

From a flash cell longevity standpoint, any decent SSD with wear leveling is
going to easily outlive the typical server replacement cycle of 3-5 years, and
far beyond that.  Note that striping two such SSDs (RAID 0) will double the wear
cycle life, and striping 4 SSDs will quadruple it, so that 85 years becomes 340+
years of wear life with a 4 SSD stripe (RAID 0).
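
Taking the quoted 85-year figure at face value, the scaling is simple (a
sketch):

  # wear life multiplies with stripe width, since writes are spread out
  echo "85 2 4" | awk '{ printf "2 drives: %d yrs, 4 drives: %d yrs\n", $1*$2, $1*$3 }'
  # -> 2 drives: 170 yrs, 4 drives: 340 yrs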

Your misgivings about using SSDs are based on obsolete data from many years ago.

-- 
Stan


Re: [Dovecot] SSD drives are really fast running Dovecot

2011-01-12 Thread Stan Hoeppner
David Jonas put forth on 1/12/2011 6:37 PM:

 I've been considering getting a pair of SSDs in raid1 for just the
 dovecot indexes. The hope would be to minimize the impact of pop3 users
 hammering the server. Proposed design is something like 2 drives (ssd or
 platter) for OS and logs, 2 ssds for indexes (soft raid1), 12 sata or
 sas drives in RAID5 or 6 (hw raid, probably 3ware) for maildirs. The
 indexes and mailboxes would be mirrored with drbd. Seems like the best
 of both worlds -- fast and lots of storage.

Let me get this straight.  You're moving indexes to locally attached SSD for
greater performance, and yet, you're going to mirror the indexes and store data
between two such cluster hosts over a low bandwidth, high latency GigE network
connection?  If this is a relatively low volume environment this might work.
But, if the volume is high enough that you're considering SSD for performance,
I'd say using DRBD here might not be a great idea.

 Anyone have any improvements on the design? Suggestions?

Yes.  Go with a cluster filesystem such as OCFS or GFS2 and an inexpensive SAN
storage unit that supports mixed SSD and spinning storage such as the Nexsan
SATABoy with 2GB cache:  http://www.nexsan.com/sataboy.php

Get the single FC controller model, two Qlogic 4Gbit FC PCIe HBAs, one for each
cluster server.  Attach the two servers to the two FC ports on the SATABoy
controller.  Unmask each LUN to both servers.  This is what enables the cluster
filesystem.

Depending on the space requirements of your indexes, put 2 or 4 SSDs in a RAID0
stripe.  RAID1 simply DECREASES the overall life of SSDs.  SSDs don't have the
failure modes of mechanical drives, thus RAID'ing them is not necessary.  You
don't duplex your internal PCIe RAID cards, do you?  Same failure modes as SSDs.

Occupy the remaining 10 or 12 disk bays with 500GB SATA drives.  Configure them
as RAID10.  RAID5/6 aren't suitable for substantial random write workloads such
as mail and database.  Additionally, rebuild times for parity RAID schemes (5/6)
run to many hours, or even days, and the degraded performance of 5/6 is
horrible.  RAID10 rebuild times are a couple of hours, and RAID10 suffers zero
performance loss when a drive is down.  Additionally, RAID10 can lose HALF the
drives in the array as long as no two failed drives belong to the same mirror
pair.  Thus, with a RAID10 of 10 disks, you could potentially lose 5 drives with
no loss in performance.  The probability of this is low, but it demonstrates the
point.  With a 10 disk RAID10 of 7.2k SATA drives, you'll have ~800 random
read/write IOPS.  That may seem low, but that's the filesystem-level figure; the
physical IOPS figure is double that, 1600.  Since you'll have your indexes on 4
SSDs, and the indexes are where the bulk of IMAP IOPS take place (flags), you'll
have over 50,000 random read/write IOPS.
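
The arithmetic behind those spindle numbers, roughly (a sketch; ~160 random
IOPS per 7.2k SATA drive is an assumed rule of thumb, not a measurement):

  # RAID10: reads can hit any drive, but each write lands on both halves
  # of a mirror pair, so effective read/write IOPS is half the physical
  drives=10; iops_per_drive=160
  echo "$drives $iops_per_drive" | \
    awk '{ phys = $1 * $2; printf "physical: %d, effective r/w: %d\n", phys, phys / 2 }'
  # -> physical: 1600, effective r/w: 800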

Having both SSD and spinning drives in the same SAN controller eliminates the
high latency low bandwidth link you were going to use with drbd.  It also
eliminates buying twice as many SSDs, PCIe RAID cards, and disks, one set for
each cluster server.  Total cost may end up being similar between the drbd and
SAN based solutions, but you have significant advantages with the SAN solution
beyond those already mentioned, such as using an inexpensive FC switch and
attaching a D2D or tape backup host, installing the cluster filesystem software
on it, and directly backing up the IMAP store while the cluster is online and
running, or snapshotting it after doing a freeze at the VFS layer.

You should be able to acquire the FC SATABoy with all the drives and SSDs
(depending on what size units you choose), plus the two server HBAs for $15-20k
USD, maybe less.  I've not purchased a unit with SSDs yet, only disk, and
they're very reasonably priced compared to pretty much all other SAN arrays on
the market.  Nexsan's SSD pricing might be a little steep compared to Newegg,
but the units they bundle are fully tested and certified with their array
controllers.  The performance is phenomenal for the price, but obviously there
are higher performing units available.

-- 
Stan