Re: [Dovecot] NFS random redirects

2009-10-22 Thread Thomas Hummel
On Wed, Oct 21, 2009 at 04:59:50PM +0100, Guy wrote:

 Our current setup uses two NFS mounts accessed simultaneously by two
 servers.

[...]

Thanks for sharing your experience.
Are you using mbox, dbox or maildir ?
What % of IMAP and POP3 clients ?

-- 
Thomas Hummel   | Institut Pasteur
hum...@pasteur.fr | Pôle informatique - systèmes et réseau


Re: [Dovecot] NFS random redirects

2009-10-22 Thread Thomas Hummel
On Wed, Oct 21, 2009 at 09:39:22AM -0700, Brandon Davidson wrote:

 As a contrasting data point, we run NFS + random redirects with almost no
 problems. 

Thanks for your answer as well.

What mailbox format are you using ?

-- 
Thomas Hummel   | Institut Pasteur
hum...@pasteur.fr | Pôle informatique - systèmes et réseau


Re: [Dovecot] NFS random redirects

2009-10-22 Thread Brandon Davidson
Thomas,

On 10/22/09 1:29 AM, Thomas Hummel hum...@pasteur.fr wrote:
 On Wed, Oct 21, 2009 at 09:39:22AM -0700, Brandon Davidson wrote:
 As a contrasting data point, we run NFS + random redirects with almost no
 problems. 
 
 Thanks for your answer as well.
 
 What mailbox format are you using ?

We switched to Maildir a while back due to performance issues with mbox,
primarily centered around locking and the cost of rewriting the entire file
when one message changes. Haven't looked back since.
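
To illustrate the rewrite cost: in Maildir every message is its own file and the flags live in the filename suffix, so marking a message as seen is a single rename, whereas mbox keeps the whole mailbox in one file. A minimal Python sketch, assuming the standard Maildir ":2," info suffix; add_maildir_flag is a hypothetical helper for illustration, not Dovecot code:

```python
import os

def add_maildir_flag(path, flag):
    """Add a single-letter flag (e.g. "S" for seen) to a Maildir message file.

    Only the filename changes; the message body is never rewritten.
    Returns the new path.
    """
    directory, name = os.path.split(path)
    base, _sep, flags = name.partition(":2,")
    # Maildir convention: flag letters are kept in ASCII order after ":2,".
    new_flags = "".join(sorted(set(flags) | {flag}))
    new_path = os.path.join(directory, base + ":2," + new_flags)
    if new_path != path:
        os.rename(path, new_path)
    return new_path
```

With mbox the equivalent change may mean rewriting the file from the changed message onwards, under a lock that blocks every other writer.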

Our config is pretty vanilla - users in LDAP (via pam_ldap), standard UNIX
home directory layout, Sendmail on the MTA hosts.

-Brad 



Re: [Dovecot] NFS random redirects

2009-10-21 Thread Thomas Hummel
On Tue, Oct 20, 2009 at 10:47:25AM +0200, Thomas Hummel wrote:

  Actual mail content should be safe. 

So you seem to say that index files would probably get corrupted but that
clients wouldn't notice it ?

I'm trying to figure out how to use imaptest's test scripting to test this.
Any suggestion as to what the test file should look like ?

Thanks.

-- 
Thomas Hummel   | Institut Pasteur
hum...@pasteur.fr | Pôle informatique - systèmes et réseau


Re: [Dovecot] NFS random redirects

2009-10-21 Thread Timo Sirainen

On Oct 20, 2009, at 4:47 AM, Thomas Hummel wrote:

  If you do it, you'll most likely see some random index related errors.

 But are index related errors recoverable (does dovecot notice and fix it
 dynamically ?) or will they cause client-side corruption ?

 How bad would that corruption be ? (like fetching the wrong message, since the
 index stores nextuid as well ? setting a wrong flag ? ...) and how could a
 client fix it ?


It's unlikely that anything bad happens, but who knows. Random  
unnoticed corruption can do pretty much anything.


  caches.. So I've added some highly OS-specific code that works most of
  the time, but not perfectly. It works best with Linux.

 Ouch! I run dovecot on FreeBSD ;-(


FreeBSD's NFS seems to be among the worst..


  imaptest exists now in http://imapwiki.org/ImapTest

 Thanks. Sorry I didn't find it. Do you have any suggestion to test specifically
 nfs corruption chances ?


Not really. Just running the stress test on the same mailbox in 2+ NFS
clients should start showing the problems somewhat soon if there are any.
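
A rough sketch of that idea in Python, using only the standard imaplib module rather than imaptest itself: one worker per Dovecot server hammers the same mailbox with APPEND, STORE and FETCH. The host names, credentials and helper names are placeholders and assumptions, and the thing to watch is the Dovecot log on each server, not the script output.

```python
import imaplib
import threading
import time

def make_message(worker, n):
    """Build a tiny RFC 822 message so each APPEND is identifiable."""
    return ("From: test@example.invalid\r\n"
            "Subject: nfs-test %s-%d\r\n"
            "\r\n"
            "body %d\r\n" % (worker, n, n)).encode("ascii")

def hammer(host, user, password, mailbox="INBOX", rounds=100):
    """Append, flag and fetch messages in `mailbox` through one server."""
    conn = imaplib.IMAP4(host)
    try:
        conn.login(user, password)
        conn.select(mailbox)
        for n in range(rounds):
            conn.append(mailbox, None,
                        imaplib.Time2Internaldate(time.time()),
                        make_message(host, n))
            # Re-read the mailbox and set \Seen on the newest message.
            typ, data = conn.search(None, "ALL")
            ids = data[0].split()
            if ids:
                last = ids[-1].decode()
                conn.store(last, "+FLAGS", "\\Seen")
                conn.fetch(last, "(FLAGS)")
    finally:
        conn.logout()

def run(hosts, user, password):
    """Start one worker per server, all on the same account and mailbox."""
    threads = [threading.Thread(target=hammer, args=(h, user, password))
               for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Calling run(["imap1.example.invalid", "imap2.example.invalid"], "testuser", "secret") against a throwaway account would exercise both servers at once; the names here are made up.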


  Actual mail content should be safe. Unless you just happen to get such
  a cache file corruption that Dovecot doesn't notice it and sends some
  broken headers to IMAP client.

 If that happened, with Maildir, the actual content on disk on server wouldn't
 be corrupted I guess,


Right.


 so would that be recoverable on the client ?


If all else fails, clearing client caches (or recreating the
account on client side) should work, yes.


Re: [Dovecot] NFS random redirects

2009-10-21 Thread Guy
2009/10/21 Timo Sirainen t...@iki.fi

 On Oct 20, 2009, at 4:47 AM, Thomas Hummel wrote:

  If you do it, you'll most likely see some random index related errors.


 But are index related errors recoverable (does dovecot notice and fix it
 dynamically ?) or will they cause client-side corruption ?

 How bad would that corruption be ? (like fetching the wrong message, since the
 index stores nextuid as well ? setting a wrong flag ? ...) and how could a
 client fix it ?


 It's unlikely that anything bad happens, but who knows. Random unnoticed
 corruption can do pretty much anything.


Our current setup uses two NFS mounts accessed simultaneously by two
servers. Our load balancing tries to keep a user on the same server whenever
possible. Initially we just had roundrobin load balancing which led to index
corruption.
The problems we've had with that corruption have simply been that some
messages are displayed twice or not displayed at all in mail clients.
Deletion of the corrupted index allowed Dovecot to recreate it correctly, so
the client can't do anything about it. You'd probably have to do it manually
or have some sort of web interface for users to do it themselves.
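
One simple way to get this kind of user-to-server affinity is to hash the login name and pick the backend from the hash, so the same user always lands on the same server as long as the server list is unchanged. A minimal Python sketch; pick_backend and the backend names are hypothetical, and this is not a description of the load balancer used in the setup above:

```python
import hashlib

def pick_backend(username, backends):
    """Map a user to one backend deterministically (static hash affinity)."""
    digest = hashlib.sha1(username.lower().encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(backends)
    return backends[index]
```

The obvious weakness is that adding or removing a server reshuffles most users, so sessions opened before the change can still overlap with new ones on a different server.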

I certainly wouldn't use NFS with multiple servers accessing it again for
Dovecot. Looking at a clustered FS on SAN solution at the moment.

Cheers
Guy

-- 
Don't just do something...sit there!


Re: [Dovecot] NFS random redirects

2009-10-21 Thread Brandon Davidson
On 10/21/09 8:59 AM, Guy wyldf...@gmail.com wrote:
 Our current setup uses two NFS mounts accessed simultaneously by two
 servers. Our load balancing tries to keep a user on the same server whenever
 possible. Initially we just had roundrobin load balancing which led to index
 corruption.
 The problems we've had with that corruption have simply been that some
 messages are displayed twice or not displayed at all in mail clients.
 Deletion of the corrupted index allowed Dovecot to recreate it correctly, so
 the client can't do anything about it. You'd probably have to do it manually
 or have some sort of web interface for users to do it themselves.
 
 I certainly wouldn't use NFS with multiple servers accessing it again for
 Dovecot. Looking at a clustered FS on SAN solution at the moment.

As a contrasting data point, we run NFS + random redirects with almost no
problems. We host ~7TB of mail for ~45k users with a peak connection count
of 10k IMAP connections, and maybe a handful of POP3. We make absolutely no
effort to make sure that connections from the same user or IP are routed to
the same server.

We do occasionally see index corruption, but it is almost always related to
the user going over quota, and Dovecot being unable to write to the logs. If
we wanted to solve this problem, we could move the indexes off to a second
tier of storage. It is a very minor issue though. Locking has not been a
problem at all.

I will say that this may be a situation where you get what you pay for.
We've invested a fair amount of money in our storage system (Netapp), server
pool (RHEL5), and networking technology (F5 BigIP LTM). Our mail is spread
across 16 volumes on two filers, and we are careful to stress-test the
servers and storage backend before rolling out major upgrades.

That is not of course to neglect the value of things that are free - like
Dovecot! Many thanks to Timo for maintaining such a wonderful piece of
software!

-Brad



Re: [Dovecot] NFS random redirects

2009-10-20 Thread Thomas Hummel
On Mon, Oct 19, 2009 at 12:42:08PM -0400, Timo Sirainen wrote:

Thanks for the answers Timo,

I understand random redirect is not a good idea but I'm trying to evaluate the
damage it can do.

 If you do it, you'll most likely see some random index related errors.

But are index related errors recoverable (does dovecot notice and fix it
dynamically ?) or will they cause client-side corruption ? 

How bad would that corruption be ? (like fetching the wrong message, since the
index stores nextuid as well ? setting a wrong flag ? ...) and how could a client
fix it ?

 caches.. So I've added some highly OS-specific code that works most of  
 the time, but not perfectly. It works best with Linux.

Ouch! I run dovecot on FreeBSD ;-(

 imaptest exists now in http://imapwiki.org/ImapTest

Thanks. Sorry I didn't find it. Do you have any suggestion to test specifically
nfs corruption chances ?

 Actual mail content should be safe. Unless you just happen to get such  
 a cache file corruption that Dovecot doesn't notice it and sends some  
 broken headers to IMAP client.

If that happened, with Maildir, the actual content on disk on server wouldn't
be corrupted I guess, so would that be recoverable on the client ?

 The long term fix for this is 
 http://www.dovecot.org/list/dovecot/2009-August/041983.html

Thanks.

-- 
Thomas Hummel   | Institut Pasteur
hum...@pasteur.fr | Pôle informatique - systèmes et réseau


[Dovecot] NFS random redirects

2009-10-19 Thread Thomas Hummel
Hello,

Dovecot documentation states that the random redirects to multiple servers
NFS solution is to be avoided, and I'm investigating the actual risks of it and
a way to put it to the test.

I'm running dovecot-1.2.6 with Maildir (indexes, mailboxes and control files
are all on NFS) and I'm using procmail instead of deliver as the LDA.

  1. Documentation says : Dovecot locks the maildir while doing modifications
 to it or while looking for new messages in it and then mentions the
 dovecot-uidlist.lock dotlock file.

  a) is that file fcntl'ed in addition (i.e. dotlocked + fcntled or just
dotlocked) ?

  b) is that file THE way to lock the Maildir mentioned above or is it
just something else used only for updating that particular file (i.e. is
Maildir locked in some way + dovecot-uidlist.lock created or just
dovecot-uidlist.lock created) ?

  2. Documentation says : NFS caching is a big problem when multiple computers
 are accessing the same mailbox simultaneously

 I guess it's because of the dotlock files and not fcntl locking ? Is
 there anything other than dovecot-uidlist which is dotlocked ? If yes, is it
 dotlocked only or dotlocked and fcntled ?

  3. Documentation says : Dovecot v1.1 flushes NFS caches when needed if you 
set mail_nfs_storage=yes

 How can a program flush the NFS caches ? By which (system) call ?

  4. Documentation says : Besides the NFS cache problems described above,
 mailbox contents can't be cached as well in the memory either.

Is it about in-memory indexes or part of indexes loaded into memory ?

  5. How can I torture-test concurrent access to the same mailbox through 2
 dovecot servers ? I don't see imaptest.c anymore on http://dovecot.org/tools
 but I see 2 nfs_test ? Are there some command line options I should use ?

  6. when and why can random redirects to multiple servers cause mailbox
 corruption ? On flags only or on content as well ?

Thanks

-- 
Thomas Hummel   | Institut Pasteur
hum...@pasteur.fr | Pôle informatique - systèmes et réseau


Re: [Dovecot] NFS random redirects

2009-10-19 Thread Timo Sirainen

On Oct 19, 2009, at 9:35 AM, Thomas Hummel wrote:

 Dovecot documentation states that the random redirects to multiple servers
 NFS solution is to be avoided, and I'm investigating the actual risks of it and
 a way to put it to the test.


If you do it, you'll most likely see some random index related errors.

 1. Documentation says : Dovecot locks the maildir while doing modifications
 to it or while looking for new messages in it and then mentions the
 dovecot-uidlist.lock dotlock file.

 a) is that file fcntl'ed in addition (i.e. dotlocked + fcntled or just
 dotlocked) ?


No fcntl.

 b) is that file THE way to lock the Maildir mentioned above or is it
 just something else used only for updating that particular file (i.e. is
 Maildir locked in some way + dovecot-uidlist.lock created or just
 dovecot-uidlist.lock created) ?


Maildir is then locked for Dovecot against doing any modifications.  
Procmail will ignore it, and that's fine.
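
For readers unfamiliar with dotlocks, here is a minimal Python sketch of the classic NFS-safe recipe (create a uniquely named temp file, hard-link it to the lock name, then trust the link count). It is only an illustration of the general technique: dotlock_acquire and dotlock_release are made-up names, stale-lock handling is left out, and this is not Dovecot's actual implementation.

```python
import os
import socket
import time

def dotlock_acquire(path, timeout=10.0, poll=0.1):
    """Try to create `path + ".lock"` atomically; return True on success.

    The temp file is hard-linked to the lock name, and the link count is
    checked instead of trusting the return value of link(), since an NFS
    reply can be lost even though the operation succeeded.
    """
    lock = path + ".lock"
    tmp = "%s.%s.%d.%d" % (lock, socket.gethostname(), os.getpid(),
                            time.time_ns())
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)
    deadline = time.monotonic() + timeout
    try:
        while True:
            try:
                os.link(tmp, lock)
            except OSError:
                pass  # lock may already exist; the link count decides
            if os.stat(tmp).st_nlink == 2:
                return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)
    finally:
        os.unlink(tmp)

def dotlock_release(path):
    """Remove the lock file created by dotlock_acquire()."""
    os.unlink(path + ".lock")
```

A dotlock like this only works between programs that agree to look for the same lock file, which is why a delivery agent that does not check dovecot-uidlist.lock simply ignores it.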


 2. Documentation says : NFS caching is a big problem when multiple computers
 are accessing the same mailbox simultaneously

 I guess it's because of the dotlock files and not fcntl locking ?


No, it's because of NFS caching. For example, if Dovecot writes to
dovecot.index.log first and then dovecot.index, it expects that a later
read can never see the dovecot.index change without also seeing the
dovecot.index.log change. But because of NFS caching this is possible,
and "corrupted index file" errors will occur. There are many other
similar issues.
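
The ordering problem can be shown with a toy model in Python. Nothing below talks to NFS: Server and Client are made-up classes that only mimic a client caching each file independently, which is the assumption that makes the scenario possible.

```python
class Server:
    """The single authoritative copy of each file (the NFS server)."""
    def __init__(self):
        self.files = {}

class Client:
    """An NFS client that caches each file separately, with no ordering."""
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, name, data):
        self.server.files[name] = data
        self.cache[name] = data

    def read(self, name):
        # Serve from the cache if present; otherwise fetch and cache.
        if name not in self.cache:
            self.cache[name] = self.server.files.get(name)
        return self.cache[name]

    def expire(self, name):
        """Model the cache entry for one file timing out."""
        self.cache.pop(name, None)

server = Server()
a, b = Client(server), Client(server)

# Initial state, read (and therefore cached) by client B.
a.write("dovecot.index.log", "log v1")
a.write("dovecot.index", "index v1")
b.read("dovecot.index.log")
b.read("dovecot.index")

# Client A updates the log first, then the index.
a.write("dovecot.index.log", "log v2")
a.write("dovecot.index", "index v2")

# On B only the cache entry for dovecot.index happens to expire.
b.expire("dovecot.index")
seen = (b.read("dovecot.index"), b.read("dovecot.index.log"))
print(seen)  # ('index v2', 'log v1')
```

Client B ends up with the new index next to the old log, a combination that never existed on the server.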


 3. Documentation says : Dovecot v1.1 flushes NFS caches when needed if you
 set mail_nfs_storage=yes

 How can a program flush the NFS caches ? By which (system) call ?


That's the main problem. There is no reliable way to flush NFS  
caches.. So I've added some highly OS-specific code that works most of  
the time, but not perfectly. It works best with Linux.
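
As a general illustration only (this is not the OS-specific code mentioned above): one widely relied-upon behaviour is close-to-open consistency, where an NFS client revalidates a file's cached attributes with the server when the file is opened. Assuming the client implements it and the mount has not disabled it, re-opening a file before looking at it is the nearest thing to a portable refresh. A Python sketch, with stat_after_reopen being a made-up helper name:

```python
import os

def stat_after_reopen(path):
    """Open the file again and fstat() the fresh descriptor.

    Assumption: the NFS client implements close-to-open consistency, so
    open() makes it revalidate cached attributes with the server. Nothing
    here is NFS-specific; on a local filesystem it is just open + fstat.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.fstat(fd)
    finally:
        os.close(fd)
```

This says nothing about cached data pages or directory entries, which is part of why no single call counts as a reliable flush.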


 4. Documentation says : Besides the NFS cache problems described above,
 mailbox contents can't be cached as well in the memory either.

 Is it about in-memory indexes or part of indexes loaded into memory ?


I was thinking about OS's filesystem caching in general, so all mail  
files and such.


 5. How can I torture-test concurrent access to the same mailbox through 2
 dovecot servers ? I don't see imaptest.c anymore on http://dovecot.org/tools
 but I see 2 nfs_test ? Are there some command line options I should use ?


imaptest exists now in http://imapwiki.org/ImapTest

 6. when and why can random redirects to multiple servers cause mailbox
 corruption ?


Pretty much randomly..


 On flags only or on content as well ?


Actual mail content should be safe. Unless you just happen to get such  
a cache file corruption that Dovecot doesn't notice it and sends some  
broken headers to IMAP client.


The long term fix for this is 
http://www.dovecot.org/list/dovecot/2009-August/041983.html