Hi Steffen,
See my responses below,
On 04/06/14 18:52, Steffen Kaiser wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Wed, 4 Jun 2014, Bruce wrote:
Please see my responses below,
I dislike top posting anyway :-)
Yeah, its habit as I deal with people who top post and reply with 'Your
email had nothing in it' if I don't say there is more below.
>Jun 3 11:38:51 brio dovecot: Dovecot postlogin.sh running as
hamish@XXXXX
>(/mnt/spool/keepers/h/XXXXX/hamish) userid = 7053 (7053) - uidlist
= 26624
>Jun 3 11:38:51 brio dovecot: Dovecot for hamish@XXXXX finished,
uidlist now =
>26624
who is user 26624? Is the uid valid at all? If it is invalid, are there
other files owned by this uid? Maybe only one of your NFS server has
this
uid in its /etc/passwd? Is user "hamish" shared to another user
somehow,
either via symlinks, ACLs, ...?
UID 26624 is a valid user ('info' of domain14552) but under a
compleltly different domain name (hamish is under domain25367).
However that user has also not logged in around the time the
ownership was changed. There is no relevance to the two users, except
that they exist on the system and for some reason this issue happened
to the hamish user.
That means: no sharing, right ?
Correct.
The NFS server does not know about the UIDs, it just provides the
numeric IDs which is translated on the dovecot/exim servers by NSS
and Dovecotr using the replicated MySQL database. Additionally both
users have existed for some time and the databases are in sync.
Customers also do not have any access to the file system so there
will be no symlinks in place.
Its also not a single server that we are seeing the issue on, it
maybe one Dovecot server accessing one NFS server. Then the next time
its a different Dovecot server accessing a different NFS server.
Because it's NFS, any server accessing this NFS export may alter the
ownership, because I still do not believe it's a Dovecot IMAP/POP
issue ;-)
But we would see this with any application and file that is on NFS, in
this case it is only ever one particular file, and only happened after
changing to Dovecot (Previous to that we have had this NFS set up in
place for 7 years).
Nothing besides Courier being replaced by Dovecot has changed in the
server setup (although I could be wrong there, but we are going
through one component at a time and until this issue is resolved we
are not moving onto the next), and the only file which is being
modified is a file which only Dovecot maintains.
Could there be Courier-related left-over scripts running?
Its highly unlikely, and why would a courier script want to modify the
ownership of a Dovecot file.
There are hourly backups which do an rsync to another server in case
of hardware failure, there are scripts which move mailboxes between
NFS servers but they show up in logs. Exim has no need to touch a
dovecot controlled file, and when it writes mail into the maildir its
writing as the correct user.
How does exim deliver the messages to the user storage? Via
Dovecot-Deliver or directly. I interprete your sentence, that the
messages are dropped directly.
00
Exim does it directly, once again no need for it to touch a file which
Dovecot creates and maintains.
possibly something we have done in the config (which I posted in my
first email). Is there a reliable way to run a script directly when a
dovecot session starts and finishes so we could output the ownership
before and after which may also help eliminate the session itself.
Your post-login script does exactly that. That leaves the message
delivery, if you do so via Dovecot deliver. LMTP seems to be off.
I thought that's what should happen but looking at timestamps of the
logins after the change, the permission denied always comes after the
postlogin script completes, which means something happens after then.
Jun 3 12:56:31 gettas dovecot: pop3(celia@xxx): Mailbox init failed
top=0 (0 b), retr=0 (0 b), messages=0 (0 b), del=0
Jun 3 12:56:32 gettas dovecot: pop3-login: Login: user=<celia@xxx>,
method=PLAIN, rip=x.x.x.x, lip=x.x.x.x, mpid=19655, TLS,
session=<r2qX++P6gAB6PWyl>
Jun 3 12:56:32 gettas dovecot: Dovecot postlogin.sh running as
celia@xxx/celia) userid = 41410 (41410) - uidlist = 7903
Jun 3 12:56:32 gettas dovecot: Dovecot for celia@xxx finished, uidlist
now = 7903 <---- POSTLOGIN.SH ENDS HERE
Jun 3 12:56:32 gettas dovecot:
pop3(celia@xxx/celia/Maildir/dovecot-uidlist) failed: Permission denied
Jun 3 12:56:32 gettas dovecot:
pop3(celia@xxx/celia/Maildir/dovecot-uidlist) failed: Permission denied
Jun 3 12:56:32 gettas dovecot: pop3(celia@xxx): Error: Couldn't init
INBOX: Internal error occurred. Refer to server log for more
information. [2014-06-03 12:56:32]
Jun 3 12:56:32 gettas dovecot: pop3(celia@xxx): Mailbox init failed
top=0 (0 b), retr=0 (0 b), messages=0 (0 b), del=0
I could put this down to same microsecond and syslog recording in a
different order, but its consistently in this order.
I'm not sure if inotify works via NFS as Peter suggested, but that
would my next idea as well. Let inotify wait for changes on that file,
then fire up a script to capture "ps gaux" and maybe more.
If you have SELinux running, maybe you can trigger a warning there? I
don't know much about SELinux.
We don't have SELinux running, and unfortunately we are talking over
400,000 mailboxes (200,000 domains) which this could happen randomly too
and adding inotify requires a file handle to be opened per file, which
would make the servers unstable.
I have found a piece of code in the Dovecot source which I'm wondering
about, so I am going to compile up a custom version and see if it
continues to happen. Of course with it happening randomly being able to
confirm its still happening is going to take time. So if anyone has any
other ideas where to look I would be appreciative.
Cheers,
Bruce