Yes, file 148 is completely corrupted.

On 5/12/17 9:16 AM, Bron Gondwana wrote:
Just read the 148 file and see if there is corruption.  I suspect that's the 
cause.

Bron

On Fri, 12 May 2017, at 23:09, Eric Cunningham wrote:
I have snapshot backups of that account, that particular message and the
various cyrus db files.  What can I provide for review?


On 5/11/17 7:38 PM, Bron Gondwana wrote:
It looked like the '148.' file was corrupted in some way.  I assume you've 
deleted all the evidence now, so we can't see how!  Oh well :(

Bron.

On Fri, 12 May 2017, at 04:55, Eric Cunningham wrote:
Just to close the loop on this, once the corrupted cdm-lit account on my
copy server successfully reconstructed, the backlog of replication has
completed successfully and is now caught up.

-Eric

On 05/11/2017 02:20 PM, Eric Cunningham wrote:
-G didn't seem to help, but I tried a "reconstruct -f -r" command
without having previously deleted all occurrences of the cyrus.* files
and it was then successful.



On 05/11/2017 01:44 PM, Patrick Boutilier wrote:
Try a -G with reconstruct?

-G     Force  re-parsing  of the underlying message (checks GUID
correctness).  Reconstruct with -G should fix all possible individual
message issues, including corrupted data files.


On 05/11/2017 02:37 PM, Eric Cunningham wrote:
I have to walk this back.  In looking slightly further back in my
logfiles, before every instance of a failure to sync some folder, I
see a common error reported prior to every "bailing out! Bad
protocol" error - "IOERROR: GUID mismatch
/var/spool/cyrus/mail/c/user/cdm-lit/Sent/148."

May 11 12:44:09 imap1 sync_client[48590]: IOERROR: GUID mismatch
/var/spool/cyrus/mail/c/user/cdm-lit/Sent/148.
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOX (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: do_folders(): update
failed: user.cdm-lit.Sent 'Bad protocol'
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to UNMAILBOX (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: folder_delete(): failed:
user.guest-student-coordinator.Trash.Aarflot 'Bad protocol'
May 11 12:44:22 imap1 sync_client[48590]: Error in do_sync(): bailing
out! Bad protocol
May 11 12:44:22 imap1 sync_client[48590]: Processing sync log file
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol

When I recontructed this cdm-lit account, it ran successfully on my
master host, but fails on my copy host:

[cyrus@imap2 ~]$ reconstruct -f -r user.cdm-lit
user.cdm-lit
user.cdm-lit.Sea Trail 2013
user.cdm-lit.Sent uid 1 found - adding
user.cdm-lit.Sent uid 2 found - adding
user.cdm-lit.Sent uid 3 found - adding
user.cdm-lit.Sent uid 4 found - adding
....
user.cdm-lit.Sent uid 146 found - adding
user.cdm-lit.Sent uid 147 found - adding
user.cdm-lit.Sent uid 148 found - adding
fatal error: Internal error: assertion failed: imap/mailbox.c: 2847:
record->size

Since I couldn't get a successful reconstruct on this account, I
deleted it and recreated it from my master host.  However, I'm still
unable to get a successful reconstruct with "failed to read index
header" for every subfolder and "fatal error: Internal error:
assertion failed: imap/mailbox.c: 2847: record->size"

Any ideas on how to correct this so I can see if I can then get past
the replication error?

Thanks!

-Eric


On 05/11/2017 11:46 AM, Eric Cunningham wrote:
Thanks Bron, but that doesn't seem to work for me, unless I'm
missing something.

I ran reconstructs for this account on both my master and copy hosts:

[cyrus@imap1 ~]$ reconstruct -f -r user.scramer
user.scramer
user.scramer.Archive
user.scramer.Archives
user.scramer.Archives.2004
...
user.scramer.Trash.IS Networking and Operations
user.scramer.Trash.IS Security
user.scramer.Trash.IS Servers
user.scramer.Trash.IS Staff
user.scramer.Trash.IS Surveys
...


[cyrus@imap2 ~]$ reconstruct -f -r user.scramer
user.scramer
user.scramer.Archive
user.scramer.Archives
user.scramer.Archives.2004
...
user.scramer.Trash.IS Networking and Operations
user.scramer.Trash.IS Security
user.scramer.Trash.IS Servers
user.scramer.Trash.IS Staff
user.scramer.Trash.IS Surveys
...


I then restart the replication against the log-run file and it again
fails on a folder that no longer exists in that account:

...
May 11 11:40:51 imap1 sync_client[60696]: sync_mailboxes: doing 1000
May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length
response to UNMAILBOX (end of file reached)
May 11 11:40:51 imap1 sync_client[60696]: folder_delete(): failed:
user.scramer.Trash.IS Software Management 'Bad protocol'
May 11 11:40:51 imap1 sync_client[60696]: Error in do_sync():
bailing out! Bad protocol
May 11 11:40:51 imap1 sync_client[60696]: Processing sync log file
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol


-Eric


On 05/11/2017 02:14 AM, Bron Gondwana wrote:
Looks like you have a corrupted mailboxes database - if you run a
reconstruct at each end then it should be fine.  Replication won't
bail for an actually-doesn't-exist mailbox, but it can't fix broken
mailboxes, that's what reconstruct is for.

Bron.


On Thu, 11 May 2017, at 01:01, Eric Cunningham wrote:
Hi list, I'm running cyrus replication on cyrus 2.5.10.  It seems
whenever a folder is encountered in log-run that doesn't exist on the
client and/or the server, the replication process logs "Error in
do_sync(): bailing out! Bad protocol".  Sometimes, it dies
completely,
other times it restarts at the beginning of the log-run file.  But
when
it re-encounters the folder in question, the same failure occurs
and the
problem continuously loops until some manual action is taken.  I
have a
checker script that attempts to restart a stopped sync but I haven't
tried to code it to get past these types of folder errors.  In the
meantime, replication effectively stops and the replication log
continuously builds in size.  You might imagine my horror at
discovering
yesterday that replication of my production server with 1200 accounts
had been failing for over 7 weeks unnoticed, generating 9.5 million
lines in the replication log files.  I had gotten out of the habit of
daily monitoring of replication as it had been running fine and I
eventually moved on to other things.  I'm trying to get
replication to
run through the enormous backlog but am running into more similar
folder
issues that cause the process to fail, I make manual fixes, and start
the process again.  In my experience with this, reconstruction of the
affected folders on the master host and copy host has no effect.

Can this otherwise excellent feature be made more robust to work past
these folder issues?

Here are a couple of examples I've observed thus far of how the
process
is failing for me.

--------------

In this example, the subfolder "Fly America Foreign Travel" no longer
existed on the master host but did exist on the copy host.  To
resolve
this, I had to manually delete that folder from the copy host.

May  9 15:17:48 imap1 sync_client[34614]: inefficient replication
(8 >
5) user.jtaft.PO.Fly America Foreign Travel
May  9 15:17:48 imap1 sync_client[34614]: FULLMAILBOX received NO
response: IMAP_MAILBOX_NONEXISTENT No Such Mailbox
May  9 15:17:48 imap1 sync_client[34614]: do_folders(): update
failed:
user.jtaft.PO.Fly America Foreign Travel 'Mailbox does not exist'
May  9 15:17:48 imap1 sync_client[34614]: IOERROR: Mailbox does
not exist
May  9 15:17:48 imap1 sync_client[34614]: Error in do_sync(): bailing
out! Mailbox does not exist
May  9 15:17:48 imap1 sync_client[34614]: Processing sync log file
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Mailbox does
not exist
May  9 15:17:48 imap1 sync_client[34614]: Reprocessing sync log file
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run

--------------

In this example, the subfolder "Sohn" did not exist on the master
host
nor on the copy host.  To resolve, I had to edit log-run and
completely
remove all entries for this "Sohn" subfolder.

May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length
response
to MAILBOXES (end of file reached)
May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length
response
to MAILBOXES (end of file reached)
May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length
response
to UNMAILBOX (end of file reached)
May 10 10:13:26 imap1 sync_client[47011]: folder_delete(): failed:
user.gg-chair.Trash.Sohn 'Bad protocol'
May 10 10:13:26 imap1 sync_client[47011]: Error in do_sync(): bailing
out! Bad protocol
May 10 10:13:26 imap1 sync_client[47011]: Processing sync log file
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol

--------------

Often, errors are logged without "bailing out!" and replication
continues to run:

May 10 10:45:00 imap1 sync_client[87438]: sync_mailboxes: doing 1000
May 10 10:45:01 imap1 sync_client[87438]: sync_mailboxes: doing 1000
May 10 10:45:24 imap1 sync_client[87438]: IOERROR: GUID mismatch
/var/spool/cyrus/mail/c/user/cdm-lit/Sent/148.
May 10 10:45:37 imap1 sync_client[87438]: IOERROR: zero length
response
to MAILBOX (end of file reached)
May 10 10:45:37 imap1 sync_client[87438]: do_folders(): update
failed:
user.cdm-lit.Sent 'Bad protocol'
May 10 10:45:37 imap1 sync_client[87438]: sync_mailboxes: doing 1000

--------------

Thank you, list!

-Eric

----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus




----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus



----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus



----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus






----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus

Reply via email to