On Thu, 29 Nov 2018, Marc Roos wrote:

When concatenating mbox files as described here
https://xaizek.github.io/2013-03-30/merge-mbox-mailboxes/, you will end
up with an 'unsorted' mbox file. Is this going to be a problem,
especially when they are large (>2GB) and new emails will be written to
it?

I don't think it will be a problem, but you might have to remove
some headers (like the UUID header?).  However, I think dovecot
ought to be able to cope with it anyways and regenerate the indices.
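
A minimal sketch of the header removal, under the assumption that the
header meant above is Dovecot's X-UID (X-IMAPbase, which Dovecot also
writes into mbox files, is dropped as well). The function name
strip_dovecot_uid_headers is made up for this illustration. Only the
header part of each message is touched, and body lines are left alone.

```shell
# Hypothetical helper: drop X-UID / X-IMAPbase headers from an mbox.
# Assumes messages start with a "From " envelope line and that body
# lines beginning with "From " are already escaped (">From ").
strip_dovecot_uid_headers() {
    awk '
        /^From /       { inhdr = 1; skip = 0 }   # envelope line opens a header block
        inhdr && /^$/  { inhdr = 0; skip = 0 }   # first blank line ends the header
        inhdr && skip && /^[ \t]/ { next }       # folded continuation of a dropped header
        inhdr { skip = ($0 ~ /^(X-UID|X-IMAPbase):/); if (skip) next }
        { print }
    ' "$@"
}
```

Something like "strip_dovecot_uid_headers old.mbox >clean.mbox" would
then write a copy without those headers. Work on copies, not on a live
mailbox.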

The email client nicely sorts the message from folder A "foldera 5 last"
as last, but of course the mbox is not like this.
Is there a better solution for merging files?

As noted, the time order gets scrambled -- using your mail reader to
get it back in time order requires sorting, an intensive operation.

It just so happens I've done this recently with a (GNU) awk script that
merges multiple mailboxes into one mailbox, preserving time order.
It assumes that each message starts with a From envelope header with
sorted timestamps, e.g.

        From [email protected]  Thu Nov 25 18:45:37 2018
        From [email protected]  Thu Nov 25 18:45:37 2018 -0400
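
To illustrate that assumption, here is a small sketch (plain POSIX awk,
not the script below) that turns each From line into a sortable
YYYYMMDDhhmmss key. The name from_key and the example.org addresses in
the comment are invented for this example. Like the script below, it
compares the wall-clock time as written and ignores any trailing zone
offset.

```shell
# Hypothetical helper: print one sortable key per "From " envelope line.
# e.g. "From a@example.org  Thu Nov 25 18:45:37 2018 -0400"
from_key() {
    awk '
        BEGIN {
            split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
            for (i = 1; i <= 12; i++) mon[m[i]] = i
        }
        /^From / {
            n = NF
            # Skip a trailing zone offset such as -0400 or +0100, if present
            if ($n ~ /^[-+][0-9][0-9][0-9][0-9]$/) n--
            split($(n-1), t, ":")                # hh:mm:ss
            printf "%04d%02d%02d%02d%02d%02d\n", \
                   $n, mon[$(n-3)], $(n-2), t[1], t[2], t[3]
        }
    ' "$@"
}
```

Piping the keys through "sort -c" (for example
"from_key some.mbox | sort -c") is one way to check whether a mailbox
is already in time order before merging it.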

You're welcome to use it.  There's probably a more elegant way with
doveadm/dsync.  Using a mail reader to sort the merged mailbox, then
drag/drop/copy everything into a final mailbox could also work.

Joseph Tam <[email protected]>

#!/bin/sh
#
# Merge multiple mbox's into one assuming that each message
# starts with /^From .* {year}$/ and they are sorted by time.
#
#                       -- Joseph Tam <[email protected]>
#

[ x"$*" = x ] && {
        echo "Usage:  $0 mbox-file ..."
        exit 1
}

gawk -v boxes="$*" </dev/null '
        function Tstamp(header) {
                # Format:       Jan 22 21:00:48 2018 -0700
                #               12345678901234567890123456
                l = length(header)
                spec = (substr(header,l-4,1)=="-")? \
                         substr(header,l-25,20) : substr(header,l-19,20)
                spec = substr(spec,17,4) " " ym[substr(spec,1,3)] substr(spec,4,3) \
                         " " substr(spec,8,2) " " substr(spec,11,2) " " substr(spec,14,2)
                return int(mktime(spec))

        }

        function DumpMessage(i) {
                if (header[i]!="") {
                        printf("%s\n",header[i])
                }
                while ((getline x <mbox[i])>0) {
                        if (x~/^From .*[0-9][0-9][0-9][0-9]$/) {
                                stamp[i] = Tstamp(x)
                                header[i] = x
                                printf("%s => [%d] %d\n",header[i],i,stamp[i]) >"/dev/stderr"
                                return
                        }
                        print x
                }

                printf("EOF[%d]\n",i) >"/dev/stderr"
                stamp[i] = 2147483647
                header[i] = ""
        }

        BEGIN {
                ym["Jan"] = "01"; ym["Feb"] = "02"; ym["Mar"] = "03"; ym["Apr"] = "04"
                ym["May"] = "05"; ym["Jun"] = "06"; ym["Jul"] = "07"; ym["Aug"] = "08"
                ym["Sep"] = "09"; ym["Oct"] = "10"; ym["Nov"] = "11"; ym["Dec"] = "12"

                n = split(boxes,mbox," ")

                # Read first header line from all boxes
                for (i=1; i<=n; i++) {
                         DumpMessage(i)
                }

                # Loop until all mailboxes are read
                while (1) {
                        t = 2147483646

                        # Find next message
                        for (i=1; i<=n; i++) {
                                if (stamp[i]<=t) {t=stamp[i]; j=i;}
                        }

                        # If no more messages, quit
                        if (t==2147483646) exit

                        # Dump next message from mbox[j]
                        DumpMessage(j)
                }
        }'
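
A quick sanity check after a merge could be to compare message counts:
the inputs together should contain as many From envelope lines as the
merged output. The sketch below uses the same pattern the script
matches on. count_messages and the file names in the comments
(merge-mbox.sh, a.mbox, b.mbox, merged.mbox) are placeholders for this
example only.

```shell
# Hypothetical helper: count envelope lines across one or more mboxes,
# using the same "From ... {4 digits}" pattern as the merge script.
count_messages() {
    # cat first so several files give one total, not per-file counts
    cat "$@" | grep -c '^From .*[0-9][0-9][0-9][0-9]$'
}

# Possible usage, assuming the script above was saved as merge-mbox.sh:
#   sh merge-mbox.sh a.mbox b.mbox >merged.mbox 2>merge.log
#   [ "$(count_messages a.mbox b.mbox)" -eq "$(count_messages merged.mbox)" ] \
#       && echo "message count matches"
```

Redirecting stderr keeps the per-message progress lines the script
prints out of the merged mailbox and out of the terminal.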
