Aside from cat?

On Thu, Nov 29, 2018 at 03:07:58PM -0800, Joseph Tam wrote:
> On Thu, 29 Nov 2018, Marc Roos wrote:
> 
> >When concatenating mbox files as described here
> >https://xaizek.github.io/2013-03-30/merge-mbox-mailboxes/, you end
> >up with an 'unsorted' mbox file. Is this going to be a problem,
> >especially when the files are large (>2 GB) and new emails will be
> >written to it?
> 
> I don't think it will be a problem, but you might have to remove
> some headers (like the UUID header?).  However, I think Dovecot
> ought to be able to cope with it anyway and regenerate the indices.
> 
> >The email client nicely sorts the message from folder A ("foldera 5 last")
> >to the end, but of course the mbox file itself is not in that order.
> >Is there a better solution for merging files?
> 
> As noted, the time order gets scrambled; getting it back into time
> order with your mail reader requires sorting, an expensive operation.
> 
> It just so happens I recently did this with a (GNU) awk script that
> merges multiple mailboxes into one mailbox, preserving time order.
> It assumes that each message starts with a "From " envelope header
> line and that each mailbox is already sorted by that timestamp, e.g.
> 
>       From [email protected]  Thu Nov 25 18:45:37 2018
>       From [email protected]  Thu Nov 25 18:45:37 2018 -0400
> 
> You're welcome to use it.  There's probably a more elegant way with
> doveadm/dsync.  Using a mail reader to sort the merged mailbox, then
> drag/drop/copy everything into a final mailbox could also work.
> 
> Joseph Tam <[email protected]>
> 
> #!/bin/sh
> #
> # Merge multiple mboxes into one, assuming that each message
> # starts with /^From .* {year}$/ and each mbox is sorted by time.
> #
> #                     -- Joseph Tam <[email protected]>
> #
> 
> [ x"$*" = x ] && {
>       echo "Usage:  $0 mbox-file ..."
>       exit 1
> }
> 
> gawk -v boxes="$*" </dev/null '
>       function Tstamp(header) {
>               # Format:       Jan 22 21:00:48 2018 -0700
>               #               12345678901234567890123456
>               l = length(header)
>               # Any trailing "+hhmm"/"-hhmm" offset is skipped; mktime()
>               # then interprets the date and time as local time.
>               spec = (substr(header,l-4,1) ~ /[-+]/) ? substr(header,l-25,20) : substr(header,l-19,20)
>               spec = substr(spec,17,4) " " ym[substr(spec,1,3)] substr(spec,4,3) \
>                        " " substr(spec,8,2) " " substr(spec,11,2) " " substr(spec,14,2)
>               return int(mktime(spec))
> 
>       }
> 
>       function DumpMessage(i) {
>               if (header[i]!="") {
>                       printf("%s\n",header[i])
>               }
>               while ((getline x <mbox[i])>0) {
>                       if (x~/^From .*[0-9][0-9][0-9][0-9]$/) {
>                               stamp[i] = Tstamp(x)
>                               header[i] = x
>                               printf("%s => [%d] %d\n",header[i],i,stamp[i]) >"/dev/stderr"
>                               return
>                       }
>                       print x
>               }
> 
>               printf("EOF[%d]\n",i) >"/dev/stderr"
>               stamp[i] = 2147483647
>               header[i] = ""
>       }
> 
>       BEGIN {
>               ym["Jan"] = "01"; ym["Feb"] = "02"; ym["Mar"] = "03"; ym["Apr"] = "04"
>               ym["May"] = "05"; ym["Jun"] = "06"; ym["Jul"] = "07"; ym["Aug"] = "08"
>               ym["Sep"] = "09"; ym["Oct"] = "10"; ym["Nov"] = "11"; ym["Dec"] = "12"
> 
>               n = split(boxes,mbox," ")
> 
>               # Read first header line from all boxes
>               for (i=1; i<=n; i++) {
>                        DumpMessage(i)
>               }
> 
>               # Loop until all mailboxes have been read
>               while (1) {
>                       t = 2147483646
> 
>                       # Find next message
>                       for (i=1; i<=n; i++) {
>                               if (stamp[i]<=t) {t=stamp[i]; j=i;}
>                       }
> 
>                       # If no messages remain, quit
>                       if (t==2147483646) exit
> 
>                       # Dump next message from mbox[j]
>                       DumpMessage(j)
>               }
>       }'
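The script above needs each input mailbox to be sorted already. If that does not hold (for example, a mailbox built with cat as in the linked article), a different technique, a full decorate-sort-undecorate pass, can restore time order. Below is a minimal sketch as a POSIX sh function, `mbox_sort` (a hypothetical name): awk prefixes every line with its message's timestamp key, a message counter and the line number; sort(1) orders on those three fields; cut(1) strips them again. Same assumptions as before: "From " lines end in the date (optionally followed by an offset, which is ignored), and body lines starting with "From " are escaped as ">From".

```shell
# Sketch: sort the messages of one or more mbox files by the date on their
# "From " lines (decorate, sort, undecorate). Hypothetical helper.
# Assumes lines like "From sender  Thu Nov 29 15:07:58 2018 [-0800]";
# any timezone offset is ignored, and body lines beginning with "From "
# are expected to be escaped as ">From".
mbox_sort() {
    awk '
    BEGIN {
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (m = 1; m <= 12; m++) mon[names[m]] = sprintf("%02d", m)
        key = "00000000000000"    # text before the first From line sorts first
    }
    /^From .*[0-9][0-9][0-9][0-9]$/ {
        n = split($0, f, " ")
        if (f[n] ~ /^[-+][0-9][0-9][0-9][0-9]$/) n--    # skip the offset
        split(f[n-1], hms, ":")
        key = sprintf("%04d%s%02d%02d%02d%02d", f[n], mon[f[n-3]], f[n-2], hms[1], hms[2], hms[3])
        msg++                     # new message: bump the tie-breaker
    }
    # Decorate: key <TAB> message number <TAB> line number <TAB> original line
    { printf "%s\t%d\t%d\t%s\n", key, msg, NR, $0 }
    ' "$@" |
    sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n |
    cut -f4-                      # undecorate: drop the three added fields
}
```

Usage: `mbox_sort a.mbox b.mbox > merged.mbox`. Messages with equal timestamps keep their input order because of the counter field. Unlike the streaming merge, this sorts the whole input, so for multi-GB mailboxes sort(1) may need temporary disk space on the order of the mailbox size.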

-- 
So many immigrant groups have swept through our town
that Brooklyn, like Atlantis, reaches mythological
proportions in the mind of the world - RI Safir 1998
http://www.mrbrklyn.com 

DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002
http://www.nylxs.com - Leadership Development in Free Software
http://www2.mrbrklyn.com/resources - Unpublished Archive 
http://www.coinhangout.com - coins!
http://www.brooklyn-living.com 

Being so tracked is for FARM ANIMALS and extermination camps,
but incompatible with living as a free human being. -RI Safir 2013
