On 13Nov2013 09:06, Chris Down <[email protected]> wrote:
> On 2013-11-12 19:22:24 +0100, Jonas Petong wrote:
> > Today I accidentally copied my mails into the same folder where they had
> > been stored before (evil keybinding!!!) and now I'm faced with about a
> > 1000 copies within my inbox. Since those duplicates do not have a unique
> > mail-id, it's hopeless to filter them with mutt's integrated duplicate
> > limiting pattern. Command '<limit>~=' has no effect in my case and
> > deleting them by hand will take me hours!
> >
> > I know this question has been (unsuccessfully) asked before. Anyhow, is
> > there a way to tag every other mail (literally every nth mail of my
> > inbox-folder) and afterwards delete them? I know something about Linux
> > scripting but unfortunately I have no clue where to start, or even which
> > scripting language to use.
>
> for every file:
>     read the file and put the message-id in a dict, in
>     { message-id: [file1, file2, ..., fileN] } form
>
> for each key in that dict:
>     delete all filename values except the first
>
> It should not be very complicated to write. If nobody else comes up with
> something, I can possibly write it for you after work.
Based on Jonas' post:

    Since those duplicates do not have a unique mail-id, it's hopeless
    to filter them with mutt's integrated duplicate limiting pattern.
    Command '<limit>~=' has no effect

I'd infer that the Message-ID fields are in fact distinct between the
duplicates.
Jonas:
_Why_/_how_ did you get duplicate messages with distinct message-ids?
Have you verified (by inspecting a pair of duplicate messages) that
their Message-ID headers are different?
If the message-ids are unique for the duplicate messages I would:
Move all the messages to a Maildir folder if they are not already so.
This lets you deal with each message as a distinct file.
Write a script along the lines of Chris Down's suggestion, but collate
messages by subject line, and for each message store a tuple of:

    (message-file-path, Date:-header-value, Message-ID:-header-value)

You may then want to compare messages with identical Date: values.
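The collation step above might be sketched like this, again assuming Maildir files; the helper name is illustrative:

```python
# Sketch: group Maildir messages by Subject:, storing
# (file-path, Date:, Message-ID:) tuples for later comparison.
import os
from collections import defaultdict
from email.parser import BytesParser

def collate_by_subject(maildir):
    groups = defaultdict(list)
    for sub in ("new", "cur"):
        d = os.path.join(maildir, sub)
        for name in os.listdir(d):
            path = os.path.join(d, name)
            with open(path, "rb") as f:
                msg = BytesParser().parse(f, headersonly=True)
            groups[msg.get("Subject")].append(
                (path, msg.get("Date"), msg.get("Message-ID")))
    return groups
```

Each group with more than one entry is a candidate duplicate set; comparing the Date: strings within a group tells you which are the true copies.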
Or, if you are truly sure that the folder contains an exact and complete
duplicate: load all the filenames, order them by Date: header, iterate
over the sorted list, and _move_ every second item into another Maildir
folder (in case you're wrong).
    L = []
    for each Maildir file in new/ and cur/:
        load the message headers and get the Date: and Subject: strings
        L.append( (Date:-value, Subject:-value, maildir-file-path) )
    L = sorted(L)
    for i in range(0, len(L), 2):
        move the file L[i][2] into another directory
Note that you don't need to _parse_ the Date: header; if these are
duplicated messages, the literal text of the Date: header should be
identical for the adjacent messages. HOWEVER, you probably want to
first check that every identical date/subject grouping is exactly a
pair, to guard against multiple distinct messages that happen to have
identical dates.
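A runnable rendering of that loop, under the same "exact and complete duplicate" assumption, might look like this; the quarantine-directory name and function name are illustrative, and moving (rather than deleting) keeps recovery easy:

```python
# Sketch: sort Maildir messages by (Date:, Subject:) and move every
# second one into a quarantine directory. Only safe if the folder
# really is one exact, complete duplicate of itself.
import os
import shutil
from email.parser import BytesParser

def move_every_second(maildir, quarantine):
    os.makedirs(quarantine, exist_ok=True)
    L = []
    for sub in ("new", "cur"):
        d = os.path.join(maildir, sub)
        for name in os.listdir(d):
            path = os.path.join(d, name)
            with open(path, "rb") as f:
                msg = BytesParser().parse(f, headersonly=True)
            # Use the literal header text; no Date: parsing needed.
            L.append((msg.get("Date", ""), msg.get("Subject", ""), path))
    L.sort()
    for i in range(0, len(L), 2):      # duplicates sort adjacent; take one of each pair
        shutil.move(L[i][2], quarantine)
```

If anything looks wrong afterwards, the quarantined files can simply be moved back.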
Cheers,
--
Cameron Simpson <[email protected]>
If you can't annoy somebody, there's little point in writing.
- Kingsley Amis