Re: Yet another 'duplicate' thread
On 15.Nov 2013, 01:18, Gregor Zattler wrote:

Hi Jonas,

* Jonas Petong jonas.pet...@web.de [13. Nov. 2013]:

On 13.Nov 2013, 13:01, Nathan Stratton Treadway wrote:

On Wed, Nov 13, 2013 at 18:50:44 +0100, Jonas Petong wrote:

Cameron, you were right, the message IDs are the same. The fact that limiting my inbox by ~= did not work had led me to the conclusion that their IDs were different. Seems I misunderstood you, then.

What happens when you try to limit by ~=? (Note that, as I understand it, this limit only works when the sort order is thread. That is, with no limit applied you should see the duplicate messages marked with an = character in your mailbox index listing, and those marked messages will then be selected by the ~= filter.)

Solved... but then, why limit the view? Delete the duplicate messages right away:

1) open the mail folder in question
2) switch to threaded view (the default key binding is o t)
3) delete-pattern ~= (the default key binding is D ~= RETURN)
4) carefully examine whether the right messages are flagged with a D
5) expunge the messages via sync-mailbox (the default key binding in the index is $)

In fact this is how I did it in the end. Thank you anyway, though. It seems I didn't point this out in detail when writing my first request. Have a nice weekend!

Jonas

Done. HTH, Gregor
-- 
the basis of a healthy, tidy mind is a big trash basket. [Kurt Tucholsky]
Re: Yet another 'duplicate' thread
On 14.Nov 2013, 10:24, Cameron Simpson wrote:

On 13.Nov 2013, 13:01, Nathan Stratton Treadway wrote:

(Note that, as I understand it, this limit only works when the sort order is thread. That is, with no limit applied you should see the duplicate messages marked with an = character in your mailbox index listing, and those marked messages will then be selected by the ~= filter.)

Worth restating. This is something of a mutt annoyance - silent failure.

On 13Nov2013 20:38, Jonas Petong jonas.pet...@web.de wrote:

Sorry for that one! Cameron, could you explain to me how to use the script you proposed? Or at least which environment to set? It might be of use for further stuck-in-nowhere problems (even if, as in my case, for no reason). You all have a great day!

Well, the script as supplied is pseudocode (and of course untested), but based around using Python. (If you don't know Python, it is well worth learning.)

In fact I was going to learn Python anyway, for the simple fact that it is the preferred scripting language for managing a Raspberry Pi! I'll take your advice for sure then.
A fuller (but still totally untested) sketch might look like this:

    #!/usr/bin/python
    import sys
    import email.parser
    from mailbox import Maildir

    # get the maildir pathname from the command line
    mdirpath = sys.argv[1]

    # open the Maildir
    M = Maildir(mdirpath)

    # list holding message information
    L = []
    for key in M.keys():
        # open the message file
        fp = M.get_file(key)
        # load the headers from this message
        hdrs = email.parser.Parser().parse(fp, headersonly=True)
        # speculative: get the filename of the message
        pathname = fp.name
        fp.close()
        # make a tuple with the info we want
        info = hdrs['date'], hdrs['subject'], hdrs['message-id'], key, pathname
        L.append(info)

    # sort the list
    # because we have date then subject in the tuple, the sort order is date then subject
    # (then message-id, then key)
    L = sorted(L)

    # this last bit could be adapted to move every second message elsewhere
    for i in range(0, len(L), 2):
        date, subject, message_id, key, pathname = L[i]
        ... decide what to do ...

The last loop iterates 0, 2, 4, ... up to the largest index in the list L. Pulling every second message like this is very fragile - you would need to be totally sure that you had an exactly duplicated set of messages.

Personally, I would be inclined to make a dict instead of a list, mapping message-ids to a list of message paths (or the info tuples). Then you can iterate over the dict and remove or move aside the second and following messages for each message-id, leaving only the original.

I'd also be writing this script to print a report instead of moving/deleting. Then I can examine the output for sanity before hitting the button. If the report went:

    pathname message-id date subject

it would be easy to read the pathnames from a second script to do the actual message removal. Or whatever.

Please feel free to ask whatever questions you like. I do a lot of stuff with Maildirs and Python; I replaced procmail with my own mail filing program a year or so ago.
The only thing left for me to do is to follow the good example of Maurice and speak out my regards for this deep-in-detail answer. Thank you so much for your effort! The way you explained those lines of code makes them easy to understand and, in fact, is a perfect start for learning Python. Even if that wasn't my intention in the first place ;-) Thank you, Cameron!

cheers, jonas

Cheers,
-- 
Cameron Simpson c...@zip.com.au

Q: How many user support people does it take to change a light bulb?
A: We have an exact copy of the light bulb here and it seems to be working fine. Can you tell me what kind of system you have?
Re: Yet another 'duplicate' thread
Hi Jonas,

* Jonas Petong jonas.pet...@web.de [13. Nov. 2013]:

On 13.Nov 2013, 13:01, Nathan Stratton Treadway wrote:

On Wed, Nov 13, 2013 at 18:50:44 +0100, Jonas Petong wrote:

Cameron, you were right, the message IDs are the same. The fact that limiting my inbox by ~= did not work had led me to the conclusion that their IDs were different. Seems I misunderstood you, then.

What happens when you try to limit by ~=? (Note that, as I understand it, this limit only works when the sort order is thread. That is, with no limit applied you should see the duplicate messages marked with an = character in your mailbox index listing, and those marked messages will then be selected by the ~= filter.)

Solved... but then, why limit the view? Delete the duplicate messages right away:

1) open the mail folder in question
2) switch to threaded view (the default key binding is o t)
3) delete-pattern ~= (the default key binding is D ~= RETURN)
4) carefully examine whether the right messages are flagged with a D
5) expunge the messages via sync-mailbox (the default key binding in the index is $)

Done. HTH, Gregor
Re: Yet another 'duplicate' thread
On 13.Nov 2013, 00:48, Ken Moffat wrote:

On Tue, Nov 12, 2013 at 07:22:24PM +0100, Jonas Petong wrote:

Today I accidentally copied my mails into the same folder where they had been stored before (evil key binding!!!) and now I'm faced with about a thousand copies within my inbox. Since those duplicates do not have a unique mail ID, it's hopeless to filter them with mutt's integrated duplicate limiting pattern. The command 'limit ~=' has no effect in my case and deleting them by hand will take me hours!

I know this question has been (unsuccessfully) asked before. Anyhow, is there a way to tag every other mail (literally every nth mail of my inbox folder) and afterwards delete them? I know something about Linux scripting but unfortunately I have no clue where to start or even which scripting language to use. This close-to-topic approach with 'fdupes' was published some time ago (http://consolematt.wordpress.com/tag/fdupes/) but in my view it seems way too complicated. As far as I can tell from mutt's mailing archive, I'm not the only one who has had trouble with this. Therefore I appreciate any hint which points me in the right direction and helps me solve this. Running Mutt 1.5.21 under Ubuntu Gnome 13.10 (Linux 3.11.0-13-generic).

I don't have a script, but I usually view lists without threading, using date/time sent in the sender's timezone (%d) - I'm sure that using the local time zone (%D) probably works the same way. On occasion I've had to change which of my upstreams was subscribed to heavy-traffic lists such as lkml, and at other times I've occasionally had mails appearing twice after upstream problems. When needed, it's just a case of looking at the index and deleting every other mail. Tedious, but achievable - particularly for only 1000 mails - I've done more than that in the past ;-)

Me too, but I thought that was kind of a waste of time if there was a possibility to solve this automatically with a script. Or, even better, within mutt itself.
By the way, I'm a bit worried about my 'j' key ;-)

I believe the order in which I see mails is governed by index_format [I haven't looked at this stuff in ages - why break what works for me]. Mine is:

    set index_format="%4C %Z %{%b %d} %-15.15n (%?l?%4l&%4c?) %s"

Looks pretty much like mine.

If you aren't a reckless person, turn off incoming mail and back up the directory or mbox before you try *any* solution.

Thank you for that one, I mean it! It wouldn't be the first time I've had to restore old folders from my external backup drive. Just stored a copy of my ~/Mails :-)

ĸen
-- 
once as tragedy, this time as farce
Re: Yet another 'duplicate' thread
On 13.Nov 2013, 13:01, Nathan Stratton Treadway wrote:

On Wed, Nov 13, 2013 at 18:50:44 +0100, Jonas Petong wrote:

Cameron, you were right, the message IDs are the same. The fact that limiting my inbox by ~= did not work had led me to the conclusion that their IDs were different. Seems I misunderstood you, then.

What happens when you try to limit by ~=? (Note that, as I understand it, this limit only works when the sort order is thread. That is, with no limit applied you should see the duplicate messages marked with an = character in your mailbox index listing, and those marked messages will then be selected by the ~= filter.)

Solved... this is really a newbie's error: not reading the manual properly -.- Sorry for that one!

Cameron, could you explain to me how to use the script you proposed? Or at least which environment to set? It might be of use for further stuck-in-nowhere problems (even if, as in my case, for no reason). You all have a great day!

Nathan
Re: Yet another 'duplicate' thread
Please excuse a numpty interrupting, but could an old procmail recipe be adapted for use here? What I've got I don't understand, and it was poached from somewhere or other:

    # Get rid of duplicates
    :0 Whc: .msgid.lock
    | formail -D 16384 .msgid.cache

    :0 a
    /dev/null

Regards
Maurice
Re: Yet another 'duplicate' thread
On 13Nov2013 20:20, Maurice McCarthy mansel...@gmail.com wrote:

Please excuse a numpty interrupting, but could an old procmail recipe be adapted for use here? What I've got I don't understand, and it was poached from somewhere or other:

    # Get rid of duplicates
    :0 Whc: .msgid.lock
    | formail -D 16384 .msgid.cache

    :0 a
    /dev/null

I prefer to do this in mutt using the ~= search (which matches messages that are dupes of other messages). It is more visible. FWIW, I used to use the above procmail recipe before deciding to do it in mutt.

The above recipe uses formail to consult a tiny database where it keeps the most recent 16384 message-ids seen. If the current message's message-id is already there, it exits successfully. This is the condition for the actual filing target /dev/null. So: if already seen, file the message to /dev/null (discard it). From man formail:

    -D maxlen idcache
        Formail will detect if the Message-ID of the current message has
        already been seen, using an idcache file of approximately maxlen
        size. If not splitting, it will return success if a duplicate has
        been found. If splitting, it will not output duplicate messages.
        If used in conjunction with -r, formail will look at the mail
        address of the envelope sender instead of the Message-ID.

I think it also adds the new message-id if unseen.

I do this in mutt for a few reasons:

- this recipe prevents one from refiling a message. Scenario: change filing rules, submit a misfiled message to the new rules. Result: message thrown away.
- using mutt makes the discard visible. (Except that I now have an unconditional folder-hook to discard ~= messages on entry anyway.) At least it is per folder and does not prevent me refiling.
- I no longer use procmail to file my mail, preferring a tool of my own called mailfiler.

Cheers,
-- 
Cameron Simpson c...@zip.com.au

Since I've mentioned the subject of genealogy, I'll repeat a story I heard about a poor fellow over on airstrip one.
Seems he spent the most recent thirty years of his life tracking down his family history. Spent hundreds of pounds, traveled, devoted his life to it. Then, last month, a cousin told him he was adopted. Ahhh, sweet irony. - Tim_Mefford t...@physics.orst.edu
Re: Yet another 'duplicate' thread
On 13.Nov 2013, 13:01, Nathan Stratton Treadway wrote:

(Note that, as I understand it, this limit only works when the sort order is thread. That is, with no limit applied you should see the duplicate messages marked with an = character in your mailbox index listing, and those marked messages will then be selected by the ~= filter.)

Worth restating. This is something of a mutt annoyance - silent failure.

On 13Nov2013 20:38, Jonas Petong jonas.pet...@web.de wrote:

Sorry for that one! Cameron, could you explain to me how to use the script you proposed? Or at least which environment to set? It might be of use for further stuck-in-nowhere problems (even if, as in my case, for no reason). You all have a great day!

Well, the script as supplied is pseudocode (and of course untested), but based around using Python. (If you don't know Python, it is well worth learning.)

A fuller (but still totally untested) sketch might look like this:

    #!/usr/bin/python
    import sys
    import email.parser
    from mailbox import Maildir

    # get the maildir pathname from the command line
    mdirpath = sys.argv[1]

    # open the Maildir
    M = Maildir(mdirpath)

    # list holding message information
    L = []
    for key in M.keys():
        # open the message file
        fp = M.get_file(key)
        # load the headers from this message
        hdrs = email.parser.Parser().parse(fp, headersonly=True)
        # speculative: get the filename of the message
        pathname = fp.name
        fp.close()
        # make a tuple with the info we want
        info = hdrs['date'], hdrs['subject'], hdrs['message-id'], key, pathname
        L.append(info)

    # sort the list
    # because we have date then subject in the tuple, the sort order is date then subject
    # (then message-id, then key)
    L = sorted(L)

    # this last bit could be adapted to move every second message elsewhere
    for i in range(0, len(L), 2):
        date, subject, message_id, key, pathname = L[i]
        ... decide what to do ...

The last loop iterates 0, 2, 4, ... up to the largest index in the list L.
Pulling every second message like this is very fragile - you would need to be totally sure that you had an exactly duplicated set of messages.

Personally, I would be inclined to make a dict instead of a list, mapping message-ids to a list of message paths (or the info tuples). Then you can iterate over the dict and remove or move aside the second and following messages for each message-id, leaving only the original.

I'd also be writing this script to print a report instead of moving/deleting. Then I can examine the output for sanity before hitting the button. If the report went:

    pathname message-id date subject

it would be easy to read the pathnames from a second script to do the actual message removal. Or whatever.

Please feel free to ask whatever questions you like. I do a lot of stuff with Maildirs and Python; I replaced procmail with my own mail filing program a year or so ago.

Cheers,
-- 
Cameron Simpson c...@zip.com.au

Q: How many user support people does it take to change a light bulb?
A: We have an exact copy of the light bulb here and it seems to be working fine. Can you tell me what kind of system you have?
Re: Yet another 'duplicate' thread
Cameron,

Many thanks indeed for taking the time to write out a detailed explanation!

Best Regards
Maurice

On 13/11/2013, Cameron Simpson c...@zip.com.au wrote:

On 13Nov2013 20:20, Maurice McCarthy mansel...@gmail.com wrote:

Please excuse a numpty interrupting, but could an old procmail recipe be adapted for use here? What I've got I don't understand, and it was poached from somewhere or other:

    # Get rid of duplicates
    :0 Whc: .msgid.lock
    | formail -D 16384 .msgid.cache

    :0 a
    /dev/null

I prefer to do this in mutt using the ~= search (which matches messages that are dupes of other messages). It is more visible. FWIW, I used to use the above procmail recipe before deciding to do it in mutt.
Yet another 'duplicate' thread
Today I accidentally copied my mails into the same folder where they had been stored before (evil key binding!!!) and now I'm faced with about a thousand copies within my inbox. Since those duplicates do not have a unique mail ID, it's hopeless to filter them with mutt's integrated duplicate limiting pattern. The command 'limit ~=' has no effect in my case and deleting them by hand will take me hours!

I know this question has been (unsuccessfully) asked before. Anyhow, is there a way to tag every other mail (literally every nth mail of my inbox folder) and afterwards delete them? I know something about Linux scripting but unfortunately I have no clue where to start or even which scripting language to use. This close-to-topic approach with 'fdupes' was published some time ago (http://consolematt.wordpress.com/tag/fdupes/) but in my view it seems way too complicated. As far as I can tell from mutt's mailing archive, I'm not the only one who has had trouble with this. Therefore I appreciate any hint which points me in the right direction and helps me solve this.

Running Mutt 1.5.21 under Ubuntu Gnome 13.10 (Linux 3.11.0-13-generic).

cheers, jonas
Re: Yet another 'duplicate' thread
On 2013-11-12 19:22:24 +0100, Jonas Petong wrote:

Today I accidentally copied my mails into the same folder where they had been stored before (evil key binding!!!) and now I'm faced with about a thousand copies within my inbox. Since those duplicates do not have a unique mail ID, it's hopeless to filter them with mutt's integrated duplicate limiting pattern. The command 'limit ~=' has no effect in my case and deleting them by hand will take me hours! I know this question has been (unsuccessfully) asked before. Anyhow, is there a way to tag every other mail (literally every nth mail of my inbox folder) and afterwards delete them? I know something about Linux scripting but unfortunately I have no clue where to start or even which scripting language to use.

    for every file:
        read the file and record its message-id in a dict as
        { message-id: [file1, file2, ..., fileN] }
    for each key in that dict:
        delete all filename values except the first

It should not be very complicated to write. If nobody else comes up with something, I can possibly write it for you after work.
Re: Yet another 'duplicate' thread
On 13Nov2013 09:06, Chris Down ch...@chrisdown.name wrote:

On 2013-11-12 19:22:24 +0100, Jonas Petong wrote:

Today I accidentally copied my mails into the same folder where they had been stored before (evil key binding!!!) and now I'm faced with about a thousand copies within my inbox. Since those duplicates do not have a unique mail ID, it's hopeless to filter them with mutt's integrated duplicate limiting pattern. The command 'limit ~=' has no effect in my case and deleting them by hand will take me hours! I know this question has been (unsuccessfully) asked before. Anyhow, is there a way to tag every other mail (literally every nth mail of my inbox folder) and afterwards delete them? I know something about Linux scripting but unfortunately I have no clue where to start or even which scripting language to use.

    for every file:
        read the file and record its message-id in a dict as
        { message-id: [file1, file2, ..., fileN] }
    for each key in that dict:
        delete all filename values except the first

It should not be very complicated to write. If nobody else comes up with something, I can possibly write it for you after work.

Based on Jonas' post ("Since those duplicates do not have a unique mail ID, it's hopeless to filter them with mutt's integrated duplicate limiting pattern. The command 'limit ~=' has no effect") I'd infer that the message-id fields are distinct.

Jonas: _why_/_how_ did you get duplicate messages with distinct message-ids? Have you verified (by inspecting a pair of duplicate messages) that their Message-ID headers are different?

If the message-ids are unique for the duplicate messages I would:

Move all the messages to a Maildir folder if they are not already in one. This lets you deal with each message as a distinct file.

Write a script along the lines of Chris Down's suggestion, but collate messages by subject line, and store a tuple of:

    (message-file-path, Date:-header-value, Message-ID:-header-value)

You may then want to compare messages with identical Date: values.
Or, if you are truly sure that the folder contains an exact and complete duplicate: load all the filenames, order by the Date: header, iterate over the list (after ordering) and _move_ every second item into another Maildir folder (in case you're wrong):

    L = []
    for each Maildir file in new/ and cur/:
        load the message headers and get the Date: header string
        L.append( (Date:-value, Subject:-value, maildir-file-path) )
    L = sorted(L)
    for i in range(0, len(L), 2):
        move the file L[i][2] into another directory

Note that you don't need to _parse_ the Date: header; if these are duplicated messages, the literal text of the Date: header should be identical for the adjacent messages. HOWEVER, you probably want to ensure that all the identical date/subject groupings are only pairs, in case of multiple distinct messages with identical dates.

Cheers,
-- 
Cameron Simpson c...@zip.com.au

If you can't annoy somebody, there's little point in writing. - Kingsley Amis