> > I want to import a bigger chunk of archived messages into my notmuch
> > database. It's about 100k messages. The problem is that I most probably
> > already have quite a lot of those messages in the DB. Basically, I would
> > like to add only those I don't have already.
> >
> > There are two possibilities:
> >
> > a) I add all 100k messages and then remove the duplicates.
> >
> > b) I write a script which parses the message IDs of the
> > to-be-added messages and tries to match them against the notmuch DB,
> > adding only the files I can't find already.
> >
> > Ad b) might be the better option, but I started to play with the idea of
> > deduplication. I'm thinking about listing all the message IDs stored in
> > the DB, listing all files belonging to each ID, and deleting all but one.
> > I'm also thinking about implementing some simple algorithm telling me
> > whether the messages are really very similar, just to be sure I don't
> > delete something I don't want to.
> >
> > Has anyone played with this idea?
>
> notsync[1] used the (lack of) existence of a message ID in the store to
> decide whether to add something from an IMAP server, but it is old,
> crufty, unused and unloved code.
I see, that's close to my b) solution, thanks!

--
Vlad
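
For reference, a minimal sketch of what option b) could look like in Python. This is not code from notsync. The function names (message_id, notmuch_known_ids, missing_files) are made up for illustration. It assumes that `notmuch search --output=messages '*'` prints one `id:<message-id>` line per message, and that files without a Message-ID header should be kept rather than skipped.

```python
import email
import subprocess


def message_id(path):
    """Return the Message-ID of the mail file at `path`, without the
    surrounding angle brackets, or None if the header is missing."""
    with open(path, "rb") as fh:
        msg = email.message_from_binary_file(fh)
    raw = msg.get("Message-ID")
    if raw is None:
        return None
    return str(raw).strip().strip("<>")


def notmuch_known_ids():
    """Ask the notmuch CLI for every message ID in the database.
    Assumption: each output line has the form 'id:<message-id>'."""
    out = subprocess.run(
        ["notmuch", "search", "--output=messages", "*"],
        check=True, capture_output=True, text=True,
    ).stdout
    prefix = "id:"
    return {line[len(prefix):] for line in out.splitlines()
            if line.startswith(prefix)}


def missing_files(paths, known_ids):
    """Return the files from `paths` whose Message-ID is not in
    `known_ids`. Duplicates inside `paths` are dropped as well;
    files without a Message-ID are kept, to be on the safe side."""
    seen = set(known_ids)
    result = []
    for path in paths:
        mid = message_id(path)
        if mid is None:
            result.append(path)
        elif mid not in seen:
            seen.add(mid)
            result.append(path)
    return result


# Hypothetical usage: copy only the new files into the maildir,
# then run `notmuch new`:
#
#   known = notmuch_known_ids()
#   for path in missing_files(archive_paths, known):
#       ...
```

Note that this only compares message IDs. It makes no attempt to check whether two files with the same ID are really similar.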