Re: Deduplication ?

Jani Nikula Mon, 02 Jun 2014 10:07:17 -0700

On Mon, 02 Jun 2014, Mark Walters <[email protected]> wrote:
> Tomi Ollila <[email protected]> writes:
>
>> On Mon, Jun 02 2014, Mark Walters <[email protected]> wrote:
>>
>>> Vladimir Marek <[email protected]> writes:
>>> If you want to save disk space then you could delete the duplicates
>>> after with something like
>>>
>>> notmuch search --output=files --format=text0 --duplicate=2 '*' piped to
>>> xargs -0
>>
>> What if there are 3 duplicates (or 4... ;)
>
> I was assuming that it was merging 2 duplicate-free bunches of messages,
> but I guess the new 100000 might not be. In that case running the above
> repeatedly (i.e. until it is a no-op) would be fine.


With 'notmuch new' in between the runs, obviously.
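
A rough sketch of that repeat-until-no-op loop, under a few assumptions: the
quoted command left out what xargs should run, so 'rm -f' is assumed here, and
dedup_until_noop is a name made up for this example. The loop stops early if
a deletion fails, so it cannot spin forever on a file it cannot remove.

```shell
# Sketch only: this deletes files, so try it on a copy of the mail store first.
# dedup_until_noop is a hypothetical helper name, not part of notmuch.
dedup_until_noop() {
    # Keep going while some message still has a second file.
    while notmuch search --output=files --duplicate=2 '*' | grep -q .; do
        # Delete the second file of every message (rm -f is an assumption).
        notmuch search --output=files --format=text0 --duplicate=2 '*' |
            xargs -0 rm -f -- || break
        # Rescan so the database forgets the files that were just deleted.
        notmuch new
    done
}
```

Each pass removes at most one copy per message, so a message with k copies
needs k-1 passes.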

Alternatively, find the largest N for which --duplicate=N still outputs
something, and run the command once for each value from N down to 2.
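
The N...2 variant could look roughly like the sketch below. It assumes the
database keeps listing the already-deleted paths until the next 'notmuch new',
so a single rescan at the end is enough. The function names and the DRY_RUN
variable are invented for this example, and 'rm -f' is again an assumption
about what xargs was meant to run.

```shell
# Sketch only: this deletes files. DRY_RUN is a made-up knob for this example,
# not a notmuch option; the function names are hypothetical as well.

# Print the largest N for which --duplicate=N still lists a file (1 if none).
max_duplicate() {
    max=1
    while notmuch search --output=files --duplicate=$((max + 1)) '*' | grep -q .; do
        max=$((max + 1))
    done
    echo "$max"
}

# Delete the Nth, (N-1)th, ..., 2nd file of every message, then rescan once.
remove_extra_copies() {
    n=$(max_duplicate)
    while [ "$n" -ge 2 ]; do
        notmuch search --output=files --format=text0 --duplicate="$n" '*' |
            xargs -0 ${DRY_RUN:+echo} rm -f --
        n=$((n - 1))
    done
    # Skip the rescan in a dry run, since nothing was deleted.
    if [ -z "${DRY_RUN:-}" ]; then notmuch new; fi
}
```

Running 'DRY_RUN=1 remove_extra_copies' first prints the rm commands instead
of executing them, which is a cheap way to check what would be removed.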


>> One should also have some message content heuristics to determine that the
>> content is indeed duplicate and not something totally different (not that
>> we can see the different content anyway... but...)
>
> That would be nice.

And quite hard.


BR,
Jani.

_______________________________________________
notmuch mailing list
[email protected]
http://notmuchmail.org/mailman/listinfo/notmuch
