G. Wade Johnson wrote:
I would probably take a multi-step approach. I would look for a module
on CPAN that reads the maildir format (for example,
Email::Folder::Maildir, which I found from search.cpan.org).
I would use that to match the To and From fields and remove any that I
didn't want.
The best way to find duplicates is probably through the use of a
message digest and a hash. Walk the messages, passing each through
Digest::SHA1 or Digest::MD5 and use the result as the key to a hash.
If it already exists in the hash, delete the message. If not, add it to
the hash.
Admittedly, that's just an outline of an approach, but it should get
you started.
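
The digest-and-hash step in the outline above might be sketched roughly as follows. This is only an illustration under some assumptions: it uses the core Digest::SHA module (its sha1_hex function) in place of Digest::SHA1, it treats each message as one file on disk (as maildir stores them), and find_duplicates is a made-up helper name rather than anything from CPAN. It also hashes the whole file, so it only catches byte-identical copies; two deliveries of the same message with different Received headers would not match.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: given a list of message file paths, return the
# paths whose contents duplicate a file seen earlier in the list.
sub find_duplicates {
    my @paths = @_;
    my %seen;     # digest => first path seen with that content
    my @dupes;    # later paths with an already-seen digest

    for my $path (@paths) {
        # Read the whole message as raw bytes.
        open my $fh, '<:raw', $path or die "Cannot open $path: $!";
        my $content = do { local $/; <$fh> };
        close $fh;
        $content = '' unless defined $content;

        # The digest is the hash key: a repeated key means a repeated message.
        my $digest = sha1_hex($content);
        if (exists $seen{$digest}) {
            push @dupes, $path;
        }
        else {
            $seen{$digest} = $path;
        }
    }
    return @dupes;
}
```

One cautious way to use it would be to pass in the sorted list of files from a maildir's cur directory, print the returned paths, and only unlink them after checking the list by eye.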
G. Wade
Thanks, Wade. The term "digest" was unfamiliar to me, but I recognize
the concept from the git documentation, as I am in the process of
switching my backups from svn to git. And I was unaware of (or had
forgotten about) search.cpan.org.
RLH
_______________________________________________
Houston mailing list
http://mail.pm.org/mailman/listinfo/houston
Website: http://houston.pm.org/