On Mon, May 7, 2018 at 3:05 AM George Clemmer <myg...@gmail.com> wrote:
> I just "resynced" my local maildir scratch. I expected all the files to
> be renamed but I figured it would be no biggie to Git. I was a little
> surprised when my Git repo grew from 2.5G to 4.5G :-O

I used a similar setup with Git as a backup, and I too have experienced a "hard" resync that blew up the size of my Git repository. However, it wasn't Git's fault: had it encountered only file renames, it would have happily reused the "existing" data and not needed any more storage than before.

The "culprit" here is a header called `X-TUID`, which seems to be added by `isync` for internal purposes. (Searching the mailing list archive for `X-TUID` will therefore lead you to other people who have stumbled into this issue.)

Fortunately, you can "convince" Git to repack the repository and delta-compress the nearly identical files against each other, which will save you some space. But depending on how large the repository is (i.e. how many files it contains), this could take some time and memory. (Look in `man git-config` and play with the settings that pertain to delta compression, such as `pack.window` and `pack.depth`.)

> So ... this led me to wonder ... Would using a "stable" name based on a
> checksum be a useful improvement? Naturally, since I am a Git addict, I
> am thinking of 'git hash-object' ;-)

I would think this is a lovely idea; however, because of the `X-TUID` header it would be pointless: a fresh resync rewrites that header, so the checksum would change as well...

A better discussion, however, would be the following: how to use `isync` for archival purposes, including for "de-duping" mail accounts.

The "archival" part is pretty simple: no matter how many times you re-sync your inbox from scratch, the file names should stay consistent, thanks to hashing.

The "de-duping" part is a little more complicated. Say you have multiple accounts (personal and "business") and you forward some messages from one to another (for accessibility), but you don't want to delete the forwarded emails. If you now sync all these accounts, you'll get the same email multiple times, and because of the different "routing" headers the copies won't match. However, if you split each message apart into headers and body, the headers might differ but the body will be identical.
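To make the two ideas above concrete, here is a minimal Python sketch, assuming messages are available as raw RFC 5322 bytes and that `X-TUID` is the only header that changes between resyncs. All function names are hypothetical and not part of `isync`. `git_blob_sha1` reproduces how `git hash-object` hashes a blob (SHA-1 over `blob <size>`, a NUL byte, then the content); `archive_name` hashes a message with `X-TUID` removed, so re-syncing the same mail yields the same name; `body_key` hashes only the body, so copies that arrived through different routes group together.

```python
import hashlib
from collections import defaultdict

# Hypothetical helpers (not part of isync) sketching stable names and
# body-based de-duping; messages are assumed to be raw RFC 5322 bytes.


def git_blob_sha1(data: bytes) -> str:
    """Hash bytes the way `git hash-object` hashes a blob:
    SHA-1 over "blob <size>" + NUL byte + content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def split_message(raw: bytes) -> tuple[bytes, bytes]:
    """Split a message at the first empty line into (headers, body).
    Header lines keep their line endings; the empty separator line is dropped."""
    lines = raw.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line in (b"\n", b"\r\n"):
            return b"".join(lines[:i]), b"".join(lines[i + 1:])
    return raw, b""  # no empty line: headers only, no body


def strip_header(headers: bytes, name: bytes) -> bytes:
    """Remove every occurrence of header `name`, including folded continuation lines."""
    prefix = name.lower() + b":"
    kept, dropping = [], False
    for line in headers.splitlines(keepends=True):
        if dropping and line[:1] in (b" ", b"\t"):
            continue  # continuation line of a header being removed
        dropping = line.lower().startswith(prefix)
        if not dropping:
            kept.append(line)
    return b"".join(kept)


def archive_name(raw: bytes) -> str:
    """Stable file name: hash the message with the volatile X-TUID header removed."""
    headers, body = split_message(raw)
    # A fixed b"\n" separator is enough: the key only has to be deterministic.
    return git_blob_sha1(strip_header(headers, b"X-TUID") + b"\n" + body)


def body_key(raw: bytes) -> str:
    """De-dup key: hash only the body, ignoring all (routing) headers."""
    return git_blob_sha1(split_message(raw)[1])


def group_by_body(messages: dict[str, bytes]) -> dict[str, list[str]]:
    """Group message file names whose bodies are byte-for-byte identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, raw in messages.items():
        groups[body_key(raw)].append(name)
    return dict(groups)
```

For example, two copies of a message that differ only in their `X-TUID` value get the same `archive_name`, while a forwarded copy with an extra `Received:` header gets a different name but the same `body_key`. Note that this only catches byte-identical bodies; if a relay re-encodes the body (e.g. a different transfer encoding), the keys would differ, which is one argument for hashing decoded MIME parts instead.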

Now if we can devise a way to store these two parts separately, we'll end up with a better archival solution. Unfortunately, this would no longer be a proper standard "maildir"; fortunately, with a FUSE filesystem one could re-present this "archive" as proper maildirs. Bonus points if one also splits the email body into its individual MIME parts and de-dups those as well (just think of a thread that re-sends the same attachment over and over...).

But perhaps this "archival" use case is far outside the scope of `isync`, and a tool written from scratch with exactly this purpose in mind would be better.

Ciprian.

_______________________________________________
isync-devel mailing list
isync-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/isync-devel