On Mon, May 7, 2018 at 3:05 AM George Clemmer <myg...@gmail.com> wrote:
> I just "resynced" my local maildir scratch. I expected all the files to
> be renamed but I figured it would be no biggie to Git. I was a little
> surprised when my Git repo grew from 2.5G to 4.5G :-O


I used a similar setup, with Git as backup, and I too have experienced a
"hard" resync that blew up my Git repository's size.

However, it wasn't Git's fault:  had it encountered only file renames, it
would have happily re-used the "existing" data and needed no more storage
than before.

The "culprit" here is a header called `X-TUID` which seems to be added by
`isync` for internal purposes.  (Therefore searching the mailing list
archive for `X-TUID` will lead you to other people that stumbled into this
issue.)

Fortunately you can "convince" Git to repack the repository and check for
file rewrites, which will save you some space.  But depending on how large
the repository is (i.e. how many files it holds), this could take some time
and memory.  (Look in `man git-config` and play with the settings that
pertain to rename detection.)




> So ... this led me to wonder ... Would using a "stable" name based on a
> checksum be a useful improvement? Naturally, since I am a Git addict, I
> am thinking of 'git hash-object' ;-)


I think this would be a lovely idea;  however, due to the `X-TUID` header
it would be pointless, since a checksum over the whole file changes
whenever that header does...
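
To illustrate the point, here is a minimal Python sketch (not something
`isync` does) that computes the same identifier `git hash-object` gives a
file in a default SHA-1 repository, and shows that two copies of a message
differing only in `X-TUID` end up with different names.  The message text
and the two TUID values are made up for the example.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object id Git assigns to a blob with this content."""
    # Git hashes "blob <size>\0" followed by the raw file content.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# Hypothetical message; only the X-TUID value differs between the copies.
TEMPLATE = (
    b"From: alice@example.org\r\n"
    b"Subject: hello\r\n"
    b"X-TUID: %s\r\n"
    b"\r\n"
    b"Same body.\r\n"
)

first_sync = TEMPLATE % b"AAAAAAAAAAAA"
second_sync = TEMPLATE % b"BBBBBBBBBBBB"

# One changed header is enough to change the name of the whole file.
print(git_blob_id(first_sync) == git_blob_id(second_sync))  # prints False
```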




However, a better discussion would be the following:  how to use `isync`
for archival purposes, including for "de-duping" mail accounts.  The
"archival" part is pretty simple:  no matter how many times you re-sync
your inbox from scratch, the file names should stay consistent -- through
hashing.  The "de-duping" part is a little more complicated:  say you have
multiple accounts (personal and "business") and you forward mail from one
to another (for accessibility), but you don't want to delete the forwarded
emails.  Now if you sync all these accounts you'll get the same email
multiple times, and because of the different "routing" headers the copies
won't have the same hash.

However, if you "split" the message apart -- headers and body -- the
headers might have changed but the body will be identical.  Now if we can
devise a way to store the two parts separately, we'll end up with a better
archival solution.  Unfortunately this won't be a proper standard "maildir"
anymore;  but fortunately, with some FUSE, one could re-present this
"archive" as proper maildirs.  Bonus points if one also splits the email
body into multiple MIME parts and de-dups those as well (just think of a
thread that re-sends the same attachment over and over...)
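
As a rough sketch of the header/body split (a hypothetical layout, not an
existing tool), the Python below cuts each raw message at the first blank
line, stores every distinct body once under its SHA-256, and keeps the
headers next to a reference to that body.  The names `split_message` and
`archive` and the sample messages are made up, and it assumes forwarding
leaves the body byte-for-byte identical, which real mail systems do not
always guarantee.

```python
import hashlib
from typing import Dict, List, Tuple


def split_message(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a raw message into (headers, body) at the first blank line."""
    hits = [(raw.find(sep), len(sep)) for sep in (b"\r\n\r\n", b"\n\n")]
    hits = [(pos, size) for pos, size in hits if pos != -1]
    if not hits:
        return raw, b""  # headers only, no body
    pos, size = min(hits)  # earliest separator wins
    return raw[:pos], raw[pos + size:]


def archive(messages: List[bytes]) -> Tuple[Dict[str, bytes], List[Tuple[bytes, str]]]:
    """Store each distinct body once; keep headers with a body reference."""
    bodies: Dict[str, bytes] = {}
    index: List[Tuple[bytes, str]] = []
    for raw in messages:
        headers, body = split_message(raw)
        key = hashlib.sha256(body).hexdigest()
        bodies.setdefault(key, body)
        index.append((headers, key))
    return bodies, index


# Two hypothetical copies of one email that took different routes.
BODY = b"Please find the report attached.\r\n"
personal = b"Received: from mx.personal.example\r\nSubject: report\r\n\r\n" + BODY
business = b"Received: from mx.business.example\r\nSubject: report\r\n\r\n" + BODY

bodies, index = archive([personal, business])
print(len(index), len(bodies))  # prints: 2 1
```

The same idea would extend to MIME parts by hashing each part instead of
the whole body.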

But perhaps this "archival" use-case is far out of scope of `isync` and a
tool written from scratch with exactly this purpose in mind would be better.

Ciprian.

_______________________________________________
isync-devel mailing list
isync-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/isync-devel