On Wed, May 8, 2013 at 10:47 PM, Oleg Goldshmidt <[email protected]> wrote:
>
> Disclaimer: I am definitely not an expert on the subject matter and I
> hardly know what I am talking about (in this case?). Creativity is no
> substitute for knowing what you are doing.
>
> Now let me try and get creative.
>
> What is your purpose? Just doing something fancy to impress your boss

My real purpose, or the official stated purpose ;-)

The thing is, we build a 400MB artifact multiple times a day. Say 5×200×400MB = 400GB a year; I'm not sure I have the space. This means I have to maintain the repository (delete old build results, etc).

OTOH, if I use a dedupe technique, I can keep all build artifacts and forget about it altogether. I'll never ever fill a modern 250GB disk.

> or
> truly save space, e.g., if this stuff - everything that gets built - is
> backed up? I'll assume the latter.
>
> [Aside: if it is not backed up, how many versions do you really need to
> keep and why is it an issue?]
>
> 1. I would probably look into using a version control system rather than
>    a filesystem.
>
>    a) Modern version control systems are often/usually capable of
>       storing binary diffs between revisions. Frankly, I've never looked
>       at how git or mercurial do that (probably quite well), but even,
>       say, SVN should be able to store a binary diff on commit. IIRC SVN
>       diffs using xdelta or similar.
>

I suspect they don't work well on gzipped content.

Binary file with diff:

(fabenv_mac)❯ du -h .git/objects
4.0K .git/objects/08
232K .git/objects/3d
4.0K .git/objects/44
4.0K .git/objects/84
232K .git/objects/d7
4.0K .git/objects/ee
0B .git/objects/info
0B .git/objects/pack
480K .git/objects
(fabenv_mac)❯ git gc
Counting objects: 6, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 1), reused 0 (delta 0)
(fabenv_mac)❯ find .git/objects/ -type f|xargs du -h
4.0K .git/objects//info/packs
4.0K .git/objects//pack/pack-bd546ad638a3a27e16e57298469558cdd5018879.idx
216K .git/objects//pack/pack-bd546ad638a3a27e16e57298469558cdd5018879.pack

However, when it's gzipped:

(fabenv_mac)❯ find .git/objects/ -type f|xargs du -h
4.0K .git/objects//2a/8fc1caff222272cb043bbf18d240c54315f9d0
4.0K .git/objects//4e/71017582e4f46b3641d27084e5cae0c3303974
216K .git/objects//70/81d2b08bc00dff607aea60e9c6fecbc6950b16
216K .git/objects//8e/71116f4a7f89af36051b8b431427c0e88ab741
4.0K .git/objects//92/00e8eaf6093e6cfd07735bc9fe30da4e86db33
4.0K .git/objects//9d/e5e4af60673998992579be40960d65a5b498a3
(fabenv_mac)❯ git gc
Counting objects: 6, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 0 (delta 0)
(fabenv_mac)❯ du -sh .git/objects
440K .git/objects
(fabenv_mac)❯ find .git/objects/ -type f|xargs du -h
4.0K .git/objects//info/packs
4.0K .git/objects//pack/pack-5253e59d6e6950fbbf8455310bb32e3004ded6b2.idx
432K .git/objects//pack/pack-5253e59d6e6950fbbf8455310bb32e3004ded6b2.pack

Note that git gc found no delta this time (delta 0) and the pack is as large as the two loose objects together (432K), when the same two versions of the file (a gcc binary with the first byte changed) were gzipped.

>    b) I suppose one can write commit/get (I use this terminology only
>       because I mentioned SVN, consider it generic) hooks for most
>       version control systems to tar/untar (and possibly zip/unzip jars)
>       if you really need something close to what you described.
>

All your suggestions are basically good, but they mean I have to change the work style of the whole team. The main benefit of my suggestion is that it's completely transparent. I add a single mount command for the directory where I already keep my binary files, and that's it. Everything still works as usual, except I never need to worry about deleting anything.

BTW Java artifacts have a well-known deployment mechanism that is very easy to set up (a binary repository with a known protocol for keeping binary build products, a known API for fetching a build product, etc). It's good to keep your work environment as standard as you reasonably can.

> 3. I *heard* of lessfs but I have absolutely no idea if it is relevant
>    (search and check?).
>

I need to check how it supports gzip.

>
> 4. MVFS (Multi-Version FileSystem - the underlying technology of
>    Rational's ClearCase) comes to mind. It's not open source (or cheap).
>    It is not userspace. It is probably only available as a part of
>    ClearCase. Just mentioning for completeness.
>
> If none of the above is even remotely relevant, sorry for the noise.
>
> --
> Oleg Goldshmidt | [email protected]
>
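
P.S. Here is a rough sketch of why gzip matters for block-level dedupe. It is not a model of lessfs or of git; I have not checked how either chunks data. The 4 KiB fixed block size, the synthetic payload (standing in for the gcc binary) and all function names are my own assumptions, and zlib from the Python standard library stands in for gzip.

```python
import random
import zlib

BLOCK = 4096  # assumed fixed dedupe block size; real filesystems may differ


def make_payload(size=64 * BLOCK, seed=0):
    """Synthetic, compressible stand-in for a build artifact (hypothetical)."""
    rng = random.Random(seed)
    # Only 64 distinct byte values, so the data compresses reasonably well.
    return bytes(rng.choices(range(64), k=size))


def flip_first_byte(data):
    """Return a copy of data in which only the first byte is changed."""
    return bytes([data[0] ^ 0xFF]) + data[1:]


def identical_blocks(a, b, block=BLOCK):
    """Return (number of identical aligned blocks, blocks in the longer input)."""
    blocks_a = [a[i:i + block] for i in range(0, len(a), block)]
    blocks_b = [b[i:i + block] for i in range(0, len(b), block)]
    same = sum(x == y for x, y in zip(blocks_a, blocks_b))
    return same, max(len(blocks_a), len(blocks_b))


if __name__ == "__main__":
    v1 = make_payload()
    v2 = flip_first_byte(v1)
    print("raw:        %d of %d blocks identical" % identical_blocks(v1, v2))

    z1, z2 = zlib.compress(v1), zlib.compress(v2)
    print("compressed: %d of %d blocks identical" % identical_blocks(z1, z2))
```

On the raw payload exactly one block differs, so a block-level dedupe could share everything else between the two versions. On the compressed streams at least one block must differ, and I would expect considerably more, since a one-byte change can shift the rest of the compressed stream; the exact count depends on the compressor and the data, so treat the numbers as indicative only.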
_______________________________________________
Linux-il mailing list
[email protected]
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
