On Thu, Aug 15, 2013 at 11:14 PM, Adam Borowski <[email protected]> wrote:
> So you're using a filesystem with the capability you want, without actually
> using it. The "clone" ioctl does a copy-on-write link, without changes of
> semantics involved with hard links.
Sorry, I didn't make it clear that btrfs wasn't an option for me: I'm stuck
with ext3 as I'm working with a 2.6.32 kernel. Yet, if there's some real
benefit, ext4 or xfs could be considered.

On Fri, Aug 16, 2013 at 3:26 AM, Rogério Brito <[email protected]> wrote:
> I'm not Steve, but this is *much* easier than the deduplication of blocks...
> :)

Yes. I think s.d.n has a *very* different usage pattern for dedup than a
normal FS. It's mostly aimed at a huge fileset with a very high dedup ratio.
And the best part is that the writes are *completely* under control, IMHO.

> If you only need to use this coarse deduplication, then take a look at
> rdfind, instead of hardlink. Hardlink compares the files that are
> likely to be the same (e.g., same size) byte by byte, while rdfind
> uses hashes (md5 or sha1, at your option) to compare the files.

Yes, but as I cannot afford hash collisions, falling back to a byte-by-byte
comparison is indeed needed in my case.

> On Thu, Aug 15, 2013 at 7:35 PM, Stefano Zacchiroli <[email protected]> wrote:
>> With how many files have you tried it (max)?
>
> I have tried it with probably much fewer files than you have (about
> 10^6 files only), but the vast majority of the time is spent with a
> cold cache with (spinning) hard disks, for both approaches.

Same here. I haven't tried it with the full install yet, only with 5
nfsroots, and that's about 600k files in 80k dirs. Yet the hardlink part
seems quite time-consuming, so I thought about running it only on a subset
of changed files.

On Fri, Aug 16, 2013 at 10:33 AM, Stefano Zacchiroli <[email protected]> wrote:
> none of the tools I've looked at seem to do that. I'll probably look
> into patching the one I'll end up choosing for that, but if you know of
> a similar tool that can use an external hash db, just shout!

Actually, I was also thinking about adding an option to "hardlink" for an
external hash db, one that could be reused across successive launches.
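The hash-then-verify step could be sketched roughly like this (file names
and the demo directory are made up for illustration; this is not how the
actual hardlink tool is implemented):

```shell
#!/bin/sh
# Sketch: group candidate files by sha1, but verify each candidate pair
# byte by byte with cmp before hardlinking, since a hash collision,
# however unlikely, cannot be tolerated here.
set -e
mkdir -p demo
printf 'same'  > demo/a.txt
printf 'same'  > demo/b.txt
printf 'other' > demo/c.txt

# Sort by hash so identical files end up on adjacent lines.
sha1sum demo/*.txt | sort | while read -r hash file; do
  if [ "$hash" = "$prev_hash" ] && cmp -s "$prev_file" "$file"; then
    ln -f "$prev_file" "$file"   # duplicate confirmed: hard-link it
  fi
  prev_hash=$hash
  prev_file=$file
done
```

A persistent hash db would simply cache the `sha1sum` column between runs,
so unchanged files need not be re-read on the next launch.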
That would nicely enable feeding hardlink with only a "find -ctime" list of
dirs and files to choose from.

Also, since all the writes go through nfsd, I'm thinking about doing the
interception with LD_PRELOAD instead of a full-featured FUSE filesystem.

Finally, I'm also currently looking at the internals of the backuppc
package. It implements full compression + dedup entirely in userspace,
since all write accesses are tightly controlled (as they might be on
s.d.n).

-- 
Steve Schnepp
http://blog.pwkf.org/tag/munin

_______________________________________________
Debconf-discuss mailing list
[email protected]
http://lists.debconf.org/mailman/listinfo/debconf-discuss
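The "find -ctime" feeding idea mentioned above could look like the
following sketch (the demo path is a placeholder, and the `--input` option
on hardlink is hypothetical — it is exactly the kind of patch being
discussed, not something the tool has today):

```shell
#!/bin/sh
# Sketch: collect only files whose inodes changed in the last day, so a
# dedup pass doesn't have to rescan the whole tree on every launch.
set -e
mkdir -p /tmp/nfsroot-demo            # placeholder for the real nfsroot
touch /tmp/nfsroot-demo/recent.txt

# ctime cannot be set backwards, so this demo only shows the "recent"
# side of the filter.
find /tmp/nfsroot-demo -type f -ctime -1 > changed.list

# A patched hardlink could then consume changed.list instead of walking
# the whole tree, e.g. (hypothetical option):
#   hardlink --input changed.list
```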
