On 15 Aug 2013 01:08, "Adam Borowski" <[email protected]> wrote:
First, some context: I'm trying to efficiently store a huge nfs-root farm (objective: 500+), so the performance impact has to be very limited. I don't care about dedup inside files; I'm only aiming at deduping whole files.

> There are two ways:

I'm exploring a third way, entirely in userspace (I'll draft a mixed one at the end). For that I'm using the "hardlink" package. It operates at the file level, in userspace.

It has some serious drawbacks:
 * it has a race condition bug when files change while it is running
 * once a file is hardlinked, it can no longer be written in place. It has to be written as a new file, which then replaces the old one using rename(2).

Despite these, it has huge benefits:
 * No performance impact.
 * Asynchronous (offline) deduplication can be scheduled for off-peak hours.
 * Usable on old-stable kernels.

> * a nice and clean way. The kernel interface would need to be "hey kernel,
> I think the block in fd 1 offset 0 might be same as a block in fd 2 offset
> 4096, care to compare and perhaps combine them?".

So all the cleverness of *what* to merge would only happen in userspace? What would be the impact on reads at runtime?

> Offline (a confusing name, it's a mounted filesystem but at a later time)

It can even be done asynchronously, by registering an inotify watch on it and then queuing the dedups in a userspace daemon.

> See how much fun can we have with data structures?
> And the best of all, the kernel needs just a single syscall, with all the complexity done in userspace.

That's the whole beauty of it. You then create a whole ecosystem of software to address that complexity in every possible manner :)

Now, the mixed approach I promised earlier:

As a pure userspace take is not ideal, I was thinking about adding a FUSE in-place layer that would synchronously copy deduped hardlinks on write. It could be triggered by an open(w) or by an actual write(). Other than that, it would just offer raw, native access to every other file operation.

Steve.
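
PS: to make the hardlink/rename(2) part above a bit more concrete, here is a minimal Python sketch of the two operations involved: merging two identical files into one inode, and giving a file back its own private copy before it gets written. This is not how the "hardlink" package is implemented; the function names `dedup_pair` and `break_hardlink` and the `.dedup-tmp` suffix are made up for illustration. It only compares contents (owner, mode and timestamps are ignored), and it does not close the race where a file changes between the compare and the rename.

```python
import filecmp
import os
import shutil
import tempfile


def dedup_pair(keep, dup):
    """Replace `dup` with a hardlink to `keep` if both hold identical bytes.

    Hypothetical helper: returns True when a merge happened.
    """
    st_keep, st_dup = os.stat(keep), os.stat(dup)
    if st_keep.st_dev != st_dup.st_dev:
        return False  # hardlinks cannot cross filesystems
    if st_keep.st_ino == st_dup.st_ino:
        return False  # already the same inode, nothing to do
    if not filecmp.cmp(keep, dup, shallow=False):
        return False  # contents differ
    tmp = dup + ".dedup-tmp"  # assumed-free temporary name
    os.link(keep, tmp)        # new directory entry for keep's inode
    os.replace(tmp, dup)      # rename(2): atomically swap the link in
    return True


def break_hardlink(path):
    """Give `path` its own copy of the data so it can be written safely.

    Hypothetical helper: returns True when a copy was made.
    """
    if os.stat(path).st_nlink < 2:
        return False  # not shared, writing in place is fine
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same filesystem as path
    os.close(fd)
    try:
        shutil.copy2(path, tmp)  # data, mode and timestamps (not ownership)
        os.replace(tmp, path)    # rename(2) over the shared link
    except BaseException:
        os.unlink(tmp)           # only reached if the rename did not happen
        raise
    return True
```

A FUSE layer like the one I describe would conceptually call something like `break_hardlink()` when a file is opened for writing, while a scheduled job would call `dedup_pair()` on candidate files during off-peak hours.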
_______________________________________________
Debconf-discuss mailing list
[email protected]
http://lists.debconf.org/mailman/listinfo/debconf-discuss
