> The only synchronization point needed is to make sure that all bricks
> agree on the inode state and which client owns it. This can be achieved
> without locking using a method similar to what I implemented in the DFC
> translator.
>
> Besides the lock-less architecture, the main advantage is that much more
> aggressive caching strategies can be implemented very near to the final
> user, increasing considerably the throughput of the file system. Special
> care has to be taken with things that can fail on background writes
> (basically brick space and user access rights). Those should be handled
> appropriately on the client side to guarantee future success of writes.
>
> Of course this is only a high level overview. A deeper analysis should
> be done to see what to do on each special case.
>
> What do you think?

I think this is a great idea for where we can go - and need to go - in
the long term. However, it's important to recognize that it *is* the
long term. We had to solve almost exactly the same problems in MPFS
long ago. Whether the synchronization uses locks or not *locally* is
meaningless, because all of the difficult problems have to do with
recovering the *distributed* state. What happens when a brick fails
while holding an inode in any state but I? How do we recognize it, what
do we do about it, how do we handle the case where it comes back and
needs to re-acquire its previous state? How do we make sure that a
brick can successfully flush everything it needs to before it yields a
lock/lease/whatever? That's going to require some kind of flow control,
which is itself a pretty big project. It's not impossible, but it took
multiple people some years for MPFS, and ditto for every other project
(e.g. Ceph or XtreemFS) which adopted similar approaches. GlusterFS's
historical avoidance of this complexity certainly has some drawbacks,
but it has also been key to us making far more progress in other areas.

To move forward on this, I think we need a *much* more detailed idea of
how we're going to handle the nasty cases. Would some sort of online
collaboration - e.g. Hangouts - make more sense than continuing via
email?