On Apr 20, 2007 19:00 -0400, Shobhit Dayal wrote:
> We're a group of students at CMU and we're building a project around
> lustre. A main part of the work involves introducing multiple mds servers in
> lustre.

I'm sad to inform you that the work on introducing multiple MDTs for a
single filesystem has been going on for several years already, and is
mostly done (targeted for release some time at the end of this year). This
is what we call "clustered metadata" (CMD). I'm not sure what our policy
would be on releasing an alpha version of this code.

> Now we have a design for managing metadata from multiple mds's, but we were
> wondering how much work it is, besides changing mds metadata management,
> to introduce a new active mds server. Our impression so far is that neither
> the client nor the ost's will work easily with a new active mds entity in
> the cluster in terms of managing connections from multiple mds's and that
> they will have to be changed. Is this correct ?

For CMD, there is a new "logical metadata volume" (LMV) that handles the
connections from the filesystem to the multiple MDTs. This is somewhat
analogous to the LOV, in that it spreads MDT access and operations over
the multiple MDTs. Each MDT is still mostly independent in that each one
exports a single ext3 filesystem (like multiple OSTs on a single OSS),
rather than sharing access to the same block device.

> For instance, for experiment purpose: we created a client-->mds-->ost and
> created some file through them 'foo', 'bar'. Then replicated the file system
> on the mds that stores all the metadata onto another mds mds2.
> Now we introduced a second client and tried to setup the connections
> client2-->mds2-->ost

Ah, this is somewhat different from CMD, where each MDT is a (mostly)
independent subset of the filesystem. The CMD code has no replication
between MDTs. That would definitely be an interesting and worthwhile
project. It would be implemented in a very similar manner, with a
replicating layer between llite and the MDCs, each MDC connecting to a
separate MDT.

> This setup does not work when foo, bar are written from both clients.
> changes cannot be seen from both clients.
> As soon as the second mds
> connects, the client1, mds1 seem to lose their connection with the ost.
>
> Can someone point us to the right way to bring up two mds's in the lustre
> environment, even though it may lead to data/metadata corruption ?

You need a layer, like the LOV is for OSCs, to handle multiple independent
connections. That layer should then replicate the requests to each of the
MDTs for modifying events (in MDT order), and could, for example,
round-robin read-only events (e.g. getattr) to help spread the load.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

_______________________________________________
Lustre-devel mailing list
Lustre-devel@clusterfs.com
https://mail.clusterfs.com/mailman/listinfo/lustre-devel
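
The replicating layer suggested in the reply above (modifying requests go to every MDT in MDT order, read-only requests are round-robined) could look roughly like the sketch below. This is only an illustration in Python, not Lustre code: `FakeMDT`, `ReplicatingLayer`, `modify` and `read` are hypothetical names made up for this example, and error handling and recovery after a partial failure are left out.

```python
class FakeMDT:
    """Hypothetical in-memory stand-in for one metadata target (not a Lustre API)."""

    def __init__(self, index):
        self.index = index
        self.attrs = {}   # path -> dict of attributes
        self.reads = 0    # number of read-only requests served

    def setattr(self, path, attrs):
        # Modifying event: update the attributes stored for this path.
        self.attrs.setdefault(path, {}).update(attrs)

    def getattr(self, path):
        # Read-only event: return a copy of the stored attributes.
        self.reads += 1
        return dict(self.attrs[path])


class ReplicatingLayer:
    """Hypothetical layer that sits above several independent MDT connections."""

    def __init__(self, mdts):
        # Sort by index so every client replicates in the same MDT order.
        self.mdts = sorted(mdts, key=lambda m: m.index)
        self._next = 0

    def modify(self, path, attrs):
        # Replicate the modifying request to each MDT, in MDT order.
        for mdt in self.mdts:
            mdt.setattr(path, attrs)

    def read(self, path):
        # Round-robin read-only requests to spread the load.
        mdt = self.mdts[self._next % len(self.mdts)]
        self._next += 1
        return mdt.getattr(path)


if __name__ == "__main__":
    layer = ReplicatingLayer([FakeMDT(1), FakeMDT(0)])
    layer.modify("foo", {"size": 4096})
    print(layer.read("foo"), layer.read("foo"))
    print([m.reads for m in layer.mdts])
```

Applying modifications in one fixed order on every client is what keeps two clients from updating the replicas in conflicting orders; a real implementation would also need locking and a way to resynchronize an MDT that missed an update.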