Well said. This should be on the Wiki :-)
On Sat, Aug 29, 2009 at 2:15 PM, John K. Dawson <[email protected]> wrote:
> Lee,
>
> Thanks for posting this. I found the background and perspective very interesting.
>
> John
>
> John K. Dawson
> [email protected]
> 612-860-2388
>
> On Aug 29, 2009, at 12:56 PM, Lee Ward wrote:
>
>> You seem to be correct. Nobody ever seems to contrast NFS with these super file system solutions. That is interesting.
>>
>> It's Saturday, the family is out running around. I have time to think about this question. Unfortunately for you, I do this more for myself, which means this is going to be a stream-of-consciousness thing far more than a well-organized discussion. Sorry.
>>
>> I'd begin by motivating both NFS and Lustre. Why do they exist? What problems do they solve?
>>
>> NFS first.
>>
>> Way back in the day, ethernet and the concept of a workstation got popular. There were many tools to copy files between machines but few ways to share a name space: have the directory hierarchy and its content directly accessible to an application on a foreign machine. This made file sharing awkward. The model was to copy the file or files to the workstation where the work was going to be done, do the work, and copy the results back to some, hopefully, well-maintained central machine.
>>
>> There *were* solutions to this at the time. I recall an attractive alternative called RFS (I believe) from the Bell Labs folks, via some place in England if I'm remembering right; it's been a looong time, after all. It had issues, though. The nastiest issue for me was that if a client went down, the service side would freeze, at least partially. Since this could happen willy-nilly, depending on the user's wishes and how well the power button on his workstation was protected, together with the power cord and ethernet connection, this freezing of service for any amount of time was difficult to accept.
>> This was so even in a rather small collection of machines.
>>
>> The problem with RFS (?) and its cousins was that they were all stateful. The service side depended on state that was held at the client. If the client went down, the service side couldn't continue without a whole lot of recovery, timeouts, etc. It was a very *annoying* problem.
>>
>> In the latter half of the 1980s (am I remembering right?) SUN proposed an open protocol called NFS. An implementation using this protocol could do most everything RFS (?) could, but it didn't suffer the service-side hangs. It couldn't. It was stateless. If the client went down, the server just didn't care. If the server went down, the client had the opportunity to either give up on the local operation, usually with an error returned, or wait. It was always up to the user, and for client failures the annoyance was limited to the user(s) on that client.
>>
>> SUN also wisely desired the protocol to be ubiquitous. They published it. They wanted *everyone* to adopt it. More, they would help competitors. SUN held interoperability bake-a-thons to help with this.
>>
>> It looks like they succeeded, all around :)
>>
>> Let's sum up, then. The goals for NFS were:
>>
>> 1) Share a local file system name space across the network.
>> 2) Do it in a robust, resilient way. Pesky FS issues because some user kicked the cord out of his workstation were unacceptable.
>> 3) Make it ubiquitous. SUN was a workstation vendor. They sold servers, but almost everyone had a VAX in their back pocket where they made the infrastructure investment. SUN needed the high-value machines to support this protocol.
>>
>> Now Lustre.
>>
>> Lustre has a weird story and I'm not going to go into all of it.
>> The shortest, relevant part is that while there was at least one solution that DOE/NNSA felt acceptable, GPFS, it was not available on anything other than an IBM platform, and because DOE/NNSA had a semi-formal policy of buying from different vendors at each of the three labs we were kind of stuck. Other file systems, existing and imminent at the time, were examined, but they were all distributed file systems and we needed IO *bandwidth*. We needed lots, and lots, of bandwidth.
>>
>> We also needed that ubiquitous thing that SUN had as one of their goals. We didn't want to pay millions of dollars for another GPFS. We felt that would only be painting ourselves into a corner. Whatever we did, the result *had* to be open. It also had to be attractive to smaller sites, as we wanted to turn loose of the thing at some point. If it was attractive for smaller machines, we felt we would win in the long term as, eventually, the cost to further and maintain this thing was spread across the community.
>>
>> As far as technical goals, I guess we just wanted GPFS, but open. More though, we wanted it to survive in our platform roadmaps for at least a decade. The actual technical requirements for the contract that DOE/NNSA executed with HP (CFS was the sub-contractor responsible for development) can be found here:
>>
>> <http://www-cs-students.stanford.edu/~trj/SGS_PathForward_SOW.pdf>
>>
>> LLNL used to host this but it's no longer there? Oh well; hopefully this link will be good for a while, at least.
>>
>> I'm just going to jump to the end and sum the goals up:
>>
>> 1) It must do *everything* NFS can. We relaxed the stateless thing, though; see the next item for why.
>> 2) It must support full POSIX semantics: last writer wins, POSIX locks, etc.
>> 3) It must support all of the transports we are interested in.
>> 4) It must be scalable, in that we can cheaply attach storage and both performance (reading *and* writing) and capacity within a single mounted file system increase in direct proportion.
>> 5) We wanted it to be easy, administratively. Our goal was that it be no harder than NFS to set up and maintain. We were involving too many folks with PhDs in the operation of our machines at the time. Before you yell FAIL, I'll say we did try. I'll also say we didn't make CFS responsible for this part of the task. Don't blame them overly much, OK?
>> 6) We recognized we were asking for a stateful system; we wanted to mitigate that by having some focus on resiliency. These were big machines and clients died all the time.
>> 7) While not in the SOW, we structured the contract to accomplish some future form of wide acceptance. We wanted it to be ubiquitous.
>>
>> That's a lot of goals! For the technical ones, the main ones are all pretty much structured to ask two things of what became Lustre. First, give us everything NFS functionally does but go far beyond it in performance. Second, give us everything NFS functionally does but make it completely equivalent to a local file system, semantically.
>>
>> There's a little more we have to consider. NFS4 is a different beast than NFS2 or NFS3. NFS{2,3} had some serious issues that became more prominent as time went by. First, security: it had none. Folks had bandaged on some different things to try to cure this, but they weren't standard across platforms. Second, it couldn't do the full POSIX-required semantics. That was attacked with the NFS lock protocols, but it was such an after-thought it will always remain problematic. Third, the new authorization possibilities introduced by Microsoft and then POSIX, called ACLs, had no way of being accomplished.
>>
>> NFS4 addresses those by:
>>
>> 1) Introducing state. It can do full POSIX now without the lock servers.
>> Lots of resiliency mechanisms were introduced to offset the downside of this, too.
>> 2) Formalizing and offering standardized authentication headers.
>> 3) Introducing ACLs that map to equivalents in POSIX and Microsoft.
>>
>> Strengths and Weaknesses of the Two
>> -----------------------------------
>>
>> NFS4 does most everything Lustre can, with one very important exception: IO bandwidth.
>>
>> Both seem able to deliver metadata performance at roughly the same speeds. File create, delete, and stat rates are about the same. NetApp seems to have a partial enhancement. They bought the Spinnaker goodies some time back and have deployed that technology, and redirection too(?), within their servers. The good about that is that two users in different directories *could* leverage two servers independently and, so, scale metadata performance. It's not guaranteed, but at least there is the possibility. If the two users are in the same directory, it's not much different, though, I'm thinking. Someone correct me if I'm wrong?
>>
>> Both can offer full POSIX now. It's nasty in both cases but, yes, in theory you can export mail directory hierarchies with locking.
>>
>> The NFS client and server are far easier to set up and maintain. The tools to debug issues are advanced. While the Lustre folks have done much to improve this area, NFS is just leaps and bounds ahead. It's easier to deal with NFS than Lustre. Just far, far easier, still.
>>
>> NFS is just built in to everything. My TV has it, for heck's sake. Lustre is, seemingly, always an add-on. It's also a moving target. We're constantly futzing with it, upgrading, and patching. Lustre might be compilable most everywhere we care about, but building it isn't trivial. The supplied modules are great but, still, moving targets in that we wait for SUN to catch up to the vendor-supplied changes that affect Lustre.
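[The "POSIX locks" referred to above are advisory byte-range locks. A minimal sketch of what both file systems are being asked to honor; the helper and file are hypothetical, not part of either system's API:]

```python
import fcntl
import os
import tempfile

def locked_append(path, data):
    """Append under an exclusive POSIX byte-range lock (fcntl.lockf).

    Cooperating processes serialize through the lock before touching
    the bytes; a POSIX-coherent network file system must arbitrate
    these locks across all its clients, not just one kernel.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        os.write(fd, data)
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)

# Hypothetical mailbox file, standing in for the mail-directory case.
fd, mailbox = tempfile.mkstemp()
os.close(fd)
locked_append(mailbox, b"message 1\n")
locked_append(mailbox, b"message 2\n")
print(open(mailbox).read())  # both appended lines, in order
```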
>> Given Lustre's size and interaction with other components in the OS, that happens far more frequently than desired. NFS just plain wins the ubiquity argument at present.
>>
>> NFS IO performance does *not* scale. It's still an in-band protocol. The data is carried in the same message as the request and is, practically, limited in size. Reads are more scalable than writes: a popular file segment can be satisfied from the cache on reads, but that develops issues at some point. For writes, NFS3 and NFS4 help in that they directly support write-behind, so a client doesn't have to wait for data to go to disk, but it's just not enough. If one streams data to/from the store, it can be larger than the cache. A client that might read a file already made "hot", but at a very different rate, just loses. A client, writing, is always looking for free memory to buffer content. Again, too many of these simultaneously and performance descends to the native speed of the attached back-end store, and that store can only get so big.
>>
>> Lustre IO performance *does* scale. It uses a 3rd-party transfer. Requests are made to the metadata server and IO moves directly between the affected storage component(s) and the client. The more storage components, the less possibility of contention between clients and the more data can be accepted/supplied per unit time.
>>
>> NFS4 has a proposed extension, called pNFS, to address this problem. It just introduces the 3rd-party data transfers that Lustre enjoys. If and when that is a standard, and is well supported by clients and vendors, the really big technical difference will virtually disappear. It's been a long time coming, though. It's still not there. Will it ever be, really?
>>
>> The answer to the NFS vs. Lustre question comes down to the workload for a given application, then, since they do have overlap in their solution space.
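[The scaling contrast described above reduces to simple arithmetic: an in-band protocol funnels every byte through one server link, while a third-party-transfer design adds deliverable bandwidth with each storage target. All figures below are made up for illustration:]

```python
def nfs_aggregate_bw(n_clients, per_client_demand, server_link):
    # In-band protocol: data rides in the request/reply messages through
    # one server, so the server's link caps aggregate bandwidth.
    return min(n_clients * per_client_demand, server_link)

def lustre_aggregate_bw(n_clients, per_client_demand, n_osts, ost_bw):
    # Third-party transfer: after contacting the metadata server, clients
    # move data directly to/from object storage targets (OSTs), so adding
    # OSTs adds deliverable bandwidth.
    return min(n_clients * per_client_demand, n_osts * ost_bw)

# 64 clients each streaming at 1 GB/s (hypothetical numbers):
print(nfs_aggregate_bw(64, 1, server_link=10))          # capped at 10
print(lustre_aggregate_bw(64, 1, n_osts=32, ost_bw=2))  # full 64
```

Doubling the OST count doubles the ceiling in the second model; in the first, the only recourse is a bigger (and eventually unbuildable) single server.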
>> If I were asked to look at a platform and recommend a solution, I would worry about IO bandwidth requirements. If the platform in question were read-mostly and, practically, never needed sustained read or write bandwidth, NFS would be an easy choice. I'd even think hard about NFS if the platform created many files but all were very small; today's filers have very respectable IOPS rates. If it came down to IO bandwidth, I'm still on the parallel file system bandwagon. NFS just can't deal with that at present, and I do still have the folks, in house, to manage the administrative burden.
>>
>> Done. That was useful for me. I think five years ago I might have opted for Lustre in the "create many small files" case, where I would consider NFS today, so re-examining the motivations, relative strengths, and weaknesses of both was useful. As I said, I did this more as a self-exercise than anything else, but I hope you can find something useful here, too. The family is back from their errands, too :) Best wishes and good luck.
>>
>> --Lee
>>
>> On Wed, 2009-08-26 at 04:11 -0600, Tharindu Rukshan Bamunuarachchi wrote:
>>> hi All,
>>>
>>> I need to prepare a small report on “NFS vs. Lustre”.
>>>
>>> I could find a lot of resources about Lustre vs. (CXFS, GPFS, GFS) …
>>>
>>> Can you guys please provide a few tips … URLs … etc.
>>>
>>> cheers,
>>> __
>>> tharindu
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
