Hi John,

I had assumed, given your requirements, that you would spring for the service contract.
I think parallel file systems are inherently complicated, and Lustre is competitive with other similar products in terms of maturity, etc.

Jim

On Wed, May 14, 2008 at 05:21:28PM -0400, jrs wrote:
> Thanks for the insight, Jim (and Mike and Aaron).
>
> Unfortunately, I've now gotten contradictory views (not terribly
> surprising: people have different views and experiences, etc.).
>
> Mike (who posted earlier) implied that, if the underlying storage
> and network are solid and failover is done right, Lustre
> can be trusted.
>
> Jim, would having a support contract change your view? Or might
> the progression toward finding that right version/right hardware
> be dangerous even with support? Is this something related to
> the code's immaturity? Or just a complex problem?
>
> thanks much,
> John
>
>
> Jim Garlick wrote:
> >John,
> >
> >Lustre can be damn robust if you get the right version on the right
> >hardware. Also, I think the new engineering practices and future
> >architecture that uses ZFS on the back end will only improve this.
> >
> >That said, your predicament is troubling. As a general rule I would not
> >trust any parallel file system that I know of with mission critical data.
> >Failures do happen; indeed we have lost data in Lustre on several
> >occasions.
> >
> >In some sense we're in a similar position. The data we put in Lustre
> >is important to our mission (well, some of it anyway), costly to
> >regenerate, and impractical to back up with a general backup policy.
> >
> >What we do is basically advertise Lustre as temporary scratch space and
> >provide an HPSS tape archive for users to copy their most critical data to.
> >That may not work in your case, but if I were you I would at least have
> >some sort of disaster plan for recovering or regenerating your data.
> >In short, don't trust Lustre or any parallel file system as the sole
> >repository for your mission critical data.
> >
> >Jim
> >
> >On Wed, May 14, 2008 at 02:21:02PM -0400, jrs wrote:
> >>Greetings all,
> >>
> >>I just spoke with someone at a large computing company who
> >>has a close relationship with Lustre/Sun (a reseller, I guess).
> >>This person described Lustre as being something that Sun
> >>"would not recommend for mission critical use."
> >>
> >>Can this be true?
> >>
> >>I work for a small/medium company that does image processing.
> >>We have about 700TB of data presently and might be at 2PB within
> >>the next couple of years. Owing to the amount of data, we don't
> >>make backups for most of it and trust RAID 6 on our hardware RAID
> >>boxes (Nexsan SATABeast) to fail more slowly than we can replace
> >>disks. Over the last couple of years we've had great luck and,
> >>I believe, have never lost data owing to a failure with this
> >>hardware (software or human error is another matter ;-).
> >>However, the data that isn't backed up is "mission critical." Though
> >>it can probably all be reconstructed or reacquired, as a practical
> >>matter losing a significant quantity of this data could be
> >>catastrophic for our business.
> >>
> >>So, what do you think: can Lustre be trusted to keep our
> >>data safe at our company? Assume in answering that we have
> >>failover working properly. We can also withstand some blocking
> >>of the filesystem while a failover event completes, i.e., not
> >>having the filesystem available for some amount of time is
> >>not a problem, but having the directory important-data/ disappear
> >>is a HUGE problem.
> >>
> >>Thanks for any help or guidance,
> >>
> >>John
> >>_______________________________________________
> >>Lustre-discuss mailing list
> >>http://lists.lustre.org/mailman/listinfo/lustre-discuss
