Nikke,

Are you a customer of Lustre Support? I don't have you listed as a supported customer. Maybe we should arrange a discussion about how we could assist you more effectively.

Best regards,
Kevin

--
P. Kevin Canady
Director, Business Development
Lustre Group (Formerly CFS)
Sun Microsystems, Inc.
O: 415.928.3633
C: 415.505.7701

On 10/11/07 11:53 PM, "Niklas Edmundsson" <[EMAIL PROTECTED]> wrote:

>
> OK, I know that there is supposedly some QA before lustre releases and
> that it might be the reason for fixes taking such a long time to
> propagate, but still: It takes too long for fixes to end up in a
> released version...
>
> During our rather limited testing on Ubuntu Dapper (using the Debian
> 2.6.18 kernel on servers and pkg-lustre packaging) we've run into
> a couple of bugs, most of them with the typical "fix in bugzilla".
>
> The pkg-lustre packaging has six fixes from bugzilla applied, they
> seem to have munged the bug numbers but it seems that only three of
> them are in the 1.6.3 changelog.
>
> We have locally applied fixes from bug 13438 (lustre is totally
> useless without it due to servers OOPS:ing) and 13614. None of them
> seems to be in the 1.6.3 changelog.
>
> So, I'd suggest that CFS gets their act together and starts releasing
> versions more often, if they'd done this during 1.6 development we
> wouldn't be installing production releases that you can crash after a
> day of testing now.
>
> If QA is the argument for not doing releases more often, consider the
> fact that known broken releases that you have to patch yourself with
> patches hidden in bugzilla isn't much better.
>
> In reality, I think that doing non-QA'd snapshot releases might be the
> way to go. That is, releases with the useful more-or-less trivial
> fixes that avoids crashes etc. and that will be included in the next
> QA'd release. They would not be suitable for production, but at least
> you can rather easily download the latest snapshot and try on your
> test cluster and see if it fixes the problem(s) you've encountered.
> And if it does, we can bug CFS until they get their act together and
> gets a release out with the fix included.
>
> In the end, you have to realise that when you have a production system
> you don't want to wait for weeks and months for a new release that
> might fix a crash-inducing bug you're hitting. I say might here,
> because obviously having a fix hidden in bugzilla is no guarantee that
> it's included in a released version.
>
> In our case we're not at production yet because of these problems with
> getting fixes out quickly enough. So far we've always been able to
> crash lustre 1.6 within days, and that's after waiting for 1.6 for
> well over a year.
>
> So, I'd like to challenge CFS to get a version of lustre 1.6 (or 1.8,
> whatever) out that proves stable on our small lustre test setup.
> Without patches. In the year of 2007.
>
> Since the "internal QA only" approach obviously isn't working, I'd
> suggest that you embrace "release early, release often" to get there.
> That means one release per week as long as you have fixes pending to
> get a decent churn on things.
>
>
> /Nikke

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
