I need at least a week, maybe two, to promote anything to staging, mainly because we do weekly releases. I could introduce a 2-day turnaround, but only with a more fixed schedule. I am running 0.8.6 in production and REALLY want to upgrade for nothing more than getting compression (the cost of petabytes of uncompressed data is just stupid). So however I can help, whether by changing my process or by better understanding the PMC, I am game.
One thing I use C* for is holding days worth of data and re-running those days for regression on our software... simulating production... It might not take much to reverse it.

/*
Joe Stein
http://www.medialets.com
Twitter: @allthingshadoop
*/

On Nov 29, 2011, at 10:04 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Tue, Nov 29, 2011 at 6:16 PM, Jeremy Hanna
> <jeremy.hanna1...@gmail.com>wrote:
>
>> I'd like to start a discussion about ideas to improve release quality for
>> Cassandra. Specifically I wonder if the community can do more to help the
>> project as a whole become more solid. Cassandra has an active and vibrant
>> community using Cassandra for a variety of things. If we all pitch in a
>> little bit, it seems like we can make a difference here.
>>
>> Release quality is difficult, especially for a distributed system like
>> Cassandra. The core devs have done an amazing job with this considering
>> how complicated it is. Currently, there are several things in place to
>> make sure that a release is generally usable:
>> - review-then-commit
>> - 72 hour voting period
>> - at least 3 binding +1 votes
>> - unit tests
>> - integration tests
>> Then there is the personal responsibility aspect - testing a release in a
>> staging environment before pushing it to production.
>>
>> I wonder if more could be done here to give more confidence in releases.
>> I wanted to see if there might be ways that the community could help out
>> without being too burdensome on either the core devs or the community.
>>
>> Some ideas:
>> More automation: run YCSB and stress with various setups. Maybe people
>> can rotate donating cloud instances (or simply money for them) but have a
>> common set of scripts to do this in the source.
>>
>> Dedicated distributed test suite: I know there has been work done on
>> various distributed test suites (which is great!) but none have really
>> caught on so far.
>>
>> I know what the Apache guidelines say, but what if the community could
>> help out with the testing effort in a more formal way. For example, for
>> each release to be finalized, what if there needed to be 3 community
>> members that needed to try it out in their own environment?
>>
>> What if there was a post release +1 vote for the community to sign off on
>> - sort of a "works for me" kind of thing to reassure others that it's safe
>> to try. So when the release email gets posted to the user list, start a
>> tradition of people saying +1 in reply if they've tested it out and it
>> works for them. That's happening informally now when there are problems,
>> but it might be nice to see a vote of confidence. Just another idea.
>>
>> Any other ideas or variations?
>
>
> I am no software engineering guru, but whenever I +1 a Hive release I
> actually do check out the code and run a couple queries. Mostly I find that
> because there are just so many things not unit testable like those gosh darn
> bash scripts that launch Java applications. There have been times when even
> after multiple patch revisions and passing unit tests something just does
> not work in the real world. So I never +1 a binary release I don't spend an
> hour with and if possible I try twisting the knobs on any new feature or at
> least just trying the basics. Hive is aiming for something like quarterly
> releases.
>
> So possibly better to have Cassandra do time based releases. It does not
> have to be quarterly but if people want bleeding edge features (something
> committed 2 days ago) really they should go out and build something from
> trunk.
>
> It seems like Cassandra devs have the voting and releasing down to a
> science but from my world the types of bugs I worry about are data file
> corruption, and any weird bug that would result in data faults like
> read_repair not working or writes not going to the right nodes, or bloom
> filters giving a faulty result.
> New features are great and I love seeing them but I can wait for those.
>
> Updates now, even trivial ones, get political; you just never want to be the
> guy that champions an update and then not have it go well :)
>
> Most users of Cassandra are going to have large clusters and really the
> project should not outstrip the common user's ability to stay up to date.
> You have to figure that a large cluster like 20 nodes with maybe 200GB
> data/node, doing a rolling restart without degrading performance is going
> to take some time. This is more than 'yum update cassandra'
> '/etc/init.d/cassandra restart' and with risk of something going wrong
> people need time to QA and time for ops. This type of person does not like
> to fall many releases behind and likewise cannot be updating too often
> either.
>
> I have never had to roll back a release but I do wait usually for a month
> before running one to make sure there is not following soon.