>Traditionally there have been no guarantees of cross major version
>compatibility. RPC especially. Never a rolling upgrade from an 0.90 to an
>0.92, for example. For persistent data, there is a migration utility for
>upping from one major release to the next.

I'm advocating that RPC compatibility breakage is not acceptable for FB, because this is a vital and widely deployed piece of infrastructure. I'm assuming this strategy may not be acceptable to other major contributors either. I can't imagine that CDH customers don't need cross-version compat; they will most likely go from 92->96+. I think we need a client->server online migration strategy for currently active revisions. This is independent of whether we label the current build 1.0 or not. In fact, I would advocate that we get into the habit of cross-version compatibility and long-term thinking before we actually release a 1.0. With 1.0, we don't just want to have cross-version compat; we want to have the problem nailed, or else it will cause major support problems.

Note that I'm not advocating cross-service RPC compat at this time. I don't think we need to tackle online rolling upgrades of HBase from 92->94 (e.g. mixed RegionServer versions or mixed master-RS). Doing a start-stop of the entire HBase cluster is probably fine before 1.0. However, I think it's safe to say that there are multiple instances where the DBA team and the AppServer team are different people, especially with any group exploring multi-tenancy. For that use case, client & server compat is critical.

>Regarding RPC, this state of affairs is not really acceptable to anyone
>any more. Over in core there's work to move 99% of RPC to protobufs, with
>only the thinnest Writable header. In this thread there seem several here
>who want to tackle this for HBase now.

Are the people working on this functionality thinking about client-server compat? JIRA #s?

>Regarding major version data migrations, the attitude I believe is pre
>1.0 we can entertain design changes that break compatibility, in search
>of something that works well enough to be 1.0. From that point forward,
>compatibility is a requirement.

What's the definition of working well enough to be 1.0? I thought having stable, durable PB+ online data storage would be considered working "well enough". Did we not declare that 0.90 was the "data durable" version of HBase that you could trust? Migrations should be a first-class priority after declaring the project "data durable". If you cannot reliably persist 100% of data after an upgrade, then the version truly wasn't "data durable".
