I left some comments on the reviews, but this is probably a better place to discuss the impact before we decide whether to move forward with the 0.10.0 release. Technically the vote closed an hour before Mike's -0, but we could still decide to not release 0.10.0-RC1 and build a new 0.10.0-RC2 with a shortened vote period. Alternatively, we could decide to release 0.10.0 and immediately start voting on a 0.10.1 with the fixes.
My opinion is that we need to understand the impact of the issues a little better before making a call. But we should try to make a call soon, since I know there are users waiting on this release.

On Fri, Aug 19, 2016 at 2:21 AM, Mike Percy wrote:
> -0
>
> I finally found time to manually test downgrade from 0.10.0-RC1 to 0.9.1
> and found problems with downgrade. I know it's very late in the release
> cycle, but I've been out of town. I found the following problems, and
> potential fixes for them:
>
> 1) The is_local() flag default is actually ignored by 0.9.1 and it has a
> CHECK to ensure that the field is set in the RaftConfigPB. That means
> that 74210b2546df9fd5dec7bb926eeb524362d2da90 was not a sufficient fix for
> backcompat. Fix: https://gerrit.cloudera.org/4059 to fix it "again".
>

If I understand correctly, this would prevent downgrade in two cases:

1) If you've created a table with 0.10.0, the table wouldn't load properly in 0.9.1 or earlier.
2) If you've formatted your master with 0.10.0 in a multi-master configuration, then you couldn't start your multi-master configuration in 0.9.1.

#2 above doesn't concern me, since multi-master was experimental and has lots of known issues in 0.9.1. Preventing downgrade back to a version where the feature was already unsupported doesn't seem like a big issue.

#1 is more problematic. However, if I understand correctly, you can successfully complete a 0.9.1 -> 0.10.0 -> 0.9.1 upgrade/downgrade cycle, and the tables you created with the original 0.9.1 software would be fine. Personally, I'm OK with that limitation. Had we known about the issue prior to voting, I would have said we should fix it, but I don't think it's worth blocking the release for it. After all, we are pre-1.0 software and we have never documented any strong guarantee about downgrade capability.
After 1.0 I do think we should be stricter, but even then there will likely be cases where a user has created data in a new version that cannot be read by an older version (e.g., if using a new column encoding not supported by the earlier one).

> 2) Adding a field to TSRegistrationPB in KUDU-1490 triggered some error
> validation in TSDescriptor::Register() that the PB will not change between
> registration invocations. I tested reverting KUDU-1490 and this appeared to
> solve the problem. The revert is here: https://gerrit.cloudera.org/4060

This registration error only affects a tablet server re-registering with the same master process. If you shut down the whole cluster and restart it with the different version, it shouldn't be affected, as best I can tell. So this would affect rolling upgrade/downgrade but would not actually prevent downgrade.

> If we want to maintain downgradability for this release then we could apply
> these patches and do a quick re-vote without the waiting period, perhaps?
>
> If we want to attempt for downgrade compat from 1.0.0 to 0.10.0 as well
> then we would need to additionally apply the below patches, or something
> similar:
>
> 3) Reimplement the validation in TSDescriptor::Register() so that we can
> add fields to TSRegistrationPB in the next release without a backcompat
> problem: https://gerrit.cloudera.org/4062

Per above, I think this only affects rolling upgrades, not stop-start.

>
> 4) Remove CHECK preventing forward-compatibility with tablet history GC:
> https://gerrit.cloudera.org/4061

This could also be handled by asking users who are concerned about the ability to downgrade to start 1.0 with UNDO GC disabled. Only once they are sure they don't want to downgrade (e.g., after a week or two of stability) would they enable the new feature.
> I believe these are all low risk changes, but if this seems like too much
> change for so late in the game or not worth it then we can just relnote
> that downgrading from 0.10.0 is not possible, and we'll probably say the
> same thing for 1.0.0 as well.
>

I agree that they are low risk, but I also think that the reward is not worth restarting a new vote, re-testing artifacts, etc. If people agree that my above analysis of the effects is correct, then my opinion is that we should:

1) Update the release notes for 0.10.0 to clearly state the upgrade/downgrade restrictions as:
   - rolling upgrade of the servers may not be performed between 0.9.x and 0.10
   - tables created in 0.10 will not be accessible after a downgrade to 0.9.x
   - a multi-master setup formatted in 0.10 may not be downgraded to 0.9.x

-Todd
