Todd, thanks for your feedback on these changes and the workarounds you are proposing.
I'm comfortable releasing as-is in that case, so I'd consider myself a +1.
I know my vote came after the stated deadline but my goal was simply to
provide additional testing and feedback on the release candidate. I didn't
have time to test it prior to when I sent the email.

> > 1) The is_local() flag default is actually ignored by 0.9.1 and it has a
> > CHECK to ensure that the field is set in the RaftConfigPB. That means
> > that 74210b2546df9fd5dec7bb926eeb524362d2da90 was not a sufficient fix
> > for backcompat. Fix: https://gerrit.cloudera.org/4059 to fix it "again".
>
> If I understand correctly, this would prevent downgrade in two cases:
> 1) if you've created a table with 0.10.0, the table wouldn't load properly
> in 0.9.1 or earlier
> 2) if you've formatted your master with 0.10.0 with a multi-master
> configuration, then you couldn't start your multi-master configuration in
> 0.9.1

Looks right to me.

> #2 above doesn't concern me since multi-master was experimental

Agreed

> #1 is more problematic. However, if I understand correctly, you can
> successfully complete an 0.9.1 -> 0.10.0 -> 0.9.1 upgrade/downgrade cycle
> and those tables you created with the original 0.9.1 software would be
> fine.

Yes, I think you are right that upgrade -> downgrade with no DDLs in
between should still work. I suppose this isn't the worst thing. I had
originally suspected it might also affect this case after config change but
I'm convinced now that it doesn't.

> Personally, I'm OK with that limitation. Had we known about the issue prior
> to voting, I would have said we should fix it, but I don't think it's worth
> blocking the release for it.

Fair enough. OK I agree that (1) is not a blocker.

> > 2) Adding a field to TSRegistrationPB in KUDU-1490 triggered some error
> > validation in TSDescriptor::Register() that the PB will not change
> > between registration invocations. I tested reverting KUDU-1490 and this
> > appeared to solve the problem.
> > The revert is here: https://gerrit.cloudera.org/4060
>
> This registration error only affects a tablet server re-registering to the
> same master process. If you shut down the whole cluster, and restart the
> whole cluster with the different version, it shouldn't be affected, best I
> can tell. So, this would affect rolling upgrade/downgrade but not actually
> prevent downgrade.

I was testing manually so it's certainly possible that I accidentally
rolled one of the process downgrades. Looking again, I think you are right,
so (2) and (3) don't seem like very serious problems after all.

> > 4) Remove CHECK preventing forward-compatibility with tablet history GC:
> > https://gerrit.cloudera.org/4061
>
> This could also be done by asking users who are concerned about the ability
> to downgrade to start 1.0 with UNDO GC disabled. Only once they are sure
> they don't want to downgrade (eg after a week or two of stability) they
> could enable the new feature.

Good point. That's not a terrible workaround. OK, you've convinced me that
the problem in (4) isn't so bad either. :)

> If people agree that my above analysis of the effects is correct, then my
> opinion is we should:
>
> 1) update the release notes for 0.10.0 to clearly state the
> upgrade/downgrade restrictions as:
> - rolling upgrade of the servers may not be performed between 0.9.x and
> 0.10
> - tables created in 0.10 will not be accessible after a downgrade to 0.9.x
> - a multi-master setup formatted in 0.10 may not be downgraded to 0.9.x

+1, sgtm

Mike
