I left some comments on the reviews, but this is probably a better place to
discuss the impact before we decide whether to move forward with the 0.10.0
release. Technically the vote closed an hour before Mike's -0, but we could
still decide to not release 0.10.0-RC1 and build a new 0.10.0-RC2 with a
shortened vote period. Alternatively, we could decide to release 0.10.0 and
immediately start voting on a 0.10.1 with the fixes.

My opinion is that we need to understand the impact of the issues a little
better before making a call. That said, we should try to make a call quite
soon, since I know there are users waiting on this release.

On Fri, Aug 19, 2016 at 2:21 AM, Mike Percy <[email protected]> wrote:

> -0
>
> I finally found time to manually test downgrade from 0.10.0-RC1 to 0.9.1
> and found problems with downgrade. I know it's very late in the release
> cycle, but I've been out of town. I found the following problems, and
> potential fixes for them:
>
> 1) The is_local() flag default is actually ignored by 0.9.1 and it has a
> CHECK to ensure that the field is set in the RaftConfigPB. That means
> that 74210b2546df9fd5dec7bb926eeb524362d2da90 was not a sufficient fix for
> backcompat. Fix: https://gerrit.cloudera.org/4059 to fix it "again".
>

If I understand correctly, this would prevent downgrade in two cases:
1) if you've created a table with 0.10.0, the table wouldn't load properly
in 0.9.1 or earlier
2) if you've formatted your master with 0.10.0 with a multi-master
configuration, then you couldn't start your multi-master configuration in
0.9.1

#2 above doesn't concern me since multi-master was experimental and has
lots of known issues in 0.9.1. So, preventing downgrade back to a version
where the feature was already not supported doesn't seem like a big issue.

#1 is more problematic. However, if I understand correctly, you can
successfully complete an 0.9.1 -> 0.10.0 -> 0.9.1 upgrade/downgrade cycle
and those tables you created with the original 0.9.1 software would be fine.

Personally, I'm OK with that limitation. Had we known about the issue prior
to voting, I would have said we should fix it, but I don't think it's worth
blocking the release for it. After all, we are pre-1.0 software and we have
never documented any strong guarantee about downgrade capability. After 1.0
I do think we should be stricter, but even then there will likely be
cases where a user has created data in a new version that could
not be read by an older version (e.g. if using a new column encoding not
supported by the earlier one).

> 2) Adding a field to TSRegistrationPB in KUDU-1490 triggered some error
> validation in TSDescriptor::Register() that the PB will not change between
> registration invocations. I tested reverting KUDU-1490 and this appeared to
> solve the problem. The revert is here: https://gerrit.cloudera.org/4060


This registration error only affects a tablet server re-registering to the
same master process. If you shut down the whole cluster, and restart the
whole cluster with the different version, it shouldn't be affected, best I
can tell. So, this would affect rolling upgrade/downgrade but not actually
prevent downgrade.
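
To make the rolling vs. stop-start distinction concrete, here is a toy
Python model of that check. It is only a sketch of my reading of the
behavior, not the actual TSDescriptor::Register() code: ToyMaster, the dict
layout, and hypothetical_new_field are all made up for illustration, and it
assumes the master keeps registrations only in process memory.

```python
class RegistrationMismatch(Exception):
    """Raised when a tablet server re-registers with different contents."""


class ToyMaster:
    def __init__(self):
        # Assumption: registrations live only in this process's memory,
        # so restarting the master forgets them.
        self.registrations = {}

    def register(self, ts_uuid, registration):
        previous = self.registrations.get(ts_uuid)
        # Strict equality: any added field counts as a change.
        if previous is not None and previous != registration:
            raise RegistrationMismatch(ts_uuid)
        self.registrations[ts_uuid] = registration


# Hypothetical registration contents before and after a field is added.
OLD_REGISTRATION = {"rpc_addresses": ["ts1:7050"]}
NEW_REGISTRATION = {"rpc_addresses": ["ts1:7050"], "hypothetical_new_field": 1}


def rolling_restart():
    """Tablet server changes version while the same master keeps running."""
    master = ToyMaster()
    master.register("ts1", OLD_REGISTRATION)
    try:
        master.register("ts1", NEW_REGISTRATION)
    except RegistrationMismatch:
        return False
    return True


def full_restart():
    """Whole cluster stops; a fresh master process sees only the new format."""
    ToyMaster().register("ts1", OLD_REGISTRATION)
    master = ToyMaster()  # new process, no remembered registrations
    master.register("ts1", NEW_REGISTRATION)
    return True
```

In this model rolling_restart() fails the check while full_restart() goes
through, which is the difference I'm describing above.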


> If we want to maintain downgradability for this release then we could apply
> these patches and do a quick re-vote without the waiting period, perhaps?
>
> If we want to attempt for downgrade compat from 1.0.0 to 0.10.0 as well
> then we would need to additionally apply the below patches, or something
> similar:
>
> 3) Reimplement the validation in TSDescriptor::Register() so that we can
> add fields to TSRegistrationPB in the next release without a backcompat
> problem: https://gerrit.cloudera.org/4062


Per above, I think this only affects rolling, not stop-start.


>
> 4) Remove CHECK preventing forward-compatibility with tablet history GC:
> https://gerrit.cloudera.org/4061


This could also be done by asking users who are concerned about the ability
to downgrade to start 1.0 with UNDO GC disabled. Once they are sure they
don't want to downgrade (e.g. after a week or two of stability), they could
enable the new feature.


> I believe these are all low risk changes, but if this seems like too much
> change for so late in the game or not worth it then we can just relnote
> that downgrading from 0.10.0 is not possible, and we'll probably say the
> same thing for 1.0.0 as well.
>
>
I agree that they are low risk, but I also think that the reward is not
worth re-starting a new vote, re-testing artifacts, etc. If people agree
that my above analysis of the effects is correct, then my opinion is we
should:

1) update the release notes for 0.10.0 to clearly state the
upgrade/downgrade restrictions as:
- rolling upgrade of the servers may not be performed between 0.9.x and 0.10
- tables created in 0.10 will not be accessible after a downgrade to 0.9.x
- a multi-master setup formatted in 0.10 may not be downgraded to 0.9.x

-Todd
