Hi Colin,

On Fri, May 2, 2025 at 6:52 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> On Fri, May 2, 2025, at 09:54, José Armando García Sancio wrote:
> That seems pretty clear? It's also already implemented (although not used 
> since we don't have any inter-dependent features yet).

I see. This is mentioned in the "Compatibility, Deprecation, and
Migration Plan" section and not in the "Proposed Changes" section.
This is probably why I couldn't find any definition of feature
dependencies.

> I agree that running the 3.6.x software version on some nodes, and 4.0.x on
> others would be very odd. 3.6 isn't even one of our supported software 
> versions any more. Right now we're supporting 4.0, 3.9, 3.8. I was just 
> trying to give an example. Perhaps we can simply agree to move on, since the 
> bad case here relies on using unsupported software versions to do an 
> unsupported operation.

I don't understand the argument. Software version 4.0 is supported by
the Apache Kafka community. The issue is not the software version but
the metadata version. Apache Kafka supports MVs greater than or equal
to 3.3, and controller registration was only added in MV 3.7. Just to
reiterate: the active controller can be running software version X
(which supports the downgrade) while the rest of the controllers run
software version 4.0, which is a supported release but does not
support the downgrade. If the MV is less than 3.7 (still supported,
since AK supports MVs greater than or equal to 3.3), there are no
controller registrations for the active controller to consult, so it
will accept the downgrade even though the inactive controllers don't
support it.

If the user gets into this case, what is the workaround? Do they need
to upgrade all of their controllers to a version that supports
downgrade? Is that sufficient?
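
To make the gap concrete, here is a minimal sketch of the kind of
check I have in mind. The names are hypothetical and do not match the
actual QuorumController code; the point is only that with MV < 3.7
there are no controller registrations for the active controller to
consult:

    import java.util.List;

    final class DowngradeCheckSketch {
        record SupportedRange(short min, short max) {
            boolean contains(short version) {
                return version >= min && version <= max;
            }
        }

        // With MV < 3.7 there are no ControllerRegistrationRecords, so
        // knownControllerRegistrations is empty and the active controller
        // can only check its own supported range.
        static boolean canAcceptDowngrade(
                SupportedRange activeControllerRange,
                List<SupportedRange> knownControllerRegistrations,
                short targetMetadataVersion) {
            if (!activeControllerRange.contains(targetMetadataVersion)) {
                return false;
            }
            // When the registration list is empty this loop checks nothing,
            // and the downgrade is accepted even if the other controllers
            // cannot replay records at the target version.
            for (SupportedRange range : knownControllerRegistrations) {
                if (!range.contains(targetMetadataVersion)) {
                    return false;
                }
            }
            return true;
        }
    }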

> >> Any problems that result from this should be clearable by restarting the 
> >> ancient controller processes, though (since
> >> they will then load the snapshot at the older MV)
> >
> > Are you assuming that the user is going to upgrade the software
> > version when they restart the "ancient" controller processes?
> >
>
> No, I was not assuming that the software would be upgraded.

In KRaft, I had a similar issue with kraft.version and the upgrade
from kraft.version 0 to kraft.version 1. I solved it by having voters
send their supported versions (via the UpdateVoter RPC) to the leader
based on what is enumerated in the leader's ApiVersions response, not
based on the value of the MV or kraft.version. E.g. if the leader's
ApiVersions response enumerates the UpdateVoter RPC, the other voters
send UpdateVoter RPCs irrespective of the MV or kraft.version. The
leader has special handling where it only stores the updated voter
registrations in memory; it cannot persist them, since the entire
cluster may not yet support those registrations. But the important
part is that the leader (active controller) receives those RPCs and
can check the supported versions they report.
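
Here is a rough sketch of the voter-side decision; the type and
constant names below are placeholders rather than the actual
KafkaRaftClient code:

    import java.util.Map;

    final class VoterUpdateDecision {
        // Hypothetical API key for the UpdateVoter RPC.
        static final short UPDATE_VOTER_API_KEY = 71;

        // The follower decides whether to send UpdateVoter based only on
        // the APIs the leader advertises in its ApiVersions response
        // (api key -> max supported version), not on the local value of
        // metadata.version or kraft.version.
        static boolean shouldSendUpdateVoter(Map<Short, Short> leaderApiVersions) {
            return leaderApiVersions.containsKey(UPDATE_VOTER_API_KEY);
        }
    }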

> > Snapshot at end offset 100. All records between 0 and 99, inclusive,
> > have been included in the snapshot. The MV in the snapshot is X.
> >
> > offset 100 -> metadata.version = Y -- MV was previously upgraded to version Y.
> > ...        -> ...                  -- All of these records are serialized using version Y.
> > offset 110 -> metadata.version = X -- MV was downgraded to X.
> >
> > Before a snapshot that includes offset 110 (MV 3.9) could be generated
> > the node restarts. How would the code identify that the records
> > between 100 and 110 need to be snapshotted using metadata version 3.9?
> > Note that the metadata loader can batch all of the records between 100
> > and 110 into one delta.
>
> You loaded a snapshot with metadata version Y.

That's what I am trying to highlight. There is no snapshot generated
at MV Y. The only snapshot is the one at end offset 100 (covering
records up to offset 99), and it was generated at MV X.

> You replayed a record changing the metadata version to X. We already 
> specified that this will cause us to generate a new snapshot. The snapshot 
> will presumably be generated with offset 110 since that's the offset that 
> changed the MV. I suppose it could also have a slightly larger offset if the 
> loader batched more.

Yes, but the important part of my example above is that the snapshot
at end offset 100 has a metadata version of X, and the metadata delta
at offset 110 after that snapshot also has the MV at X. Yet the MV
changed to Y (e.g. at offset 105) and back to X (at offset 110) in
between offsets 100 and 110, so nothing in the snapshot or in the
final state of the delta indicates that some of the records in
between were serialized at MV Y.
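
To restate the problem in code form: a snapshot generator that only
looks at the base image and the final image after applying the
batched delta has no way to know that the MV passed through Y. This
is a minimal sketch with hypothetical names, not the actual
MetadataLoader or snapshot generator code:

    final class SnapshotDecisionSketch {
        record MetadataImage(long offset, String metadataVersion) { }
        record MetadataDelta(long lastOffset, String finalMetadataVersion) { }

        // If the loader batches offsets 100..110 into one delta, both the
        // base image (records up to offset 99, MV X) and the resulting
        // image (offset 110, MV X) report MV X. The intermediate change to
        // MV Y is not visible here, so nothing tells this code that some
        // of the replayed records were serialized at MV Y.
        static String metadataVersionForNextSnapshot(
                MetadataImage baseImage,
                MetadataDelta batchedDelta) {
            return batchedDelta.finalMetadataVersion(); // X in the example
        }
    }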

Thanks,
-- 
-José
