Hi Colin,

On Fri, May 2, 2025 at 6:52 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> On Fri, May 2, 2025, at 09:54, José Armando García Sancio wrote:
> That seems pretty clear? It's also already implemented (although not used
> since we don't have any inter-dependent features yet).

I see. This is mentioned in the "Compatibility, Deprecation, and Migration Plan" section, not in the "Proposed Changes" section, which is probably why I couldn't find any definition of feature dependencies.

> I agree that running the 3.6.x software version on some nodes, and 4.0.x on
> others would be very odd. 3.6 isn't even one of our supported software
> versions any more. Right now we're supporting 4.0, 3.9, 3.8. I was just
> trying to give an example. Perhaps we can simply agree to move on, since the
> bad case here relies on using unsupported software versions to do an
> unsupported operation.

I don't understand the argument. Software version 4.0 is supported by the Apache Kafka community. The issue is not the software version but the metadata version (MV). Apache Kafka supports MV 3.3 and later, and controller registration was only added in MV 3.7, so below MV 3.7 the active controller has no controller registrations to check.

Just to reiterate: the active controller can be running software version X (which supports downgrade) while the rest of the controllers run software version 4.0, which is supported but doesn't support downgrade. If the MV is below 3.7, which Apache Kafka still supports, the active controller will accept the downgrade even though the inactive controllers don't support it. If a user gets into this situation, what is the workaround? Do they need to upgrade all of their controllers to a software version that supports downgrade? Is that sufficient?

> >> Any problems that result from this should be clearable by restarting the
> >> ancient controller processes, though (since
> >> they will then load the snapshot at the older MV)
> >
> > Are you assuming that the user is going to upgrade the software
> > version when they restart the "ancient" controller processes?
> >
>
> No, I was not assuming that the software would be upgraded.

In KRaft, I had a similar issue with kraft.version and the upgrade from kraft.version 0 to kraft.version 1.
I solved this problem by having voters send their supported versions to the leader (using the UpdateVoter RPC) whenever the leader's ApiVersions response shows that it supports that RPC, rather than based on the value of the MV or kraft.version. For example, if the leader's ApiVersions response lists the UpdateVoter RPC, the other voters send UpdateVoter RPCs irrespective of the MV or kraft.version. The leader has special handling: it stores the updated voter registrations only in memory, because it cannot persist them while the rest of the cluster may not support those registrations. The important part is that the leader (the active controller) supports the RPC and can validate the requests.

> > Snapshot at end offset 100. All records between 0 and 99, inclusive,
> > have been included in the snapshot. The MV in the snapshot is X
> > offset 100 -> metadata.version = Y -- MV was previously upgraded to version
> > Y
> > ... -> ... -- All of these records are serialized using
> > version Y.
> > offset 110 -> metadata.version = X -- MV was downgraded to X.
> >
> > Before a snapshot that includes offset 110 (MV 3.9) could be generated
> > the node restarts. How would the code identify that the records
> > between 100 and 110 need to be snapshotted using metadata version 3.9?
> > Note that the metadata loader can batch all of the records between 100
> > and 110 into one delta.
>
> You loaded a snapshot with metadata version Y.

That's what I am trying to highlight. There is no snapshot for MV Y at offset 100. There is only a snapshot for MV X at offset 99.

> You replayed a record changing the metadata version to X. We already
> specified that this will cause us to generate a new snapshot. The snapshot
> will presumably be generated with offset 110 since that's the offset that
> changed the MV. I suppose it could also have a slightly larger offset if the
> loader batched more.
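
To make the batching concern in the snapshot example concrete, here is a minimal Java sketch under stated assumptions. It is a hypothetical model, not Kafka's actual metadata loader code: the class `MvBatchCheck`, the `LogRecord` type, and both check methods are invented for illustration, and MVs are the plain labels "X" and "Y" used in the example. It replays the batch covering offsets 100 to 110 on top of a snapshot written at MV X. A check that compares only the MV before and after the batch reports no change, while a check that inspects every metadata.version record in the batch notices the excursion to Y.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a metadata loader replaying one batched delta on top
// of a snapshot. None of these names come from Kafka's code base.
public class MvBatchCheck {

    // A log entry; mvChange is non-null only for metadata.version records.
    record LogRecord(long offset, String mvChange) {}

    // The batch from the example: MV goes X -> Y at offset 100 and back to X at 110.
    static List<LogRecord> exampleBatch() {
        List<LogRecord> batch = new ArrayList<>();
        batch.add(new LogRecord(100, "Y"));          // MV upgraded to Y
        for (long offset = 101; offset < 110; offset++) {
            batch.add(new LogRecord(offset, null));  // ordinary records, serialized at MV Y
        }
        batch.add(new LogRecord(110, "X"));          // MV downgraded back to X
        return batch;
    }

    // Naive check: compares only the MV before the batch with the MV after it.
    static boolean changedByFinalVersion(String snapshotMv, List<LogRecord> batch) {
        String mv = snapshotMv;
        for (LogRecord r : batch) {
            if (r.mvChange() != null) {
                mv = r.mvChange();
            }
        }
        return !mv.equals(snapshotMv);
    }

    // Stricter check: reports a change if any metadata.version record in the
    // batch sets an MV different from the one the snapshot was written at.
    static boolean changedByAnyVersionRecord(String snapshotMv, List<LogRecord> batch) {
        for (LogRecord r : batch) {
            if (r.mvChange() != null && !r.mvChange().equals(snapshotMv)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String snapshotMv = "X"; // snapshot at end offset 100 (records 0..99) written at MV X
        List<LogRecord> batch = exampleBatch();
        System.out.println("final-version check: " + changedByFinalVersion(snapshotMv, batch));
        System.out.println("any-version-record check: " + changedByAnyVersionRecord(snapshotMv, batch));
    }
}
```

Running `main` prints `false` for the final-version check and `true` for the any-version-record check. The sketch does not say how the real loader behaves; it only shows why a before/after MV comparison on a batched delta cannot, by itself, reveal that the records between offsets 100 and 110 were serialized at MV Y.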

Yes, but the important part of my example is that the snapshot at offset 100 has metadata version X, and the metadata delta at offset 110, which is applied after that snapshot, also has the MV at X. Yet the MV changed to Y (e.g. at offset 105) and back to X (at offset 110) between offsets 100 and 110.

Thanks,
-- 
-José