Hi Stan and Gaurav,

Just to clarify some points mentioned here before:

KAFKA-14616: I raised this a year ago, so it's not related to the JBOD work. It is rather a blocker bug for KRaft in general. The PR from Colin should fix it. I am not sure it is a blocker for 3.7 per se, as it has been a major bug since 3.3 and was missed in all other releases.

Regarding the JBOD work:

- KAFKA-16082: This is not a blocker for 3.7; it's a nice-to-have fix. The PR https://github.com/apache/kafka/pull/15136 is quite small and was approved by Proven and me, but it is waiting for a committer's approval.
- KAFKA-16162: This is a blocker for 3.7. Likewise, it's a small PR, https://github.com/apache/kafka/pull/15270, approved by Proven and me, and it is waiting for a committer's approval.
- KAFKA-16157: This is a blocker for 3.7. There is one small suggestion on the PR https://github.com/apache/kafka/pull/15263, but I don't think any of the current feedback blocks the PR from being approved, assuming we get a committer's approval on it.
- KAFKA-16195: Likewise, it's a blocker, but it has approval from Proven and me, and we are waiting for a committer's approval on the PR https://github.com/apache/kafka/pull/15262.

If we can't get a committer's approval for KAFKA-16162, KAFKA-16157 and KAFKA-16195 in time for 3.7, then we can mark JBOD as early access, assuming we merge at least KAFKA-16195.

Regards,
Omnia

> On 26 Jan 2024, at 15:39, ka...@gnarula.com wrote:
>
> Apologies, I duplicated KAFKA-16157 twice in my previous message. I intended to mention KAFKA-16195 with the PR at https://github.com/apache/kafka/pull/15262 as the second JIRA.
>
> Thanks,
> Gaurav
>
>> On 26 Jan 2024, at 15:34, ka...@gnarula.com wrote:
>>
>> Hi Stan,
>>
>> I wanted to share some updates about the bugs you shared earlier.
>>
>> - KAFKA-14616: I've reviewed and tested the PR from Colin and have observed the fix works as intended.
>> - KAFKA-16162: I reviewed Proven's PR and found some gaps in the proposed fix. I've therefore raised https://github.com/apache/kafka/pull/15270 following a discussion with Luke in JIRA.
>> - KAFKA-16082: I don't think this is marked as a blocker anymore. I'm awaiting feedback/reviews at https://github.com/apache/kafka/pull/15136
>>
>> In addition to the above, there are 2 JIRAs I'd like to bring everyone's attention to:
>>
>> - KAFKA-16157: This is similar to KAFKA-14616 and is marked as a blocker. I've raised https://github.com/apache/kafka/pull/15263 and am awaiting reviews on it.
>> - KAFKA-16157: I raised this yesterday and have addressed feedback from Luke. This should hopefully get merged soon.
>>
>> Regards,
>> Gaurav
>>
>>
>>> On 24 Jan 2024, at 11:51, ka...@gnarula.com wrote:
>>>
>>> Hi Stanislav,
>>>
>>> Thanks for bringing these JIRAs/PRs up.
>>>
>>> I'll be testing the open PRs for KAFKA-14616 and KAFKA-16162 this week and I hope to have some feedback by Friday. I gather the latter JIRA is marked as a WIP by Proven and he's away. I'll try to build on his work in the meantime.
>>>
>>> As for KAFKA-16082, we haven't been able to deduce a data loss scenario.
>>> There's a PR open by me for promoting an abandoned future replica with approvals from Omnia and Proven, so I'd appreciate a committer reviewing it.
>>>
>>> Regards,
>>> Gaurav
>>>
>>> On 23 Jan 2024, at 20:17, Stanislav Kozlovski <stanis...@confluent.io.INVALID> wrote:
>>>>
>>>> Hey all, I figured I'd give an update about what known blockers we have right now:
>>>>
>>>> - KAFKA-16101: KRaft migration rollback documentation is incorrect - https://github.com/apache/kafka/pull/15193; This need not block RC creation, but we need the docs updated so that people can test properly
>>>> - KAFKA-14616: Topic recreation with offline broker causes permanent URPs - https://github.com/apache/kafka/pull/15230 ; I am of the understanding that this is blocking JBOD for 3.7
>>>> - KAFKA-16162: New created topics are unavailable after upgrading to 3.7 - a strict blocker with an open PR https://github.com/apache/kafka/pull/15232 - although I understand Proven is out of office
>>>> - KAFKA-16082: JBOD: Possible dataloss when moving leader partition - I am hearing mixed opinions on whether this is a blocker (https://github.com/apache/kafka/pull/15136)
>>>>
>>>> Given that there are 3 JBOD blocker bugs, and I am not confident they will all be merged this week - I am on the edge of voting to revert JBOD from this release, or mark it early access.
>>>>
>>>> By all accounts, it seems that if we keep JBOD the release will have to spill into February, which is a month later than the start-of-January date in our time-based release plan.
>>>>
>>>> Can I ask others for an opinion?
>>>>
>>>> Best,
>>>> Stan
>>>>
>>>> On Thu, Jan 18, 2024 at 1:21 PM Luke Chen <show...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I think I've found another blocker issue: KAFKA-16162 <https://issues.apache.org/jira/browse/KAFKA-16162>.
>>>>> The impact is after upgrading to 3.7.0, any new created topics/partitions >>>>> will be unavailable. >>>>> I've put my findings in the JIRA. >>>>> >>>>> Thanks. >>>>> Luke >>>>> >>>>> On Thu, Jan 18, 2024 at 9:50 AM Matthias J. Sax <mj...@apache.org> wrote: >>>>> >>>>>> Stan, thanks for driving this all forward! Excellent job. >>>>>> >>>>>> About >>>>>> >>>>>>> StreamsStandbyTask - https://issues.apache.org/jira/browse/KAFKA-16141 >>>>>>> StreamsUpgradeTest - https://issues.apache.org/jira/browse/KAFKA-16139 >>>>>> >>>>>> For `StreamsUpgradeTest` it was a test setup issue and should be fixed >>>>>> now in trunk and 3.7 (and actually also in 3.6...) >>>>>> >>>>>> For `StreamsStandbyTask` the failing test exposes a regression bug, so >>>>>> it's a blocker. I updated the ticket accordingly. We already have an >>>>>> open PR that reverts the code introducing the regression. >>>>>> >>>>>> >>>>>> -Matthias >>>>>> >>>>>> On 1/17/24 9:44 AM, Proven Provenzano wrote: >>>>>>> We have another blocking issue for the RC : >>>>>>> https://issues.apache.org/jira/browse/KAFKA-16157. This bug is similar >>>>>> to >>>>>>> https://issues.apache.org/jira/browse/KAFKA-14616. The new issue >>>>> however >>>>>>> can lead to the new topic having partitions that a producer cannot >>>>> write >>>>>> to. >>>>>>> >>>>>>> --Proven >>>>>>> >>>>>>> On Tue, Jan 16, 2024 at 12:04 PM Proven Provenzano < >>>>>> pprovenz...@confluent.io> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> I have a PR https://github.com/apache/kafka/pull/15197 for >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-16131 that is building >>>>> now. >>>>>>>> --Proven >>>>>>>> >>>>>>>> On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz <ja...@scholz.cz> wrote: >>>>>>>> >>>>>>>>> *> Hi Jakub,> > Thanks for trying the RC. I think what you found is a >>>>>>>>> blocker bug because it * >>>>>>>>> *> will generate huge amount of logspam. 
I guess we didn't find it in >>>>>>>>> junit >>>>>>>>> tests * >>>>>>>>> *> since logspam doesn't fail the automated tests. But certainly it's >>>>>> not >>>>>>>>> suitable * >>>>>>>>> *> for production. Did you file a JIRA yet?* >>>>>>>>> >>>>>>>>> Hi Colin, >>>>>>>>> >>>>>>>>> I opened https://issues.apache.org/jira/browse/KAFKA-16131. >>>>>>>>> >>>>>>>>> Thanks & Regards >>>>>>>>> Jakub >>>>>>>>> >>>>>>>>> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe <cmcc...@apache.org> >>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Stanislav, >>>>>>>>>> >>>>>>>>>> Thanks for making the first RC. The fact that it's titled RC2 is >>>>>> messing >>>>>>>>>> with my mind a bit. I hope this doesn't make people think that we're >>>>>>>>>> farther along than we are, heh. >>>>>>>>>> >>>>>>>>>> On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote: >>>>>>>>>>> *> Nice catch! It does seem like we should have gated this behind >>>>> the >>>>>>>>>>> metadata> version as KIP-858 implies. Is the cluster configured >>>>> with >>>>>>>>>>> multiple log> dirs? What is the impact of the error messages?* >>>>>>>>>>> >>>>>>>>>>> I did not observe any obvious impact. I was able to send and >>>>> receive >>>>>>>>>>> messages as normally. But to be honest, I have no idea what else >>>>>>>>>>> this might impact, so I did not try anything special. >>>>>>>>>>> >>>>>>>>>>> I think everyone upgrading an existing KRaft cluster will go >>>>> through >>>>>>>>> this >>>>>>>>>>> stage (running Kafka 3.7 with an older metadata version for at >>>>> least >>>>>> a >>>>>>>>>>> while). So even if it is just a logged exception without any other >>>>>>>>>> impact I >>>>>>>>>>> wonder if it might scare users from upgrading. But I leave it to >>>>>>>>> others >>>>>>>>>> to >>>>>>>>>>> decide if this is a blocker or not. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Jakub, >>>>>>>>>> >>>>>>>>>> Thanks for trying the RC. I think what you found is a blocker bug >>>>>>>>> because >>>>>>>>>> it will generate huge amount of logspam. 
I guess we didn't find it in junit tests since logspam doesn't fail the automated tests. But certainly it's not suitable for production. Did you file a JIRA yet?
>>>>>>>>>>
>>>>>>>>>>> On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski <stanis...@confluent.io.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey Luke,
>>>>>>>>>>>>
>>>>>>>>>>>> This is an interesting problem. Given the fact that the KIP for having a 3.8 release passed, I think it tips the scale towards not calling this a blocker and expecting it to be solved in 3.7.1.
>>>>>>>>>>>>
>>>>>>>>>>>> It is unfortunate that it would not seem safe to migrate to KRaft in 3.7.0 (given the inability to roll back safely), but if that's true - the same case would apply for 3.6.0. So in any case users would be expected to use a patch release for this.
>>>>>>>>>>
>>>>>>>>>> Hi Luke,
>>>>>>>>>>
>>>>>>>>>> Thanks for testing rollback. I think this is a case where the documentation is wrong. The intention was for the steps to basically be:
>>>>>>>>>>
>>>>>>>>>> 1. roll all the brokers into zk mode, but with migration enabled
>>>>>>>>>> 2. take down the kraft quorum
>>>>>>>>>> 3. rmr /controller, allowing a hybrid broker to take over.
>>>>>>>>>> 4. roll all the brokers into zk mode without migration enabled (if desired)
>>>>>>>>>>
>>>>>>>>>> With these steps, there isn't really unavailability since a ZK controller can be elected quickly after the kraft quorum is gone.
>>>>>>>>>>
>>>>>>>>>>>> Further, since we will have a 3.8 release - it is likely we will ultimately recommend users upgrade from that version given its aim is to have strategic KRaft feature parity with ZK.
>>>>>>>>>>>> That being said, I am not 100% on this. Let me know whether you think this should block the release, Luke. I am also tagging Colin and David to weigh in with their opinions, as they worked on the migration logic.
>>>>>>>>>>
>>>>>>>>>> The rollback docs are new in 3.7 so the fact that they're wrong is a clear blocker, I think. But easy to fix, I believe. I will create a PR.
>>>>>>>>>>
>>>>>>>>>> best,
>>>>>>>>>> Colin
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hey Kirk and Chris,
>>>>>>>>>>>>
>>>>>>>>>>>> Unless I'm missing something - KAFKA-16029 is simply a bad log due to improper closing. And the PR description implies this has been present since 3.5. While annoying, I don't see a strong reason for this to block the release.
>>>>>>>>>>>>
>>>>>>>>>>>> Hey Jakub,
>>>>>>>>>>>>
>>>>>>>>>>>> Nice catch! It does seem like we should have gated this behind the metadata version as KIP-858 implies. Is the cluster configured with multiple log dirs? What is the impact of the error messages?
>>>>>>>>>>>>
>>>>>>>>>>>> Tagging Igor (the author of the KIP) to weigh in.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Stanislav
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz <ja...@scholz.cz> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I was trying the RC2 and ran into the following issue ...
when I >>>>>>>>> run >>>>>>>>>>>>> 3.7.0-RC2 KRaft cluster with metadata version set to 3.6-IV2 >>>>>>>>> metadata >>>>>>>>>>>>> version, I seem to be getting repeated errors like this in the >>>>>>>>>> controller >>>>>>>>>>>>> logs: >>>>>>>>>>>>> >>>>>>>>>>>>> 2024-01-13 16:58:01,197 INFO [QuorumController id=0] >>>>>>>>>>>> assignReplicasToDirs: >>>>>>>>>>>>> event failed with UnsupportedVersionException in 15 microseconds. >>>>>>>>>>>>> (org.apache.kafka.controller.QuorumController) >>>>>>>>>>>>> [quorum-controller-0-event-handler] >>>>>>>>>>>>> 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] >>>>> Unexpected >>>>>>>>>> error >>>>>>>>>>>>> handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS, >>>>>>>>>>>>> apiVersion=0, clientId=1000, correlationId=14, headerVersion=2) >>>>> -- >>>>>>>>>>>>> AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5, >>>>>>>>>>>>> directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ, >>>>>>>>>>>>> topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ, >>>>>>>>>>>>> partitions=[PartitionData(partitionIndex=2), >>>>>>>>>>>>> PartitionData(partitionIndex=1)]), >>>>>>>>>>>>> TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ, >>>>>>>>>>>>> partitions=[PartitionData(partitionIndex=0)])])]) with context >>>>>>>>>>>>> >>>>> RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS, >>>>>>>>>>>>> apiVersion=0, clientId=1000, correlationId=14, headerVersion=2), >>>>>>>>>>>>> connectionId='172.16.14.219:9090-172.16.14.217:53590-7', >>>>>>>>>> clientAddress=/ >>>>>>>>>>>>> 172.16.14.217, principal=User:CN=my-cluster-kafka,O=io.strimzi, >>>>>>>>>>>>> listenerName=ListenerName(CONTROLPLANE-9090), >>>>> securityProtocol=SSL, >>>>>>>>>>>>> >>>>> clientInformation=ClientInformation(softwareName=apache-kafka-java, >>>>>>>>>>>>> softwareVersion=3.7.0), fromPrivilegedListener=false, >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>> 
principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@71004ad2 >>>>>>>>>>>>> ]) >>>>>>>>>>>>> (kafka.server.ControllerApis) [quorum-controller-0-event-handler] >>>>>>>>>>>>> java.util.concurrent.CompletionException: >>>>>>>>>>>>> org.apache.kafka.common.errors.UnsupportedVersionException: >>>>>>>>> Directory >>>>>>>>>>>>> assignment is not supported yet. >>>>>>>>>>>>> >>>>>>>>>>>>> at >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) >>>>>>>>>>>>> at >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) >>>>>>>>>>>>> at >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636) >>>>>>>>>>>>> at >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) >>>>>>>>>>>>> at >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) >>>>>>>>>>>>> at >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880) >>>>>>>>>>>>> at >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871) >>>>>>>>>>>>> at >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>> org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148) >>>>>>>>>>>>> at >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> 
>>>>>>>>>> >>>>>>>>> >>>>>> >>>>> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137) >>>>>>>>>>>>> at >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210) >>>>>>>>>>>>> at >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181) >>>>>>>>>>>>> at java.base/java.lang.Thread.run(Thread.java:840) >>>>>>>>>>>>> >>>>>>>>>>>>> Caused by: >>>>>>>>> org.apache.kafka.common.errors.UnsupportedVersionException: >>>>>>>>>>>>> Directory assignment is not supported yet. >>>>>>>>>>>>> >>>>>>>>>>>>> Is that expected? I guess with the metadata version set to >>>>>>>>> 3.6-IV2, it >>>>>>>>>>>>> makes sense that the request is not supported. But shouldn't then >>>>>>>>> the >>>>>>>>>>>>> request not be sent at all by the brokers? (I did not opened a >>>>> JIRA >>>>>>>>>> for >>>>>>>>>>>> it, >>>>>>>>>>>>> but I can open one if you agree this is not expected) >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks & Regards >>>>>>>>>>>>> Jakub >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Jan 13, 2024 at 8:03 AM Luke Chen <show...@gmail.com> >>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Stanislav, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I commented in the "Apache Kafka 3.7.0 Release" thread, but >>>>> maybe >>>>>>>>>> you >>>>>>>>>>>>>> missed it. >>>>>>>>>>>>>> cross-posting here: >>>>>>>>>>>>>> >>>>>>>>>>>>>> There is a bug KAFKA-16101 >>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-16101> reporting >>>>>>>>> that >>>>>>>>>>>>> "Kafka >>>>>>>>>>>>>> cluster will be unavailable during KRaft migration rollback". >>>>>>>>>>>>>> The impact for this issue is that if brokers try to rollback to >>>>>>>>> ZK >>>>>>>>>> mode >>>>>>>>>>>>>> during KRaft migration process, there will be a period of time >>>>>>>>> the >>>>>>>>>>>>> cluster >>>>>>>>>>>>>> is unavailable. 
>>>>>>>>>>>>>> Since ZK migrating to KRaft feature is a production ready >>>>>>>>> feature, I >>>>>>>>>>>>> think >>>>>>>>>>>>>> this should be addressed soon. >>>>>>>>>>>>>> Do you think this is a blocker for v3.7.0? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>> Luke >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, Jan 13, 2024 at 8:36 AM Chris Egerton < >>>>>>>>>> fearthecel...@gmail.com >>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, Kirk! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> @Stanislav--do you believe that this warrants a new RC? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Jan 12, 2024, 19:08 Kirk True <k...@kirktrue.pro> >>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Chris/Stanislav, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm working on the 'Unable to find FetchSessionHandler' log >>>>>>>>>> problem >>>>>>>>>>>>>>>> (KAFKA-16029) and have put out a draft PR ( >>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/15186). I will use the >>>>>>>>>>>>> quickstart >>>>>>>>>>>>>>>> approach as a second means to reproduce/verify while I wait >>>>>>>>> for >>>>>>>>>> the >>>>>>>>>>>>>> PR's >>>>>>>>>>>>>>>> Jenkins job to finish. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Kirk >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Jan 12, 2024, at 11:31 AM, Chris Egerton wrote: >>>>>>>>>>>>>>>>> Hi Stanislav, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks for running this release! 
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To verify, I:
>>>>>>>>>>>>>>>>> - Built from source using Java 11 with both:
>>>>>>>>>>>>>>>>>   - the 3.7.0-rc2 tag on GitHub
>>>>>>>>>>>>>>>>>   - the kafka-3.7.0-src.tgz artifact from https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/
>>>>>>>>>>>>>>>>> - Checked signatures and checksums
>>>>>>>>>>>>>>>>> - Ran the quickstart using both:
>>>>>>>>>>>>>>>>>   - The kafka_2.13-3.7.0.tgz artifact from https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/ with Java 11 and Scala 2.13 in KRaft mode
>>>>>>>>>>>>>>>>>   - Our shiny new broker Docker image, apache/kafka:3.7.0-rc2
>>>>>>>>>>>>>>>>> - Ran all unit tests
>>>>>>>>>>>>>>>>> - Ran all integration tests for Connect and MM2
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I found two minor areas for concern:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. (Possibly a blocker)
>>>>>>>>>>>>>>>>> When running the quickstart, I noticed this ERROR-level log message being emitted frequently (but not every time) when I killed my console consumer via ctrl-C:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [2024-01-12 11:00:31,088] ERROR [Consumer clientId=console-consumer, groupId=console-consumer-74388] Unable to find FetchSessionHandler for node 1. Ignoring fetch response (org.apache.kafka.clients.consumer.internals.AbstractFetch)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see that this error message is already reported in https://issues.apache.org/jira/browse/KAFKA-16029. I think we should prioritize fixing it for this release.
I know it's probably >>>>>>>>>>>> benign >>>>>>>>>>>>>> but >>>>>>>>>>>>>>>> it's >>>>>>>>>>>>>>>>> really not a good look for us when basic operations log >>>>>>>>> error >>>>>>>>>>>>>> messages, >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> it may give new users some headaches. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2. (Probably not a blocker) >>>>>>>>>>>>>>>>> The following unit tests failed the first time around, and >>>>>>>>>> all of >>>>>>>>>>>>>> them >>>>>>>>>>>>>>>>> passed the second time I ran them: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> - (clients) >>>>>>>>>>>>>>>> >>>>>>>>> ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup() >>>>>>>>>>>>>>>>> - (clients) SelectorTest.testConnectionsByClientMetric() >>>>>>>>>>>>>>>>> - (clients) >>>>>>>>> Tls13SelectorTest.testConnectionsByClientMetric() >>>>>>>>>>>>>>>>> - (connect) >>>>>>>>>>>>>> TopicAdminTest.retryEndOffsetsShouldRetryWhenTopicNotFound >>>>>>>>>>>>>>> (I >>>>>>>>>>>>>>>>> thought I fixed this one! 🤬🤬) >>>>>>>>>>>>>>>>> - (core) >>>>>>>>>> ProducerIdManagerTest.testUnrecoverableErrors(Errors)[2] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks again for your work on this release, and >>>>>>>>>> congratulations >>>>>>>>>>>> to >>>>>>>>>>>>>>> Kafka >>>>>>>>>>>>>>>>> Streams for having zero flaky unit tests during my >>>>>>>>>>>>>> highly-experimental >>>>>>>>>>>>>>>>> single laptop run! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Chris >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Jan 11, 2024 at 1:33 PM Stanislav Kozlovski >>>>>>>>>>>>>>>>> <stanis...@confluent.io.invalid> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hello Kafka users, developers, and client-developers, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This is the first candidate for release of Apache Kafka >>>>>>>>>> 3.7.0. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Note it's named "RC2" because I had a few "failed" RCs >>>>>>>>> that >>>>>>>>>> I >>>>>>>>>>>> had >>>>>>>>>>>>>>>>>> cut/uploaded but ultimately had to scrap prior to >>>>>>>>> announcing >>>>>>>>>>>> due >>>>>>>>>>>>> to >>>>>>>>>>>>>>> new >>>>>>>>>>>>>>>>>> blockers arriving before I could even announce them. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Further - I haven't yet been able to set up the system >>>>>>>>> tests >>>>>>>>>>>>>>>> successfully. >>>>>>>>>>>>>>>>>> And the integration/unit tests do have a few failures >>>>>>>>> that I >>>>>>>>>>>> have >>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> spend >>>>>>>>>>>>>>>>>> time triaging. I would appreciate any help in case anyone >>>>>>>>>>>> notices >>>>>>>>>>>>>> any >>>>>>>>>>>>>>>> tests >>>>>>>>>>>>>>>>>> failing that they're subject matters experts in. Expect >>>>>>>>> me >>>>>>>>>> to >>>>>>>>>>>>>> follow >>>>>>>>>>>>>>>> up in >>>>>>>>>>>>>>>>>> a day or two with more detailed analysis. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Major changes include: >>>>>>>>>>>>>>>>>> - Early Access to KIP-848 - the next generation of the >>>>>>>>>> consumer >>>>>>>>>>>>>>>> rebalance >>>>>>>>>>>>>>>>>> protocol >>>>>>>>>>>>>>>>>> - KIP-858: Adding JBOD support to KRaft >>>>>>>>>>>>>>>>>> - KIP-714: Observability into Client metrics via a >>>>>>>>>> standardized >>>>>>>>>>>>>>>> interface >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Check more information in the WIP blog post: >>>>>>>>>>>>>>>>>> https://github.com/apache/kafka-site/pull/578 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Release notes for the 3.7.0 release: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/RELEASE_NOTES.html >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> *** Please download, test and vote by Thursday, January >>>>>>>>> 18, >>>>>>>>>> 9am >>>>>>>>>>>>> PT >>>>>>>>>>>>>>> *** 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Usually these deadlines tend to be 2-3 days, but due to >>>>>>>>> this >>>>>>>>>>>>> being >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> first RC and the tests not having ran yet, I am giving >>>>>>>>> it a >>>>>>>>>> bit >>>>>>>>>>>>>> more >>>>>>>>>>>>>>>> time. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Kafka's KEYS file containing PGP keys we use to sign the >>>>>>>>>>>> release: >>>>>>>>>>>>>>>>>> https://kafka.apache.org/KEYS >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> * Release artifacts to be voted upon (source and binary): >>>>>>>>>>>>>>>>>> >>>>>>>>>> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> * Docker release artifact to be voted upon: >>>>>>>>>>>>>>>>>> apache/kafka:3.7.0-rc2 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> * Maven artifacts to be voted upon: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>> https://repository.apache.org/content/groups/staging/org/apache/kafka/ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> * Javadoc: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>> >>>>> https://home.apache.org/~stanislavkozlovski/kafka-3.7.0-rc2/javadoc/ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> * Tag to be voted upon (off 3.7 branch) is the 3.7.0 tag: >>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/releases/tag/3.7.0-rc2 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> * Documentation: >>>>>>>>>>>>>>>>>> https://kafka.apache.org/37/documentation.html >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> * Protocol: >>>>>>>>>>>>>>>>>> https://kafka.apache.org/37/protocol.html >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> * Successful Jenkins builds for the 3.7 branch: >>>>>>>>>>>>>>>>>> Unit/integration tests: >>>>>>>>>>>>>>>>>> >>>>>>>>>> https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.7/58/ >>>>>>>>>>>>>>>>>> There are failing tests here. 
I have to follow up with >>>>>>>>>> triaging >>>>>>>>>>>>>> some >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>> the failures and figuring out if they're actual problems >>>>>>>>> or >>>>>>>>>>>>> simply >>>>>>>>>>>>>>>> flakes. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> System tests: >>>>>>>>>>>>>>>> https://jenkins.confluent.io/job/system-test-kafka/job/3.7/ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> No successful system test runs yet. I am working on >>>>>>>>> getting >>>>>>>>>> the >>>>>>>>>>>>> job >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> run. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> * Successful Docker Image Github Actions Pipeline for 3.7 >>>>>>>>>>>> branch: >>>>>>>>>>>>>>>>>> Attached are the scan_report and report_jvm output files >>>>>>>>>> from >>>>>>>>>>>> the >>>>>>>>>>>>>>>> Docker >>>>>>>>>>>>>>>>>> Build run: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>> https://github.com/apache/kafka/actions/runs/7486094960/job/20375761673 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> And the final docker image build job - Docker Build Test >>>>>>>>>>>>> Pipeline: >>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/actions/runs/7486178277 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The image is apache/kafka:3.7.0-rc2 - >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>>> https://hub.docker.com/layers/apache/kafka/3.7.0-rc2/images/sha256-5b4707c08170d39549fbb6e2a3dbb83936a50f987c0c097f23cb26b4c210c226?context=explore >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /************************************** >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Stanislav Kozlovski >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Best, >>>>>>>>>>>> Stanislav >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> Best, >>>> Stanislav >>> >> >