Hi Matthew,

I have not looked into the details of your issues, but I have done similar "rolling" upgrade testing myself. Replication breaks because of wire protocol changes between the versions.
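To illustrate the general failure mode, here is a minimal, self-contained Java sketch. It does not use Kafka's classes. The class name `SchemaVersionDemo`, the simplified two-version layout, and the extra INT32 field that "version 1" places before the array are hypothetical simplifications, not Kafka's actual fetch response format. The sketch shows that bytes written with an older schema version decode correctly with the matching version but throw a `java.nio.BufferUnderflowException` when decoded with the newest version, which resembles the kind of error in the `ReplicaFetcherThread` warnings quoted in this thread.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SchemaVersionDemo {
    // Hypothetical "latest" version, analogous to always using the current schema.
    static final int CURRENT_VERSION = 1;

    // Encodes a simplified version-0 response: an INT32 array length followed by
    // each topic name as an INT16 length-prefixed UTF-8 string.
    static ByteBuffer encodeV0(List<String> topics) {
        int size = 4;
        for (String t : topics) {
            size += 2 + t.getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(topics.size());
        for (String t : topics) {
            byte[] b = t.getBytes(StandardCharsets.UTF_8);
            buf.putShort((short) b.length);
            buf.put(b);
        }
        buf.flip(); // prepare the buffer for reading
        return buf;
    }

    // Decodes using the given schema version. For illustration, version 1 is
    // assumed to add one extra INT32 field in front of the array.
    static List<String> decode(ByteBuffer buf, int version) {
        if (version >= 1) {
            buf.getInt(); // field that only exists in the (hypothetical) version 1
        }
        int count = buf.getInt();
        List<String> topics = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            short len = buf.getShort();
            byte[] b = new byte[len];
            buf.get(b); // throws BufferUnderflowException if fewer than len bytes remain
            topics.add(new String(b, StandardCharsets.UTF_8));
        }
        return topics;
    }

    public static void main(String[] args) {
        // The "old broker" answered a version-0 request.
        int requestVersion = 0;
        ByteBuffer response = encodeV0(List.of("test"));

        // Decoding with the version the request was sent with works.
        System.out.println(decode(response.duplicate(), requestVersion)); // prints [test]

        // Decoding with the newest schema misreads the bytes and runs off the end.
        try {
            decode(response.duplicate(), CURRENT_VERSION);
        } catch (BufferUnderflowException e) {
            System.out.println("BufferUnderflowException when decoding with version " + CURRENT_VERSION);
        }
    }
}
```

Decoding with the version that was actually used for the request, rather than the newest version, is the idea behind the fix Matthew describes for NetworkClient.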
Just checking some preliminary things before digging in:

- Have you followed the upgrade steps outlined here?
  - https://kafka.apache.org/090/documentation.html#upgrade
- Does setting inter.broker.protocol.version=0.8.2.X resolve the issue?
  - Note: you need to unset it and restart again after all brokers are upgraded.

In the future, KIP-35 may help remove the manual step of setting inter.broker.protocol.version. You can read more about KIP-35 and participate in the discussion/design here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version

Thanks,
Grant

On Thu, Nov 5, 2015 at 2:18 PM, Matthew Bruce <mbr...@blackberry.com> wrote:

> Hello Kafka Devs,
>
> I've been testing the upgrade procedure between Kafka 0.8.2.1 and Kafka
> 0.9.0.0 and have been having replication issues between the two versions,
> and I was wondering if anyone was aware of this issue (I just searched, and
> this seems to be related to KAFKA-2750, raised yesterday).
>
> I start with 3 brokers running 0.8.2.1 that all contain data (1 topic with
> 10 partitions), then I shut down one of the brokers and upgrade it to 0.9.0
> (making sure to set 'inter.broker.protocol.version=0.8.2.X' in
> broker.properties). Once the broker is started, I see errors like the
> following:
>
> [2015-11-05 19:13:10,309] WARN [ReplicaFetcherThread-0-182050600], Error
> in fetch kafka.server.ReplicaFetcherThread$FetchRequest@6cc18858. Possible cause:
> org.apache.kafka.common.protocol.types.SchemaException: Error reading field
> 'responses': Error reading field 'topic': java.nio.BufferUnderflowException
> (kafka.server.ReplicaFetcherThread)
>
> and
>
> [2015-11-03 16:55:15,178] WARN [ReplicaFetcherThread-1-182050600], Error
> in fetch kafka.server.ReplicaFetcherThread$FetchRequest@224388b2.
> Possible cause:
> org.apache.kafka.common.protocol.types.SchemaException: Error reading field
> 'responses': Error reading field 'partition_responses': Error reading field
> 'record_set': java.lang.IllegalArgumentException
> (kafka.server.ReplicaFetcherThread)
>
> I've spent some time in the Kafka code and in packet captures/Wireshark
> trying to figure this out, and I believe there is an issue in the
> handleCompletedReceives function in org.apache.kafka.clients.NetworkClient.java.
> When extracting the response body, this function uses
> ProtoUtils.currentResponseSchema instead of using ProtoUtils.responseSchema
> with the API version required by inter.broker.protocol.version:
>
>     Struct body = (Struct) ProtoUtils.currentResponseSchema(apiKey).read(receive.payload());
>
> This results in errors when the newer version of a schema
> (FETCH_RESPONSE_V1 instead of FETCH_RESPONSE_V0) is applied to the
> fetch response returned by the 0.8.2.1 broker.
>
> Thanks,
> Matthew Bruce
> mbr...@blackberry.com

--
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke