davidov541 commented on a change in pull request #933: HIVE-21218: Adding support for Confluent Kafka Avro message format
URL: https://github.com/apache/hive/pull/933#discussion_r387980822
##########
File path: kafka-handler/README.md
##########
@@ -50,6 +50,9 @@ ALTER TABLE
 SET TBLPROPERTIES (
   "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
 ```
+
+If you use Confluent Avro serialzier/deserializer with Schema Registry you may want to remove 5 bytes from beginning that represents magic byte + schema ID from registry.

Review comment:
@b-slim You appear to be correct, based on the source code: https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroSerializer.java.

Let's say we did implement as is, and later we implement the schema registry lookup and use the same identifier? Who would that break? Serialized messages that point to a bogus schema registry instance, or serialized messages that happened to need 5 bytes at the front of the message, but aren't from confluent, and some clever dev figured out he could use Confluent instead of the right way?

The second case doesn't matter to me tbh. The first case is concerning and should be handled. I would expect that we would catch when we can't find a schema and print out a warning, but no error. That would allow this case to continue working. But we would be making assumptions on the implementation of a feature in the future, which is always a crapshoot...

To be clear, if we make sure documentation is clear on this outside of just these parameters, and @cricket007 agrees with it as a heavy Confluent user, then I'm fine with it. It feels like we've covered this problem enough to be in a good spot either way.
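To make the two cases above concrete, here is a rough sketch of the 5-byte framing and of the "warn, but no error" fallback. It is written in Python only for brevity and is not the code in this PR. The function names, the dict standing in for a schema registry, and the magic-byte check are all assumptions for illustration; the README change as proposed only skips the bytes.

```python
import struct
import warnings
from typing import NamedTuple, Optional

MAGIC_BYTE = 0   # first byte written by the Confluent Avro serializer
HEADER_SIZE = 5  # 1 magic byte + 4-byte big-endian schema ID


class Framed(NamedTuple):
    schema_id: Optional[int]  # None when the message does not look Confluent-framed
    payload: bytes            # Avro body with the header removed, if one was found


def split_confluent_header(message: bytes) -> Framed:
    """Hypothetical helper: strip the 5-byte prefix only if the magic byte matches."""
    if len(message) < HEADER_SIZE or message[0] != MAGIC_BYTE:
        # Not Confluent framing (or too short): leave the bytes untouched.
        return Framed(None, message)
    (schema_id,) = struct.unpack(">i", message[1:HEADER_SIZE])
    return Framed(schema_id, message[HEADER_SIZE:])


def resolve_schema(framed: Framed, registry: dict, table_schema):
    """Hypothetical lookup: warn and fall back to the table schema on a miss.

    `registry` is a plain dict standing in for a real Schema Registry client.
    """
    if framed.schema_id is None:
        return table_schema
    schema = registry.get(framed.schema_id)
    if schema is None:
        # The "bogus registry" case: keep working, but tell the user.
        warnings.warn(
            f"schema ID {framed.schema_id} not found in registry; "
            "falling back to the table schema"
        )
        return table_schema
    return schema
```

With something along these lines, a message whose ID cannot be resolved would still be read with the schema from the table properties, so tables that only skip the bytes today would keep working if a registry lookup were added later. Whether a real implementation should check the magic byte at all is a separate question.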