nsivabalan commented on pull request #2380: URL: https://github.com/apache/hudi/pull/2380#issuecomment-751280380
@afilipchik @vinothchandar: I have taken a stab at https://github.com/apache/hudi/pull/1565. Basically, AbstractHoodieKafkaAvroDeserializer initializes a SchemaProvider based on configs to fetch the source schema and the target schema. In other words, I have combined https://github.com/apache/hudi/pull/1562 and https://github.com/apache/hudi/pull/1565.

If deserialize() is called with a reader schema, that schema is used. If not, the one from the schema provider is used. In either case, the writer schema is fetched from the schema provider. In the previous patch from Pratyaksh, we were using the passed-in schema as both the reader and the writer schema, so schema evolution could run into issues.

However, AbstractHoodieKafkaAvroDeserializer is inspired by the Confluent schema-registry repo, and I am not sure how to make it generic (as of now it assumes the schema id at the beginning, followed by the length and the data). I haven't worked with schema registries, or with Kafka/Avro, before, so I will have to research what other ways there are to deserialize Kafka Avro data. But as per Pratyaksh's comment, it looks like non-schema-registry flows are discouraged in general. So I am not sure how much value we would add by supporting every way of deserializing Kafka Avro data (i.e., beyond the Confluent way).

Let me know your thoughts. I am looking to get this into 0.7.0 (I will be cutting the release in a week's time), so I would appreciate a response whenever you can.
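A minimal sketch of the schema-selection rule described above, assuming the Confluent wire format (a zero magic byte, then a 4-byte big-endian schema ID, then the Avro-encoded data, whose length is implied by the remaining bytes). `SchemaProvider`, `DecodePlan`, and `plan_deserialize` are hypothetical names, not Hudi's API; treating the source schema as the writer schema and the target schema as the default reader schema is also an assumption. Schemas are plain strings here to keep the sketch dependency-free, and it stops short of decoding: with Avro's Java library, the two schemas would be passed to `GenericDatumReader(writer, reader)` so Avro can resolve the differences between them.

```python
import struct
from dataclasses import dataclass
from typing import Optional

MAGIC_BYTE = 0    # assumed Confluent wire format: first byte is 0
HEADER_SIZE = 5   # 1 magic byte + 4-byte big-endian schema ID


@dataclass
class SchemaProvider:
    """Hypothetical stand-in for a config-driven schema provider."""
    source_schema: str  # assumed to play the writer-schema role
    target_schema: str  # assumed to be the default reader schema


@dataclass
class DecodePlan:
    schema_id: int
    writer_schema: str
    reader_schema: str
    avro_payload: bytes


def plan_deserialize(message: bytes, provider: SchemaProvider,
                     reader_schema: Optional[str] = None) -> DecodePlan:
    """Parse the header and pick the writer/reader schemas for decoding."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message too short for the assumed header")
    magic, schema_id = struct.unpack(">Bi", message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte: {magic}")
    # The writer schema always comes from the provider, never from the caller,
    # so it is not conflated with the reader schema.
    writer = provider.source_schema
    # A caller-supplied reader schema wins; otherwise fall back to the provider.
    reader = reader_schema if reader_schema is not None else provider.target_schema
    # Everything after the header is treated as the Avro-encoded record.
    return DecodePlan(schema_id, writer, reader, message[HEADER_SIZE:])
```

Keeping the writer schema separate from the reader schema is what addresses the schema-evolution concern: using one schema for both roles means records written with an older schema cannot be resolved against a newer one.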
