nsivabalan edited a comment on pull request #2380:
URL: https://github.com/apache/hudi/pull/2380#issuecomment-751280380


   @afilipchik @vinothchandar : I have taken a stab at 
https://github.com/apache/hudi/pull/1565. I did not have permission to update 
Pratyaksh's repo, hence created a new one. 
   
   Basically, AbstractHoodieKafkaAvroDeserializer initializes a SchemaProvider 
based on configs to fetch the source schema and target schema. In other words, I 
have combined https://github.com/apache/hudi/pull/1562 and 
https://github.com/apache/hudi/pull/1565.
   
   If deserialize() is called with a reader schema, that schema is used. If not, 
the one from the schema provider is used. In either case, the writer schema is 
fetched from the schema provider. In the previous patch from Pratyaksh, we were 
using the passed-in schema as both reader and writer schema, and hence schema 
evolution could run into issues. 
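   
   To make the reader/writer choice concrete, here is a rough sketch of the 
resolution logic, not the actual patch. Schemas are plain strings to keep it 
self-contained, and the SchemaProvider interface, the Resolved holder and 
resolve() are hypothetical stand-ins. It also assumes the writer schema comes 
from the provider's source schema and the fallback reader schema from its 
target schema.

```java
/** Sketch only: shows which schema is picked as writer and which as reader. */
final class SchemaResolutionSketch {

    /** Hypothetical stand-in for a schema provider; schemas are plain strings here. */
    interface SchemaProvider {
        String getSourceSchema();
        String getTargetSchema();
    }

    /** Simple holder for the pair of schemas handed to the Avro decoder. */
    static final class Resolved {
        final String writer;
        final String reader;

        Resolved(String writer, String reader) {
            this.writer = writer;
            this.reader = reader;
        }
    }

    /**
     * The writer schema always comes from the provider (assumed: source schema).
     * The reader schema is the one passed in, or the provider's target schema
     * (assumed) when the caller passes null.
     */
    static Resolved resolve(SchemaProvider provider, String passedReaderSchema) {
        String writer = provider.getSourceSchema();
        String reader = passedReaderSchema != null
                ? passedReaderSchema
                : provider.getTargetSchema();
        return new Resolved(writer, reader);
    }
}
```

   The point is that the writer schema never comes from the caller, so a 
caller-supplied reader schema can differ from the schema the data was written 
with. 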
   
   But AbstractHoodieKafkaAvroDeserializer is inspired by the Confluent 
schema-registry repo. I am not sure how to make this generic (as of now it 
assumes the schema id at the beginning, followed by length and data). I haven't 
worked with schema registries or with Kafka/Avro before. I will have to research 
what other ways we could deserialize Kafka Avro data. But as per Pratyaksh's 
comment, it looks like non-schema-registry flows are discouraged in general. So 
I am not sure how much value we could add by supporting all the different ways 
to deserialize Kafka Avro data (i.e., anything other than the Confluent way). 
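   
   For reference, a minimal sketch of the framing the deserializer currently 
assumes. This is my reading of the Confluent wire format, not code from the 
patch: one magic byte (0), then a 4-byte big-endian schema id, then the Avro 
payload, whose length is derived from the remaining bytes rather than stored in 
the message. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;

/** Sketch only: splits a Confluent-framed Kafka value into schema id and payload. */
final class ConfluentFramingSketch {

    static final byte MAGIC_BYTE = 0x0; // assumed marker byte at offset 0
    static final int ID_SIZE = 4;       // assumed size of the schema id in bytes

    /** Holder for the parsed pieces of one message. */
    static final class Framed {
        final int schemaId;
        final byte[] payload;

        Framed(int schemaId, byte[] payload) {
            this.schemaId = schemaId;
            this.payload = payload;
        }
    }

    static Framed parse(byte[] message) {
        if (message == null || message.length < 1 + ID_SIZE) {
            throw new IllegalArgumentException("Message too short for magic byte and schema id");
        }
        ByteBuffer buffer = ByteBuffer.wrap(message); // ByteBuffer is big-endian by default
        if (buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        int schemaId = buffer.getInt();
        // Payload length is whatever remains after the magic byte and the id.
        byte[] payload = new byte[buffer.remaining()];
        buffer.get(payload);
        return new Framed(schemaId, payload);
    }
}
```

   Any producer that does not prepend this header would fail the magic byte 
check, which is why the current approach is tied to the Confluent way. 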
   
   Let me know your thoughts. I am looking to get this into 0.7.0 (we will be 
cutting a release in a week's time), so I would appreciate a response whenever 
you can. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

