nsivabalan edited a comment on pull request #2654:
URL: https://github.com/apache/hudi/pull/2654#issuecomment-801212605


   @n3nash @vinothchandar : It looks like we can't really allow NullSchemaProvider 
in our delta schema flow. We have 
[SparkAvroPostProcessor](https://github.com/apache/hudi/blob/968488fa3a3c67962ed5b60e00836e71c730a9a5/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SparkAvroPostProcessor.java#L42)
, which tries to convert the schema to a Spark SQL schema and back. So if the user 
passes in a null-type schema (Schema.create(Schema.Type.NULL)), we will run into 
an NPE. 
   
   So, to recap: 
   - If the schema provider itself is null, we are good. If the schema provider 
returns null for the schema (getTargetSchema() == null), we fall back to 
RowBasedSchemaProvider. 
   - But if the user passes in a NullSchemaProvider, which returns 
Schema.create(Schema.Type.NULL) as the target schema, I am not sure we should 
make fixes to allow this flow. 
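   
   To make the three cases above concrete, here is a rough sketch of the decision. 
It deliberately uses stand-in types so it compiles without Hudi or Avro: 
`Provider`, `SchemaKind`, `Decision`, and `decide` are all hypothetical names for 
illustration, not Hudi APIs, and `SchemaKind.NULL_TYPE` stands in for 
Schema.create(Schema.Type.NULL). 

```java
// Illustrative sketch only: every type here is a hypothetical stand-in,
// not the real Hudi SchemaProvider or Avro Schema classes.
final class SchemaProviderFallbackSketch {

    /** Stand-in for an Avro schema; NULL_TYPE models Schema.create(Schema.Type.NULL). */
    enum SchemaKind { NULL_TYPE, RECORD }

    /** Hypothetical minimal provider; getTargetSchema() may return null. */
    interface Provider {
        SchemaKind getTargetSchema();
    }

    /** The outcomes discussed in the recap above. */
    enum Decision {
        NO_PROVIDER_OK,              // no schema provider configured at all
        FALL_BACK_TO_ROW_BASED,      // provider present, but target schema is null
        UNSUPPORTED_NULL_TYPE_SCHEMA, // provider returns a null-type schema
        USE_PROVIDER_SCHEMA          // provider returns a usable schema
    }

    static Decision decide(Provider provider) {
        if (provider == null) {
            return Decision.NO_PROVIDER_OK;
        }
        SchemaKind target = provider.getTargetSchema();
        if (target == null) {
            return Decision.FALL_BACK_TO_ROW_BASED;
        }
        if (target == SchemaKind.NULL_TYPE) {
            // This is the case that would hit the schema round-trip and fail.
            return Decision.UNSUPPORTED_NULL_TYPE_SCHEMA;
        }
        return Decision.USE_PROVIDER_SCHEMA;
    }
}
```

   The point of the sketch is only that a null-type schema is a distinct third 
case: it is neither "no provider" nor "provider returned null", so neither of the 
existing paths covers it. 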
   
   Let me know if this sounds reasonable. We can close this out and ask the 
customer to simply not set any schema provider. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

