nsivabalan edited a comment on pull request #2012: URL: https://github.com/apache/hudi/pull/2012#issuecomment-825077766
I spent some time understanding this PR. Thanks for putting it up, @sathyaprakashg. I have a few clarifications.

1. Can you update the description to reflect the latest status? I don't see `SchemaBasedSchemaProvider` etc.
2. FYI, we landed a [fix](https://github.com/apache/hudi/pull/2765) w.r.t. default values and null in unions. If the schema post-processing is not required at all with this fix, it would simplify things. I guess the namespace fix in this PR may also not be required if the post-processing step is not required. @bvaradar @n3nash: can you folks chime in here, please? See also the [fixed datatype jira](https://issues.apache.org/jira/browse/HUDI-1607).
3. Also, I pulled the test locally and tried to verify things. It looks like the test is not generating records as intended in the 3rd step. Here is what is happening:
   - `TestDataSource` generates data with the intended (old) schema.
   - But in `SourceFormatAdapter`, when we call `AvroConversionUtils.createDataFrame(...)`, the evolved schema is passed in, so the `InputBatch<Dataset<Row>>` returned from there has the new column set to null for all records.
   - I also verified this from within the `IdentityTransformer`, which was showing the evolved schema and records having the new column as well.

   So, essentially, the test also needs to be fixed.
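To make the step 3 failure mode concrete, here is a minimal Python sketch (not Hudi code) that models a record as a dict and a schema as an ordered list of field names. The helpers `apply_target_schema` and `column_populated` and the field names are hypothetical. It shows that projecting old-schema records onto the evolved schema makes the new column appear in every row with a null value, so a check on the column list alone cannot tell whether the old schema was actually used.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Optional[Any]]

# Hypothetical field names standing in for the old and evolved Avro schemas.
OLD_FIELDS: List[str] = ["id", "name"]
EVOLVED_FIELDS: List[str] = OLD_FIELDS + ["new_col"]


def apply_target_schema(records: List[Record], target_fields: List[str]) -> List[Record]:
    """Project each record onto target_fields, filling absent fields with None.

    Simplified stand-in for converting old-schema records using an evolved
    schema: the added column exists in every row but holds null.
    """
    return [{field: rec.get(field) for field in target_fields} for rec in records]


def column_populated(rows: List[Record], column: str) -> bool:
    """Return True if at least one row has a non-null value for column."""
    return any(row.get(column) is not None for row in rows)


# Records "generated" with the old schema, as the test intends in its 3rd step.
old_records: List[Record] = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# Converting them with the evolved schema adds new_col = None to every row.
rows = apply_target_schema(old_records, EVOLVED_FIELDS)
print(rows)
# [{'id': 1, 'name': 'a', 'new_col': None}, {'id': 2, 'name': 'b', 'new_col': None}]
```

Under these assumptions, a more reliable assertion in the test would compare the batch's schema against the schema the data generator was asked to use, rather than only checking that the new column is present.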
