nbalajee commented on pull request #2309: URL: https://github.com/apache/hudi/pull/2309#issuecomment-743371048
> @nbalajee Can you please explain why we need this? If the latest schema is passed (which is the case for Hudi now), is this still a problem?
> @bvaradar can you please take a look at this one?

@n3nash - Correct. When reading the parquet files, Hudi uses the writer schema (the evolved schema with the added fields), so optional fields missing from older files are automatically populated with null (native schema evolution). For rewrite(), Hudi use cases always pass the writerSchema, so we don't run into this issue.

An added advantage of fixing this the correct way is that Hudi will be able to support "external schema evolution": read parquet using the reader schema, then rewrite the records using the evolved schema.
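To illustrate the "rewrite the records using the evolved schema" step, here is a minimal Python sketch. It is not Hudi's rewrite() and not the Avro API. The `Field` type and the `rewrite` function are hypothetical. Fields are modeled on Avro's resolution rule: a field added in the evolved schema must carry a default, which is null for an optional field.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Sentinel meaning "no default declared" (distinct from a default of None/null).
_NO_DEFAULT = object()


@dataclass(frozen=True)
class Field:
    """Hypothetical schema field; an optional field is declared with default=None."""
    name: str
    default: Any = _NO_DEFAULT


Schema = List[Field]


def rewrite(record: Dict[str, Any], evolved: Schema) -> Dict[str, Any]:
    """Project a record read with an older schema onto an evolved schema.

    - Fields present in the record are copied as-is.
    - Fields added in the evolved schema take their declared default.
    - An added field with no default cannot be filled, so this raises.
    - Fields dropped from the evolved schema are ignored.
    """
    out: Dict[str, Any] = {}
    for field in evolved:
        if field.name in record:
            out[field.name] = record[field.name]
        elif field.default is not _NO_DEFAULT:
            out[field.name] = field.default
        else:
            raise ValueError(
                f"field {field.name!r} is missing from the record and has no default"
            )
    return out
```

For example, suppose a record written before an optional `email` field existed is rewritten with `rewrite({"id": 1, "name": "a"}, [Field("id"), Field("name"), Field("email", default=None)])`. The result is `{"id": 1, "name": "a", "email": None}`, which mirrors the null-filling that native schema evolution gives on read.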
