linliu-code opened a new pull request, #17899:
URL: https://github.com/apache/hudi/pull/17899

   ### Change Logs
   This PR https://github.com/apache/hudi/pull/9743 added more schema evolution 
functionality and schema processing. However, it used the InternalSchema system 
for various operations such as fixing null ordering, reordering, and adding 
columns. At the time, InternalSchema had only a single Timestamp type, which was 
assumed to be micros when converting back to Avro. Therefore, if the schema 
provider had any millis columns, the processed schema would end up labeling 
those columns as micros.
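
   To see why the mislabeling matters, here is a minimal sketch (not Hudi code) of what happens when a raw long that really holds milliseconds is decoded under each label. The constant below is an illustrative epoch-millis value; the micros interpretation lands decades off:

```java
import java.time.Instant;

public class TimestampMislabel {
    public static void main(String[] args) {
        // An illustrative timestamp-millis value: 2024-01-01T00:00:00Z
        long raw = 1_704_067_200_000L;

        // Correct interpretation: the raw long is milliseconds since the epoch
        Instant asMillis = Instant.ofEpochMilli(raw);

        // Mislabeled interpretation: the same raw long read as microseconds,
        // which shrinks the instant by a factor of 1000 (lands in Jan 1970)
        Instant asMicros = Instant.ofEpochSecond(
                raw / 1_000_000L, (raw % 1_000_000L) * 1_000L);

        System.out.println("as millis: " + asMillis); // 2024-01-01T00:00:00Z
        System.out.println("as micros: " + asMicros); // 1970-01-20T17:21:07.200Z
    }
}
```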
   
   PR https://github.com/apache/hudi/pull/13711, which updated column stats with 
better support for logical types, fixed these schema issues, along with 
additional issues in the handling and conversion of timestamps during 
ingestion.
   
   This PR adds functionality to the Spark and Hive readers and writers to 
automatically repair affected tables.
   After switching to the 1.1 binary, the affected columns will undergo 
evolution from timestamp-micros to timestamp-millis. Although this would 
normally be a lossy evolution that is not supported, it is safe here because 
the data is actually still timestamp-millis; it is merely mislabeled as micros 
in the Parquet and table schemas.
   
   ### Impact
   When reading from a Hudi table with the Spark or Hive reader, if the table 
schema declares a column as millis but the data schema declares it as micros, 
we assume the column is affected and read it as a millis value instead of a 
micros value. This correction is also applied to all readers used by the 
default write paths, so the Parquet files become correct as the table is 
rewritten. A table's latest snapshot can be fixed immediately by writing one 
commit with the 1.1 binary and then clustering the entire table.
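
   The read-side correction described above can be sketched roughly as follows. This is a hypothetical illustration, not Hudi's actual reader API: the `LogicalType` enum and `readTimestamp` method are invented names, but the decision mirrors the described behavior of trusting the table schema's millis label over the file's micros label:

```java
import java.time.Instant;

public class TimestampRepair {
    // Hypothetical stand-in for the logical types involved
    enum LogicalType { TIMESTAMP_MILLIS, TIMESTAMP_MICROS }

    // Sketch of the correction: when the table schema says millis but the
    // file schema says micros, the column is assumed affected, so the raw
    // long is decoded as millis regardless of the file's label.
    static Instant readTimestamp(long raw, LogicalType tableType, LogicalType fileType) {
        if (tableType == LogicalType.TIMESTAMP_MILLIS) {
            // Affected (or genuinely millis) column: data really is millis
            return Instant.ofEpochMilli(raw);
        }
        // Unaffected column: trust the micros label
        return Instant.ofEpochSecond(raw / 1_000_000L, (raw % 1_000_000L) * 1_000L);
    }

    public static void main(String[] args) {
        // Raw long stored as millis but labeled micros in the file schema
        long raw = 1_704_067_200_000L;
        Instant fixed = readTimestamp(raw,
                LogicalType.TIMESTAMP_MILLIS, LogicalType.TIMESTAMP_MICROS);
        System.out.println(fixed); // decoded as millis: 2024-01-01T00:00:00Z
    }
}
```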
   
   ### Risk level (write none, low medium or high below)
   High. Extensive testing was done and functional tests were added.
   
   #### Documentation Update
   https://github.com/apache/hudi/pull/14100
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
