voonhous commented on issue #7444: URL: https://github.com/apache/hudi/issues/7444#issuecomment-1353610958
@codope While this issue can be worked around with the 2 parameters provided above, implicit schema changes can still be written with the default parameter values (both parameters set to false). I do not believe this is a "proper" fix for such cases. If these implicit schema changes have already been written to the table, users may have no recourse to "fix" the table.

I believe the proper way of fixing this issue is to:

1. Enable these 2 parameters by default (requires #6358 and its accompanying fixes)
2. Should any implicit schema changes be detected, enable these 2 parameters (requires #6358 and its accompanying fixes)
3. Prevent implicit changes if these 2 parameters are not enabled (requires #6358 and its accompanying fixes)
4. Modify SparkXXParquetFileFormat.scala to handle these type changes when reading

I am currently using approach (4) and will raise a PR for review tomorrow.
