Limess edited a comment on issue #4043: URL: https://github.com/apache/hudi/issues/4043#issuecomment-974848916
`_hoodie_is_deleted` is the added column; sorry, I misquoted it above. I believe it should have been added to the end of the schema, although I'm not sure how the ordering will actually have ended up in Hudi, given that we added this column at the end of the source parquet files and add `story_published_partition_date` in the SQL transformer (`SELECT *, story_published_partition_date as story_published_date FROM <src>`). Maybe this use of the SQL transformer is a mistake and we should be explicitly ordering columns?

I.e. the process:

1. Writer (2) adds the `_hoodie_is_deleted` column to the end of its schema (parquet).
2. Hudi deltastreamer loads the column into the target table.
3. Writer (1) starts failing as above (the column `_hoodie_is_deleted` does not exist in its source parquet).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
