brandon-stanley opened a new issue #2331: URL: https://github.com/apache/hudi/issues/2331
Hi Hudi Team! I have a question about field deletions/schema evolution. The [FAQ Documentation](https://cwiki.apache.org/confluence/display/HUDI/FAQ) states the following:

> Hudi uses Avro as the internal canonical representation for records, primarily due to its nice [schema compatibility & evolution properties](https://docs.confluent.io/platform/current/schema-registry/avro.html). This is a key aspect of having reliability in your ingestion or ETL pipelines. As long as the schema passed to Hudi (either explicitly in DeltaStreamer schema provider configs or implicitly by Spark Datasource's Dataset schemas) is backwards compatible (e.g. no field deletes, only appending new fields to schema), Hudi will seamlessly handle read/write of old and new data and also keep the Hive schema up-to date.

While reading the [Confluent Documentation](https://docs.confluent.io/platform/current/schema-registry/avro.html) linked above, I noticed that "Delete fields" is an allowed change for `BACKWARD` compatible schemas. I assume that the Avro schema tracked within Hudi is `BACKWARD` compatible and should therefore allow field deletions, but the [FAQ Documentation](https://cwiki.apache.org/confluence/display/HUDI/FAQ) states otherwise. Can you please clarify the following:

1. Why are field deletions not supported within Hudi?
2. Is there a way to determine (and possibly update) the Avro schema compatibility type for a Hudi table?
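For context on the terminology gap here, the following stdlib-only sketch (hypothetical illustration, not Hudi's actual code) shows why deleting a field is "backward compatible" in the Avro/Confluent sense (a reader with the new schema can still decode old records) yet still lossy for a table format, where the latest schema is also used when rewriting existing data:

```python
# Hypothetical illustration of Avro BACKWARD compatibility vs. table rewrites.
# Schemas are simplified to plain field lists; not Hudi or Avro library code.

OLD_SCHEMA = {"fields": ["id", "name", "email"]}
NEW_SCHEMA = {"fields": ["id", "name"]}  # "email" has been deleted


def read_with_schema(record: dict, schema: dict) -> dict:
    """A reader using the new schema can still decode an old record:
    fields absent from the reader schema are simply skipped. This is
    what makes a field delete a BACKWARD-compatible change."""
    return {f: record[f] for f in schema["fields"] if f in record}


old_record = {"id": 1, "name": "a", "email": "a@example.com"}
projected = read_with_schema(old_record, NEW_SCHEMA)
print(projected)  # the read succeeds, but "email" is silently dropped

# In a table format, an upsert or compaction rewrites existing files using
# the latest schema -- so after a rewrite the deleted column's data is gone
# permanently, not just hidden from readers. That asymmetry is one reason a
# storage layer may warn against field deletes even though Avro allows them.
```

So "backward compatible" answers "can new readers decode old data?", which is a weaker guarantee than "can old data be rewritten without loss?".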
