brandon-stanley opened a new issue #2331:
URL: https://github.com/apache/hudi/issues/2331


   Hi Hudi Team! I have a question about field deletions/schema evolution. The 
[FAQ Documentation](https://cwiki.apache.org/confluence/display/HUDI/FAQ) 
states the following:
   
   > Hudi uses Avro as the internal canonical representation for records, 
primarily due to its nice [schema compatibility & evolution 
properties](https://docs.confluent.io/platform/current/schema-registry/avro.html).
 This is a key aspect of having reliability in your ingestion or ETL pipelines. 
As long as the schema passed to Hudi (either explicitly in DeltaStreamer schema 
provider configs or implicitly by Spark Datasource's Dataset schemas) is 
backwards compatible (e.g no field deletes, only appending new fields to 
schema), Hudi will seamlessly handle read/write of old and new data and also 
keep the Hive schema up-to date.
   
   
   While reading the [Confluent 
Documentation](https://docs.confluent.io/platform/current/schema-registry/avro.html)
 linked above, I noticed that "Delete fields" is listed as an allowed change 
for `BACKWARD`-compatible schemas. I assume that the Avro schema tracked 
within Hudi is `BACKWARD` compatible and should therefore allow field 
deletions, but the [FAQ 
Documentation](https://cwiki.apache.org/confluence/display/HUDI/FAQ) states 
otherwise. Can you please clarify the following:
   
   1. Why are field deletions not supported within Hudi?
   2. Is there a way to determine (and possibly update) the Avro schema 
compatibility type for a Hudi table?
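
   For reference, here is a minimal pure-Python sketch (not Hudi or Avro 
library code; the `resolve` helper is hypothetical) of Avro's record 
schema-resolution rule, illustrating why deleting a field is considered 
`BACKWARD` compatible, while adding a field without a default is not:

   ```python
   # Toy version of Avro's resolution rules for record fields: the reader's
   # schema drives the result; writer-only fields are silently dropped.
   def resolve(record, writer_fields, reader_fields):
       """Project a record written with writer_fields onto reader_fields."""
       writer_names = {f["name"] for f in writer_fields}
       out = {}
       for field in reader_fields:
           name = field["name"]
           if name in writer_names:
               out[name] = record[name]      # field present in both schemas
           elif "default" in field:
               out[name] = field["default"]  # reader-only field with a default
           else:
               raise ValueError(f"no value or default for field {name!r}")
       return out

   old_fields = [{"name": "id"}, {"name": "name"}, {"name": "legacy_code"}]
   new_fields = [{"name": "id"}, {"name": "name"}]   # legacy_code deleted

   # Old data remains readable with the new (reader) schema; the deleted
   # field is simply dropped, which is what BACKWARD compatibility promises.
   print(resolve({"id": 1, "name": "a", "legacy_code": "x"},
                 old_fields, new_fields))
   # {'id': 1, 'name': 'a'}
   ```
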


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
