Armelabdelkbir opened a new issue, #6496: URL: https://github.com/apache/hudi/issues/6496
Hi everyone, I'm trying to test schema evolution for my CDC pipeline (Debezium + Kafka) with Hudi 0.11.0 and Spark Structured Streaming, following this documentation: https://hudi.apache.org/docs/0.11.0/schema_evolution

Does Hudi handle schema evolution well, and is it necessary to restart the job? Once I restart it, all the old values become null: only the values from the latest commits are taken into account, so the data does not match my Postgres source. On my schema registry I can see both V1 and V2. Any ideas? Thanks.

**To Reproduce**

Steps to reproduce the behavior:

1. Stop the Hudi streams and drop the Hive tables
2. Add a column on the Postgres source:
   ```sql
   ALTER TABLE <table> ADD COLUMN <column_name> character varying(50) DEFAULT 'toto';
   ```
3. Restart the Hudi Spark jobs
4. `SELECT *` from the Hudi `_ro` / `_rt` table (or read the Hudi Parquet files using Spark)

**Expected behavior**

When I select my data, I expect to see the default value in the added column, not null values.

Data on the Postgres source:

```
cdc_hudi=> select test, test2, test3 from hudipart ;
 test | test2 | test3
------+-------+-------
 toto | f     | Toto
 test | t     | Toto
 test | t     | Toto
 test | t     | Toto
 toto | f     | Toto
 toto | f     | Toto
 toto | f     | Toto
 toto | f     | Toto
 toto | f     | Toto
 toto | f     | Toto
 toto | f     | Toto
 test | t     | Toto
 test | t     | Toto
 test | t     | test3
 test | t     | test3
 toto | f     | Toto
 toto | f     | Toto
 toto | f     | Toto
 test | t     | test3
 toto | f     | Toto
 toto | f     | Toto
 toto | f     | Toto
```

Data in the Hudi Parquet files / Hive tables:

```
spark.sql("select _hoodie_commit_time as commitTime, test, test2, test3 from evolution ").show()
+-----------------+----+-----+-----+
|       commitTime|test|test2|test3|
+-----------------+----+-----+-----+
|20220824102514494|null| null| null|
|20220824102514494|null| null| null|
|20220824102514494|null| null| null|
|20220824102514494|null| null| null|
|20220824102514494|null| null| null|
|20220824102514494|null| null| null|
|20220824102514494|null| null| null|
|20220824102514494|null| null| null|
|20220824102514494|null| null| null|
|20220824102514494|null| null| null|
|20220824102514494|null| null| null|
|20220824102514494|null| null| null|
|20220824132039517|null| null| null|
|20220824132113066|null| null| null|
|20220824132113066|null| null| null|
|20220824132934016|test| true| null|
|20220824135050368|test| true| null|
|20220824135411903|test| true| null|
|20220824135446080|test| true| null|
|20220824135921176|test| true|test3|
+-----------------+----+-----+-----+
```

**Environment Description**

* Hudi version : 0.11.0
* Spark version : 3.1.4
* Hive version : 1.2.1000
* Hadoop version : 2.7.3
* Storage (HDFS)
* Schema Registry
* Kafka
* Debezium
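One plausible mechanism behind at least the `test3` nulls: a schema-on-read reader projects files written under an old schema onto the new one, and a column that did not exist when a file was written comes back as null. The Postgres `DEFAULT 'toto'` is applied inside Postgres only; Debezium never re-emits the backfilled rows, so the sink has no default to fall back on. A minimal, self-contained Python sketch of that reconciliation (this is an illustration, not Hudi's actual code; the schemas and records are hypothetical):

```python
# Hypothetical illustration of schema-on-read reconciliation.
# Not Hudi internals -- just the general rule readers follow when
# projecting old files onto an evolved schema.

V1_SCHEMA = ["id", "test", "test2"]           # schema when old files were written
V2_SCHEMA = ["id", "test", "test2", "test3"]  # schema after ALTER TABLE ADD COLUMN

def read_with_schema(record, target_schema):
    """Project a stored record onto the reader's target schema.

    Fields absent from the stored record surface as None (null):
    the SQL DEFAULT lives in the Postgres source, and CDC never
    re-emits the rows Postgres backfilled, so nothing downstream
    can fill the value in.
    """
    return {field: record.get(field) for field in target_schema}

# A row written under V1, before `test3` existed, reads back with a null:
old_record = {"id": 1, "test": "toto", "test2": False}
print(read_with_schema(old_record, V2_SCHEMA))
# {'id': 1, 'test': 'toto', 'test2': False, 'test3': None}

# A row written after the schema change carries the value through:
new_record = {"id": 2, "test": "test", "test2": True, "test3": "test3"}
print(read_with_schema(new_record, V2_SCHEMA))
# {'id': 2, 'test': 'test', 'test2': True, 'test3': 'test3'}
```

Note this only accounts for nulls in the added column; rows where *every* column is null (as in the `20220824102514494` commits above) suggest a separate read/write schema mismatch rather than normal evolution behavior.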
