neerajpadarthi commented on issue #5519:
URL: https://github.com/apache/hudi/issues/5519#issuecomment-1125632878

   @xiarixiaoyao Thanks for checking. 
   
   I tested the scenario below with an ALTER statement on 0.9, but ended up 
with errors. I've added my observations for your reference.  
   
   1. Created the table with bulk_insert parallelism 5 (schema: colx Int)
   > (S3 - created 5 file groups and 1 commit file) 
   > (Glue - created a table with colx having Int datatype)
   > (Accessed via Spark DF - able to query; colx has integer datatype)
   > (Accessed via Spark SQL - able to query; colx has integer datatype)
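For reference, the step-1 write was along these lines (a sketch only; the table name, path, and key fields below are placeholders, not the actual job configuration):

```python
# Sketch of the step-1 bulk_insert write. Table name, record key, and
# precombine field are hypothetical placeholders.
hudi_options = {
    "hoodie.table.name": "test_table",                # placeholder name
    "hoodie.datasource.write.recordkey.field": "id",  # placeholder key
    "hoodie.datasource.write.precombine.field": "ts", # placeholder field
    "hoodie.datasource.write.operation": "bulk_insert",
    "hoodie.bulkinsert.shuffle.parallelism": "5",     # -> the 5 file groups
}
# df.write.format("hudi").options(**hudi_options) \
#     .mode("overwrite").save("s3://bucket/path")     # colx is Int here
```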
   
   2. Performed an upsert delta with colx schema changed to Long 
   > (S3 - rewrote the impacted file groups (3 file groups) and added 1 
commit file) 
   > (Glue - a new schema version got created and the schema for colx got 
updated to bigint datatype)
   > (Accessed via Spark DF - query failed; colx has long datatype) - 
same error as above 
   > (Accessed via Spark SQL - able to query old records (Int) but failed 
when querying upserted records (Long); colx still has Integer datatype) - 
failed with org.apache.parquet.column.Dictionary.decodeToInt
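The step-2 upsert was roughly as follows (again a sketch with placeholder names; the key detail is that colx arrives as Long in the incoming batch):

```python
# Sketch of the step-2 upsert. Table name and path are placeholders.
hudi_upsert_options = {
    "hoodie.table.name": "test_table",             # placeholder name
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.upsert.shuffle.parallelism": "5",
}
# colx promoted to Long in the incoming batch:
# df_long = df.withColumn("colx", df["colx"].cast("long"))
# df_long.write.format("hudi").options(**hudi_upsert_options) \
#     .mode("append").save("s3://bucket/path")
```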
   
   3. Altered the table column to Long datatype using spark.sql 
   > (S3 - created 1 commit file) 
   > (Glue - a new schema version got created but no change in the schema 
attributes)
   > (Accessed via Spark DF - query failed; colx has long datatype) - 
same error as above 
   > (Accessed via Spark SQL - able to query upserted records (Long) but 
failed when querying old records (Int); colx is updated to long datatype) - 
failed with org.apache.parquet.column.Dictionary.decodeToLong
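The step-3 ALTER was issued via spark.sql along these lines (the table name is a placeholder):

```python
# Step-3 ALTER issued through spark.sql; table name is a placeholder.
alter_sql = "ALTER TABLE test_table ALTER COLUMN colx TYPE bigint"
# spark.sql(alter_sql)
```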
   
   Summary - Observed schema inconsistencies between spark.sql and Spark DF 
reads, and the ALTER statement made no change on its own. Reading portions of 
the table succeeds, but reading the complete table fails. I think it fails 
because not all file groups were rewritten: the untouched file groups still 
hold Int-typed parquet files while the rewritten ones hold Long.   
   
   Q. I am currently using 0.9. Is this an issue with this version? Do I need 
to migrate to 0.11 to validate schema promotion? 
   
   Please let me know if I am missing something. Thanks in advance. 
   
   
   

