[GitHub] [iceberg] vshinde-medacist opened a new issue, #4542: Schema Evolution exception: too many data columns

GitBox Mon, 11 Apr 2022 23:43:44 -0700


vshinde-medacist opened a new issue, #4542:
URL: https://github.com/apache/iceberg/issues/4542


   We are evaluating the schema evolution feature supported by Iceberg. Below are the steps carried out so far.
   
   1. Create a new Iceberg table
   
   `people.csv` data:
   
   | age|   name|
   |----|-------|
   |  30|   Andy|
   |  19| Justin|
   
   ```scala
   val df : DataFrame = spark.read.format("csv").option("header", "true").option("delimiter", ",").load("/spark-apps/people.csv")
   df.write.format("iceberg").saveAsTable("local.demo_table")
   ```
   
   2. Append data that contains a new column (`job`) to the table created in Step 1.
   `updated_people.csv` data:
   
   | age|   name|job|
   |----|-------| ---- |
   |  36|   Vikram| Developer|
   |  18| Raj| Developer|
   
   ```scala
   val csvdf = spark.read.format("csv").option("header", "true").option("delimiter", ",").load("/spark-apps/updated_people.csv")
   csvdf.write.format("iceberg").mode("append").save("/path/to/table")
   ```
   The append fails with the following exception:
   
   ```cmd
   org.apache.spark.sql.AnalysisException: Cannot write to '/path/to/table', too many data columns:
   Table columns: 'age', 'name'
   Data columns: 'age', 'name', 'job'
   ```
   
   Any suggestions on how to enable schema evolution support when writing a DataFrame?
   
   For reference, the entire POC script:
   
   ```scala
       val df : DataFrame = spark.read.format("csv").option("header", "true").option("delimiter", ",").load("/spark-apps/people.csv")
       df.show()

       df.write.format("iceberg").saveAsTable("local.demo_table")

       val tableData = spark.read.format("iceberg").load("s3a://iceberg-poc/warehouse/demo_table")
       tableData.show()

       val csvdf = spark.read.format("csv").option("header", "true").option("delimiter", ",").load("/spark-apps/updated_people.csv")
       csvdf.show()

       // Exception while executing below statement
       csvdf.write.format("iceberg").mode("append").save("s3a://iceberg-poc/warehouse/demo_table")
   ```
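
One possible workaround, offered as a sketch and not as a confirmed answer from the Iceberg maintainers, is to evolve the table schema explicitly before appending, for example with `ALTER TABLE local.demo_table ADD COLUMN job string`, so that the table columns match the data columns. The helper below generalizes that idea: it compares two column lists and builds the `ALTER TABLE` statements for the columns the table is missing. `SchemaDiff`, `missingColumns`, and `addColumnDdl` are hypothetical names introduced for illustration and are not part of Spark or Iceberg. The Spark wiring is shown only in comments because it needs a live session, and it assumes the table is reachable by its catalog name `local.demo_table`.

```scala
// Hypothetical helper (not an Iceberg or Spark API): works on plain
// (columnName, sqlType) pairs so it can be checked without a Spark session.
object SchemaDiff {

  // Columns present in `data` but absent from `table`, in data order.
  // Names are compared case-insensitively (an assumption of this sketch).
  def missingColumns(
      table: Seq[(String, String)],
      data: Seq[(String, String)]): Seq[(String, String)] = {
    val existing = table.map(_._1.toLowerCase).toSet
    data.filterNot { case (name, _) => existing.contains(name.toLowerCase) }
  }

  // One ALTER TABLE ... ADD COLUMN statement per missing column.
  def addColumnDdl(tableName: String, missing: Seq[(String, String)]): Seq[String] =
    missing.map { case (name, sqlType) =>
      s"ALTER TABLE $tableName ADD COLUMN $name $sqlType"
    }
}

// Possible wiring inside the POC script (requires the live `spark` session
// and the `csvdf` DataFrame from above; untested against this setup):
//
//   def cols(df: org.apache.spark.sql.DataFrame): Seq[(String, String)] =
//     df.schema.fields.map(f => (f.name, f.dataType.sql)).toSeq
//
//   val missing = SchemaDiff.missingColumns(cols(spark.table("local.demo_table")), cols(csvdf))
//   SchemaDiff.addColumnDdl("local.demo_table", missing).foreach(spark.sql)
//   csvdf.writeTo("local.demo_table").append()
```

Because the CSV files are read without schema inference, every column arrives as a string, so the new `job` column would be added as `STRING`. Iceberg's Spark write documentation also describes a schema-merge mode (the `write.spark.accept-any-schema` table property together with the `mergeSchema` write option); whether that is available depends on the Iceberg version in use.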
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

