RussellSpitzer commented on issue #4930:
URL: https://github.com/apache/iceberg/issues/4930#issuecomment-1150127669
I believe the correct behavior is that neither allows the old data to be read back. Here is a repro I wrote:
```scala
scala> spark.sql("CREATE external TABLE migratetest (foo int, bar int, zaz int) USING PARQUET LOCATION '/Users/russellspitzer/Temp/migratetest'").show
scala> spark.sql("INSERT INTO migratetest (foo, bar , zaz) VALUES (1, 1, 1)")
res7: org.apache.spark.sql.DataFrame = []
scala> spark.sql("call spark_catalog.system.migrate('spark_catalog.default.migratetest')")
res8: org.apache.spark.sql.DataFrame = [migrated_files_count: bigint]
scala> spark.sql("SELECT * FROM migratetest").show
+---+---+---+
|foo|bar|zaz|
+---+---+---+
|  1|  1|  1|
+---+---+---+
scala> spark.sql("ALTER TABLE migratetest DROP COLUMN foo")
res10: org.apache.spark.sql.DataFrame = []
scala> spark.sql("SELECT * FROM migratetest").show
+---+---+
|bar|zaz|
+---+---+
|  1|  1|
+---+---+
scala> spark.sql("ALTER TABLE migratetest ADD COLUMN foo int")
res12: org.apache.spark.sql.DataFrame = []
scala> spark.sql("SELECT * FROM migratetest").show
+---+---+---+
|bar|zaz|foo|
+---+---+---+
|  1|  1|  1|
+---+---+---+
```
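The issue here is that the default name mapping is changed when `foo` is added back, overriding the original name mapping. Because the migrated Parquet files carry no Iceberg field IDs, their columns are resolved by name through this mapping, so the old file's `foo` column ends up being read as the newly added field.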
Name mapping in the original table: `foo` maps to field ID 1
```json
[ {
  "field-id" : 1,
  "names" : [ "foo" ]
}, {
  "field-id" : 2,
  "names" : [ "bar" ]
}, {
  "field-id" : 3,
  "names" : [ "zaz" ]
} ]
```
Name mapping after dropping `foo`: `foo` still maps to field ID 1
```json
[ {
  "field-id" : 1,
  "names" : [ "foo" ]
}, {
  "field-id" : 2,
  "names" : [ "bar" ]
}, {
  "field-id" : 3,
  "names" : [ "zaz" ]
} ]
```
Name mapping after adding `foo` back: `foo` now maps to field ID 4. *This is incorrect; we should not be changing the existing mapping.*
```json
[ {
  "field-id" : 1,
  "names" : [ ]
}, {
  "field-id" : 2,
  "names" : [ "bar" ]
}, {
  "field-id" : 3,
  "names" : [ "zaz" ]
}, {
  "field-id" : 4,
  "names" : [ "foo" ]
} ]
```
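To make the effect of these mappings concrete, here is a minimal Scala sketch of name-based column resolution. It uses a hypothetical, simplified model (`MappedField`, `assignIds`, `readRow` and the sample values are illustrative names, not Iceberg's API): each column in the migrated file gets a field ID by looking up its name in the mapping, and any field ID of the current schema that the file does not provide reads as null (`None`).

```scala
// Hypothetical, simplified model of a name mapping entry (not Iceberg's API):
// a field ID plus the column names that resolve to it in files without IDs.
final case class MappedField(fieldId: Int, names: List[String])

object NameMappingResolution {
  // Assign a field ID to each column of a file that has no embedded field IDs,
  // using the first mapping entry whose names contain the column name.
  def assignIds(fileColumns: Map[String, Int], mapping: List[MappedField]): Map[Int, Int] =
    fileColumns.toList.flatMap { case (name, value) =>
      mapping.find(_.names.contains(name)).map(f => f.fieldId -> value)
    }.toMap

  // Project the current schema's field IDs; IDs the file cannot supply read as None (null).
  def readRow(fileColumns: Map[String, Int], mapping: List[MappedField], projectedIds: List[Int]): List[Option[Int]] = {
    val byId = assignIds(fileColumns, mapping)
    projectedIds.map(byId.get)
  }

  // The Parquet file written before the migration: foo = 1, bar = 1, zaz = 1.
  val oldFile: Map[String, Int] = Map("foo" -> 1, "bar" -> 1, "zaz" -> 1)

  // Mapping after DROP COLUMN foo: "foo" still points at the dropped field 1.
  val afterDrop: List[MappedField] = List(
    MappedField(1, List("foo")), MappedField(2, List("bar")), MappedField(3, List("zaz")))

  // Mapping after ADD COLUMN foo: "foo" has been moved to the new field 4.
  val afterAdd: List[MappedField] = List(
    MappedField(1, Nil), MappedField(2, List("bar")), MappedField(3, List("zaz")),
    MappedField(4, List("foo")))

  // Current schema after re-adding foo: bar (2), zaz (3), foo (4).
  val currentIds: List[Int] = List(2, 3, 4)
}
```

With the mapping after the drop, `readRow(oldFile, afterDrop, currentIds)` yields `List(Some(1), Some(1), None)`, so the re-added `foo` reads as null. With the mapping after the add it yields `List(Some(1), Some(1), Some(1))`, which matches the stale value seen in the repro.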
I'm on vacation now, so I'm not going to look into this further, but IMHO the final default name mapping should be identical to the one after dropping the column. So the error here is in the `ADD COLUMN` code.
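One way the `ADD COLUMN` rule could be expressed, sketched in Scala against a hypothetical, simplified mapping model (`MappedField`, `addColumn` and `afterDrop` are illustrative names, not Iceberg's actual implementation): existing entries are never rewritten, and a new field only receives a name when no existing entry already claims it. For names that are genuinely new, the sketch assumes the field is appended, as the field ID 4 entry in the mapping above suggests the current code does.

```scala
// Hypothetical, simplified model of a name mapping entry (not Iceberg's API).
final case class MappedField(fieldId: Int, names: List[String])

object ProposedAddColumn {
  // Adding a column never rewrites existing entries. If the new column's name is
  // already claimed (for example, by a dropped field), the new field gets no
  // name-based mapping, so files written before the add cannot resolve into it.
  // Otherwise (assumption) the new field is appended with its name.
  def addColumn(mapping: List[MappedField], newFieldId: Int, name: String): List[MappedField] =
    if (mapping.exists(_.names.contains(name))) mapping
    else mapping :+ MappedField(newFieldId, List(name))

  // Mapping after DROP COLUMN foo, matching the second JSON mapping above.
  val afterDrop: List[MappedField] = List(
    MappedField(1, List("foo")), MappedField(2, List("bar")), MappedField(3, List("zaz")))
}
```

Here `addColumn(afterDrop, 4, "foo")` returns `afterDrop` unchanged, which is the outcome argued for above, while `addColumn(afterDrop, 4, "baz")` appends `MappedField(4, List("baz"))`. Leaving the dropped field's names in place is what keeps old files from resolving into a new column that happens to reuse the name.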