itaise opened a new issue, #4510:
URL: https://github.com/apache/iceberg/issues/4510
Hi,
We are writing to Iceberg using Spark, and when renaming a partition field we get a validation error:
```
org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1000: some_date: void(1)
```
It seems like Iceberg is referring to the existing table's partition field name, which is no longer relevant: the new write defines a new partition field, and the write mode is "overwrite", so the old spec should be replaced.
Can you please assist?
Thank you!
Here is a minimal reproducible example:
1. Create the original table, partitioned by `some_date`:
```
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Single-row dataset with the original partition column name.
dataDF = [('1991-04-01',)]
schema = StructType([
    StructField('some_date', StringType(), True)])

spark = SparkSession.builder.master('local[1]').appName('example') \
    .getOrCreate()

df = spark.createDataFrame(data=dataDF, schema=schema)
spark.sql("USE iprod")  # Iceberg catalog
spark.sql("CREATE SCHEMA IF NOT EXISTS iprod.test_schema")
df.write.mode("overwrite").format("parquet") \
    .partitionBy('some_date') \
    .saveAsTable("iprod.test_schema.example")
```
2. Try to overwrite the table with the same code, but with the partition field renamed to `some_date_2`:
```
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Same data, but the partition column is now named some_date_2.
dataDF = [('1991-04-01',)]
schema = StructType([
    StructField('some_date_2', StringType(), True)])

spark = SparkSession.builder.master('local[1]').appName('example') \
    .getOrCreate()

df = spark.createDataFrame(data=dataDF, schema=schema)
spark.sql("USE iprod")  # Iceberg catalog
spark.sql("CREATE SCHEMA IF NOT EXISTS iprod.test_schema")
df.write.mode("overwrite").format("parquet") \
    .partitionBy('some_date_2') \
    .saveAsTable("iprod.test_schema.example")  # fails with the ValidationException
```
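For context, the full trace below shows the check failing in `PartitionSpec.checkCompatibility` while `TableMetadata.buildReplacement` stages the replace-table transaction, i.e. before any data is written. Until this is addressed, here is a possible workaround, as a sketch only and not a confirmed fix (table and column names are taken from the repro above): rename the column in place so the existing partition spec can still resolve its source column, or drop the table so the overwrite creates a fresh spec.
```
# Workaround sketch (untested): pick ONE of the options below before
# running the overwrite in step 2.

# Option A: rename the existing column in place, so the table's current
# partition spec still finds its source column under the new name.
spark.sql(
    "ALTER TABLE iprod.test_schema.example RENAME COLUMN some_date TO some_date_2")

# Option B: drop the table first, so the write creates a brand-new table
# (and partition spec) instead of replacing the old one.
spark.sql("DROP TABLE IF EXISTS iprod.test_schema.example")

# Then the overwrite from step 2 should no longer reference the old spec:
df.write.mode("overwrite").format("parquet") \
    .partitionBy('some_date_2') \
    .saveAsTable("iprod.test_schema.example")
```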
Full trace:
```
: org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1000: some_date: void(1)
    at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:46)
    at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:511)
    at org.apache.iceberg.PartitionSpec$Builder.build(PartitionSpec.java:503)
    at org.apache.iceberg.TableMetadata.reassignPartitionIds(TableMetadata.java:768)
    at org.apache.iceberg.TableMetadata.buildReplacement(TableMetadata.java:790)
    at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.newReplaceTableTransaction(BaseMetastoreCatalog.java:256)
    at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.createOrReplaceTransaction(BaseMetastoreCatalog.java:244)
    at org.apache.iceberg.CachingCatalog$CachingTableBuilder.createOrReplaceTransaction(CachingCatalog.java:244)
    at org.apache.iceberg.spark.SparkCatalog.stageCreateOrReplace(SparkCatalog.java:190)
    at org.apache.spark.sql.execution.datasources.v2.AtomicReplaceTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:197)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.doExecute(V2CommandExec.scala:55)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:686)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:619)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)
```