CodingCat opened a new pull request, #5083:
URL: https://github.com/apache/iceberg/pull/5083
When upgrading to Spark 3.2, I found an issue: we cannot use UPDATE/DELETE/MERGE INTO with tables we created with `HadoopTables.create()`. This issue doesn't exist in Spark 3.0 - 3.1.
Previously, we used Spark to read and merge/delete/update Hadoop tables with the following approach:
```scala
val hadoopTableDF = spark.read.format("iceberg").load(path)
hadoopTableDF.createOrReplaceTempView("target")
newDF.createOrReplaceTempView("source")
spark.sql("MERGE INTO target using source on target.id = source.id WHEN
MATCHED THEN... WHEN NOT MATCHED THEN ...")
```
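
For context, the target table in the example above was created directly at a path rather than through a catalog. A minimal sketch of what that looks like with `HadoopTables.create()`; the schema and S3 location here are only illustrative:
```scala
import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.types.Types
import org.apache.iceberg.{PartitionSpec, Schema}

// Illustrative schema; real tables carry their own columns.
val schema = new Schema(
  Types.NestedField.required(1, "id", Types.LongType.get()),
  Types.NestedField.optional(2, "data", Types.StringType.get())
)

// Create an unpartitioned Iceberg table directly at a path, with no catalog involved.
val tables = new HadoopTables(new Configuration())
val table = tables.create(schema, PartitionSpec.unpartitioned(), "s3://some-bucket/experiments/target")
```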
The MERGE above doesn't work anymore because the Iceberg analyzer rules for Spark 3.2 only recognize tables that are already registered in a catalog.
While I am aware that catalog usage is recommended in production, I think this compatibility is still needed:
* any breaking change from Spark 3.0/3.1 to Spark 3.2 is undesirable anyway
* strategically, many Delta Lake users (ourselves included) are used to "path tables", and this incompatibility is a barrier to moving over completely
* a pure path table is a lightweight approach for experiments: we do not need to clean up temporary table names in a catalog, only create tables in some S3 bucket with a TTL
* personally, we already have many tables created as Hadoop tables, and moving everything into a catalog (as sketched below) is unnecessary work when we upgrade to Spark 3.2
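
For reference, the catalog-based route that Spark 3.2 currently requires looks roughly like the following. This is a sketch based on Iceberg's documented Hadoop catalog configuration; the catalog name `hadoop_prod`, the warehouse path, and the `db.target` identifier are all illustrative:
```scala
import org.apache.spark.sql.SparkSession

// Register an Iceberg Hadoop catalog with Spark and enable the Iceberg SQL
// extensions that provide MERGE/UPDATE/DELETE support.
val spark = SparkSession.builder()
  .appName("iceberg-catalog-example")
  .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
  .config("spark.sql.catalog.hadoop_prod", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.hadoop_prod.type", "hadoop")
  .config("spark.sql.catalog.hadoop_prod.warehouse", "s3://some-bucket/warehouse")
  .getOrCreate()

// The table now has to live under the catalog's warehouse as <db>/<table>.
spark.sql("CREATE TABLE IF NOT EXISTS hadoop_prod.db.target (id BIGINT, data STRING) USING iceberg")

// newDF is the source DataFrame from the example above.
newDF.createOrReplaceTempView("source")

// This works in 3.2 because hadoop_prod.db.target resolves through a registered catalog.
spark.sql(
  """MERGE INTO hadoop_prod.db.target t USING source s ON t.id = s.id
    |WHEN MATCHED THEN UPDATE SET *
    |WHEN NOT MATCHED THEN INSERT *""".stripMargin)
```
This resolves because the target goes through a registered catalog, but it also means existing path tables have to be laid out (or re-created) under the catalog's warehouse, which is the kind of migration overhead described above.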
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]