linliu-code opened a new issue, #14081:
URL: https://github.com/apache/hudi/issues/14081
### Bug Description
**What happened:**
When a catalog is enabled (an external one or the local Spark catalog) and a
table schema is registered in that catalog, either via Spark SQL queries or
through MetaSync, a later Spark DataSource write to a table with the same name
falls back to the catalog for the table schema whenever Hudi cannot find the
schema in storage. This can cause issues such as:
1. User A creates a table named `table_name` in the catalog, either using
`Spark SQL` or `spark.sql()`;
2. User B, with the catalog enabled, uses the Spark datasource to
insert/bulk_insert into a different table with the same name `table_name` but a
different schema; B's queries then fail because the table in the catalog and
the table created by B are not compatible under schema evolution.
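To make the failure mode concrete, here is a minimal, hypothetical sketch (plain Scala, not Hudi internals): the catalog table declares a field `ts`, while User B's DataFrame carries `ts1`. A rename looks like a drop plus an add, and dropping a column is not a valid schema evolution, so reconciling the two schemas fails.

```scala
// Hypothetical illustration of the incompatibility, not actual Hudi code.
// Catalog schema comes from User A's CREATE TABLE; incoming schema from User B's DataFrame.
val catalogSchema  = Seq("ts", "uuid", "rider", "driver", "fare", "city")
val incomingSchema = Seq("ts1", "uuid", "rider", "driver", "fare", "city")

// A renamed field shows up as one dropped column and one added column.
val dropped = catalogSchema.toSet -- incomingSchema.toSet
val added   = incomingSchema.toSet -- catalogSchema.toSet

// Dropping a column is not allowed by plain schema evolution, so the write fails.
val compatible = dropped.isEmpty
println(s"dropped=$dropped added=$added compatible=$compatible")
```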
**What you expected:**
The queries from B should not consult the catalog for a table created in a
different session.
**Steps to reproduce:**
```sh
~/spark/spark-3.3.4-bin-hadoop3/bin/spark-shell \
  --packages org.apache.hudi:hudi-spark$SPARK_VERSION-bundle_2.12:0.15.0 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar'
```

```scala
import java.util.UUID

val tableName = UUID.randomUUID().toString.replaceAll("-", "_")
val basePath = "file:///tmp/trips_table" + UUID.randomUUID().toString

// Create the table in the catalog with a "ts" column.
spark.sql(s"""CREATE TABLE ${tableName} (
  ts BIGINT,
  uuid STRING,
  rider STRING,
  driver STRING,
  fare DOUBLE,
  city STRING
) USING HUDI
PARTITIONED BY (city)""")

// Write via the Spark datasource using the same table name but a
// different schema ("ts1" instead of "ts").
val columns = Seq("ts1", "uuid", "rider", "driver", "fare", "city")
val data = Seq(
  (1695159649087L, "334e26e9-8355-45cc-97c6-c31daf0df330", "rider-A", "driver-K", 19.10, "san_francisco"),
  (1695115999911L, "c8abbe79-8d89-47ea-b4ce-4d224bae5bfa", "rider-J", "driver-T", 17.85, "chennai"))
val inserts = spark.createDataFrame(data).toDF(columns: _*)

// Hudi finds no schema at the fresh basePath, falls back to the catalog
// schema for tableName, and the write fails on the schema mismatch.
inserts.write.format("hudi").
  option("hoodie.datasource.write.partitionpath.field", "city").
  option("hoodie.table.name", tableName).
  mode("overwrite").
  save(basePath)
```
### Environment
**Hudi version:** 0.15.0
**Query engine:** Spark 3.3.4
**Relevant configs:**
### Logs and Stack Trace
_No response_