wypoon opened a new pull request, #7469: URL: https://github.com/apache/iceberg/pull/7469
... and make the SparkCatalog use a case-insensitive CachingCatalog by default.

Motivation: The `CachingCatalog` has a field that determines whether the cache uses case-sensitive `TableIdentifier` keys. By default, caching is enabled and the `SparkCatalog` wraps itself in a case-sensitive `CachingCatalog`; there is no configuration option to request a case-insensitive `CachingCatalog` instead.

In a customer SQL workload, we discovered queries that used inconsistent case for database and table names: a table is read using an upper-case name and updated using a lower-case name. This is not incorrect, since SQL is case-insensitive for database, table, and column names, and it all happens in the **same** Spark session. Normally the new snapshot should be visible to a read immediately after the write, but it is not, because a **different** cached table is read (two entries for the same table exist in the cache, under different keys). As a result, stale data is read until the cache entry expires. (Repeated reads keep refreshing the entry, exacerbating the problem.)

Note: Currently, in the `CachingCatalog`, metadata table resolution is already case-insensitive for the metadata part of the name.
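A minimal sketch of the two behaviors, assuming the three-argument `CachingCatalog.wrap` overload in Iceberg core that takes a case-sensitivity flag; the class name here is illustrative, not part of this PR:

```java
import org.apache.iceberg.CachingCatalog;
import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

public class CacheCaseExample {

  // With case-sensitive keys, these two identifiers produce two distinct
  // cache entries, even though SQL treats them as the same table.
  static final TableIdentifier UPPER = TableIdentifier.of("DB", "TBL");
  static final TableIdentifier LOWER = TableIdentifier.of("db", "tbl");

  static Catalog wrapCaseSensitive(Catalog underlying) {
    // Current SparkCatalog behavior: case-sensitive caching. A commit made
    // through LOWER refreshes only LOWER's cache entry; a subsequent read
    // through UPPER still sees the stale snapshot until its entry expires.
    return CachingCatalog.wrap(underlying);
  }

  static Catalog wrapCaseInsensitive(Catalog underlying) {
    // Proposed default: case-insensitive keys, so UPPER and LOWER resolve
    // to the same cache entry and the read sees the new snapshot.
    return CachingCatalog.wrap(
        underlying,
        /* caseSensitive= */ false,
        CatalogProperties.CACHE_EXPIRATION_INTERVAL_MS_DEFAULT);
  }
}
```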