wypoon opened a new pull request, #7469: URL: https://github.com/apache/iceberg/pull/7469
... and make the SparkCatalog use a case-insensitive CachingCatalog by default.

Motivation: The `CachingCatalog` has a field that determines whether the cache uses case-sensitive `TableIdentifier` keys. By default, caching is enabled and the `SparkCatalog` wraps itself in a case-sensitive `CachingCatalog`; there is no configuration option to request a case-insensitive `CachingCatalog` instead.

In a customer SQL workload, we discovered queries that used inconsistent case for database and table names: a table is read using an upper-case name and updated using a lower-case name. This is not incorrect, since SQL is case-insensitive for database, table, and column names, and it all happens in the **same** Spark session. Normally the new snapshot should be visible to a read immediately after the write, but it is not, because a **different** cached table is read (two entries for the same table exist in the cache, under different keys). As a result, stale data is read until the cache entry expires. (Repeated reads keep refreshing the entry, exacerbating the problem.)

Note: Currently, in the `CachingCatalog`, metadata table resolution is already case-insensitive for the metadata part of the name.
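A minimal sketch of the two behaviors, assuming the three-argument `CachingCatalog.wrap` overload in Iceberg core that takes a case-sensitivity flag; the class name here is illustrative, not part of this PR:

```java
import org.apache.iceberg.CachingCatalog;
import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

public class CacheCaseExample {

  // With case-sensitive keys, these two identifiers produce two distinct
  // cache entries, even though SQL treats them as the same table.
  static final TableIdentifier UPPER = TableIdentifier.of("DB", "TBL");
  static final TableIdentifier LOWER = TableIdentifier.of("db", "tbl");

  static Catalog wrapCaseSensitive(Catalog underlying) {
    // Current SparkCatalog behavior: case-sensitive caching. A commit made
    // through LOWER refreshes only LOWER's cache entry; a subsequent read
    // through UPPER still sees the stale snapshot until its entry expires.
    return CachingCatalog.wrap(underlying);
  }

  static Catalog wrapCaseInsensitive(Catalog underlying) {
    // Proposed default: case-insensitive keys, so UPPER and LOWER resolve
    // to the same cache entry and the read sees the new snapshot.
    return CachingCatalog.wrap(
        underlying,
        /* caseSensitive= */ false,
        CatalogProperties.CACHE_EXPIRATION_INTERVAL_MS_DEFAULT);
  }
}
```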