Ryan,

I also replied in the PR. In my testing I do not see any runtime failure when 
trying to use a case-sensitive schema in a case-insensitive way.

-Steve Lessard, Teradata

From: Ryan Blue <b...@databricks.com.INVALID>
Date: Wednesday, July 31, 2024 at 5:02 PM
To: dev@iceberg.apache.org <dev@iceberg.apache.org>
Subject: [EXTERNAL] Re: Case-insensitive schemas

Steve,

I replied on the PR, but the gist is that you're right. Using a schema that has 
fields that would be considered identical in a case-insensitive context will 
fail at runtime. That's the right behavior because Iceberg can't control the 
case sensitivity of applications or engines.

Ryan

On Wed, Jul 31, 2024 at 4:17 PM Lessard, Steve 
<steve.less...@teradata.com.invalid> wrote:
Is there some kind of configuration or metadata flag that hints whether a 
Schema is intended to be used case-sensitive or case-insensitive?

In my PR for adding case-insensitivity support to PartitionSpec, Steven Wu 
asked <https://github.com/apache/iceberg/pull/10678#discussion_r1696122748>:

caseInsensitiveFindField uses normalized lower-case string for name -> id 
indexing. who can ensure the schema don't have two fields with names like data 
and DATA? Otherwise, caseInsensitiveFindField search is ambiguous. I am 
wondering if the caseSensitive config need to be pushed into the Schema and 
spec?

I suppose the answer to this question is to NOT use a case-sensitive schema in 
a case-insensitive way. In other words, if the schema is case-sensitive and 
contains columns named Make and MAKE then the result of calling 
caseInsensitiveFindField("Make") is undefined. But that raises a new question: 
how does one know whether the Schema was created with the intention of being 
used case-sensitively or case-insensitively? I looked at 
https://iceberg.apache.org/docs/latest/configuration/#catalog-properties
but found nothing.
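
To make the ambiguity concrete, here is a minimal sketch in plain Python. It is 
not Iceberg code: the helper names find_case_collisions and 
case_insensitive_lookup are hypothetical, and it only assumes what the quoted 
comment says, namely that the lookup index is keyed by the lower-cased name.

```python
from collections import defaultdict


def find_case_collisions(field_names):
    """Return groups of names that become identical after lower-casing.

    Hypothetical helper: mirrors the idea of a lower-cased name index,
    not any actual Iceberg implementation.
    """
    groups = defaultdict(list)
    for name in field_names:
        groups[name.lower()].append(name)
    # Keep only the keys that more than one original name maps to.
    return {key: names for key, names in groups.items() if len(names) > 1}


def case_insensitive_lookup(field_names, name):
    """Look up a name ignoring case; refuse to guess when it is ambiguous."""
    matches = [n for n in field_names if n.lower() == name.lower()]
    if len(matches) > 1:
        raise ValueError(f"ambiguous case-insensitive name {name!r}: {matches}")
    return matches[0] if matches else None


if __name__ == "__main__":
    columns = ["id", "Make", "MAKE"]
    print(find_case_collisions(columns))
    print(case_insensitive_lookup(columns, "ID"))
```

With columns named Make and MAKE, both map to the key "make", so a 
case-insensitive lookup of "Make" has two candidates. The sketch raises an 
error in that case rather than picking one silently.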

Is there some kind of configuration or metadata flag that gives a hint?

-Steve Lessard, Teradata




--
Ryan Blue
Databricks
