Hi everyone,

This is my first interaction with the Iceberg community, so here are a few words about myself:
- I'm Alex, a Berlin-based software engineer
- I've been working at Snowflake for 4 years now
- I spend most of my time on data types, particularly binary, strings and collations.

I'd like to start a discussion about adding collations to the Iceberg spec.

Conceptually, a collation is an annotation on the string data type. By default, most engines perform string operations case-sensitively. Collations let users specify alternative comparison rules, which is useful for, e.g., case- or accent-insensitive string operations, or language-specific sorting. Many engines support collations, including Databricks <https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-collation>, Spark <https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.collate.html>, Snowflake <https://docs.snowflake.com/en/sql-reference/collation>, and Oracle <https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/COLLATION.html>, to name just a few.
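As a rough illustration of what a collation changes, here is a sketch comparing the default binary (code point) ordering with case-insensitive and case- and accent-insensitive variants. It approximates collations with Python's `str.casefold` and `unicodedata` normalization; real collation implementations (e.g., ICU or the Unicode Collation Algorithm) are considerably more involved, and the helper names are hypothetical.

```python
import unicodedata


def binary_key(s: str) -> str:
    # Default behavior: compare by Unicode code point (same order as UTF-8 bytes).
    return s


def ci_key(s: str) -> str:
    # Case-insensitive approximation: fold case before comparing.
    return s.casefold()


def ci_ai_key(s: str) -> str:
    # Case- and accent-insensitive approximation:
    # decompose (NFD), drop combining marks, then fold case.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


words = ["b", "A", "é", "a", "B", "e"]
print(sorted(words, key=binary_key))  # ['A', 'B', 'a', 'b', 'e', 'é']
print(sorted(words, key=ci_key))      # ['A', 'a', 'b', 'B', 'e', 'é']
print(ci_ai_key("Résumé") == ci_ai_key("resume"))  # True
```

The same set of strings sorts differently and compares equal or unequal depending on the collation, which is why an engine needs to know a column's collation to interpret it correctly.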

At Snowflake, we see heavy use of collations. Several users have told us they want to migrate to Iceberg tables but are currently blocked by Iceberg's lack of collation support.

Given the widespread support for collations across different engines, I believe introducing collations to Iceberg will increase interoperability and boost its adoption.
I'd be curious to hear your thoughts.

*Goal of the proposal*
- Support collation specifications for columns
- Define how column bounds (min/max statistics) for collated columns should be stored, since UTF-8-based bounds are not useful for them
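To illustrate why UTF-8-based bounds break down for collated columns, here is a small sketch assuming a case-insensitive collation, approximated with Python's `str.casefold`. The functions are hypothetical stand-ins for file-level pruning, not Iceberg code: a file holding `abc` is wrongly skipped for `col = 'ABC'` because `'A'` sorts before `'a'` in byte order.

```python
def utf8_bounds(values):
    # Per-file lower/upper bounds in UTF-8 byte order (same as Unicode code point order).
    encoded = [v.encode("utf-8") for v in values]
    return min(encoded), max(encoded)


def may_contain_eq(bounds, literal):
    # Byte-order pruning for `col = literal`: the file can only match if lower <= literal <= upper.
    lower, upper = bounds
    return lower <= literal.encode("utf-8") <= upper


def collated_bounds(values, key):
    # Hypothetical alternative: bounds computed over collation sort keys instead of raw bytes.
    keys = [key(v) for v in values]
    return min(keys), max(keys)


def may_contain_eq_collated(bounds, literal, key):
    # Same pruning check, but in collation order.
    lower, upper = bounds
    return lower <= key(literal) <= upper


file_values = ["abc", "xyz"]
bounds = utf8_bounds(file_values)

# Under a case-insensitive collation 'ABC' equals 'abc', so the file does hold a match...
print([v for v in file_values if v.casefold() == "ABC".casefold()])  # ['abc']

# ...but byte-order pruning skips the file, because b'ABC' < b'abc'.
print(may_contain_eq(bounds, "ABC"))  # False

# Pruning against collation-ordered bounds keeps the file.
print(may_contain_eq_collated(collated_bounds(file_values, str.casefold), "ABC", str.casefold))  # True
```

Whether bounds should be stored as collation sort keys or in some other form is an open design question; the sketch only shows why plain UTF-8 ordering can produce incorrect pruning for collated columns.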

*Required Changes*
- Extend the schema to let (string) fields be annotated with a collation
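For illustration only, a collation annotation on a string field in Iceberg's JSON schema representation might look like the sketch below. The `collation` key and the `en-ci` value (borrowed from the style of Snowflake's collation specifications) are assumptions; the actual representation is whatever the design doc linked in this email proposes.

```python
import json

# A string field in the existing JSON schema field format (id, name, required, type).
field = {"id": 3, "name": "customer_name", "required": False, "type": "string"}

# Hypothetical: the same field annotated with a collation.
# The key name and the value format are illustrative only.
collated_field = {**field, "collation": "en-ci"}

print(json.dumps(collated_field))
```

Keeping the annotation on the field, rather than introducing a new type, means readers that ignore the key would still see an ordinary string column; whether that is desirable is one of the questions for the discussion.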

More details can be found in this doc <https://docs.google.com/document/d/1m8b7u97uteHYjXk-4DNglJSpQO8OcZOCzW2tApCNTW4/edit?tab=t.0#heading=h.y1ant4w2163k>.

I'm also hoping to present the idea in the next community sync.

Best, Alex
