Aleksandar Tomic created SPARK-46830:
----------------------------------------
Summary: Introducing collation concept into Spark
Key: SPARK-46830
URL: https://issues.apache.org/jira/browse/SPARK-46830
Project: Spark
Issue Type: Epic
Components: Spark Core
Affects Versions: 4.0.0
Reporter: Aleksandar Tomic
This feature will introduce collation support to the Spark engine. This means
that:
# Every StringType will have an associated collation. The default remains UTF8
Binary, which follows the same rules as the current UTF8 String
comparison.
# Collation will be respected in all collation-sensitive operations:
comparisons, hashing, and string operations (contains, startsWith, endsWith, etc.).
# Collation can be set in the following ways:
## With a COLLATE expression, e.g. strExpr COLLATE collation_name
## In a CREATE TABLE column definition
## By setting the session collation
# All Spark operators need to respect collation settings (filters, joins,
shuffles, aggregations, etc.).
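
To illustrate why comparisons, hashing, and string operations all have to agree on the collation, here is a minimal, engine-independent sketch in Python. It is not Spark code and does not reflect the actual design: the `Collation` class, the `LOWERCASE_SKETCH` collation, and all helper names are hypothetical, and the case-insensitive collation is approximated with `str.lower()`, which is a simplification of real locale-aware rules.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Collation:
    """Hypothetical collation: maps a string to the byte key used for all operations."""
    name: str
    key: Callable[[str], bytes]


# Byte-wise comparison of the UTF-8 encoding (the default behaviour described above).
UTF8_BINARY = Collation("UTF8_BINARY", lambda s: s.encode("utf-8"))
# Illustrative case-insensitive collation; str.lower() is only an approximation.
LOWERCASE_SKETCH = Collation("LOWERCASE_SKETCH", lambda s: s.lower().encode("utf-8"))


def compare(a: str, b: str, c: Collation) -> int:
    """Return -1, 0, or 1 according to the collation key order."""
    ka, kb = c.key(a), c.key(b)
    return (ka > kb) - (ka < kb)


def equals(a: str, b: str, c: Collation) -> bool:
    return c.key(a) == c.key(b)


def hash_of(s: str, c: Collation) -> int:
    """Hash the collation key so that equal strings always hash equally."""
    return hash(c.key(s))


def starts_with(s: str, prefix: str, c: Collation) -> bool:
    # Key-based prefix matching is valid for these two simple collations only.
    return c.key(s).startswith(c.key(prefix))


def group_by(values: Iterable[str], c: Collation) -> Dict[bytes, List[str]]:
    """Group values the way a collation-aware aggregation would bucket them."""
    groups: Dict[bytes, List[str]] = defaultdict(list)
    for v in values:
        groups[c.key(v)].append(v)
    return dict(groups)
```

Under these assumptions, `equals("Spark", "SPARK", LOWERCASE_SKETCH)` is true while the same call with `UTF8_BINARY` is false, and `group_by(["a", "A", "b"], LOWERCASE_SKETCH)` yields two groups instead of three. Because hashing goes through the same key as equality, a hash join or shuffle built on `hash_of` would route equal strings to the same partition.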
This is a high-level description of the feature. The detailed design is
available in
[this document|https://docs.google.com/document/d/1G3Xap-0Aj-QC6qoWZDDqO84IulHnogjD1REE3yh1_jk/edit?usp=sharing].