[
https://issues.apache.org/jira/browse/SPARK-46830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aleksandar Tomic updated SPARK-46830:
-------------------------------------
Description:
This feature will introduce collation support to the Spark engine. This means
that:
# Every StringType will have an associated collation. The default remains UTF8
Binary, which will behave under the same rules as the current UTF8 String
comparison.
# Collation will be respected in all collation-sensitive operations:
comparisons, hashing, and string operations (contains, startsWith, endsWith, etc.).
# Collation can be set in the following ways:
## With a COLLATE expression, e.g. strExpr COLLATE collation_name
## In a CREATE TABLE column definition
## By setting the session collation
# All Spark operators need to respect collation settings (filters, joins,
shuffles, aggs, etc.).
This is a high-level description of the feature. The detailed design is in
[this|https://docs.google.com/document/d/1A9RQiwq-n3R3vuh571yjOLaaIuIYRTyCx7UFr0Qg-eY/edit?usp=sharing]
document (also attached to this issue).
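To make the points above concrete, here is a minimal, engine-independent sketch of what "respecting collation" could mean for equality, hashing, string predicates, and grouping. It is not Spark code and does not reflect the design document: CollatedString, utf8_binary_key, case_insensitive_key, and group_counts are hypothetical names, and case folding is only an assumed stand-in for a non-default collation.

```python
# Hypothetical sketch: none of these names are Spark APIs.
from collections import defaultdict


def utf8_binary_key(s: str) -> bytes:
    """Default collation in this sketch: compare raw UTF-8 bytes."""
    return s.encode("utf-8")


def case_insensitive_key(s: str) -> bytes:
    """Assumed example of a non-default collation: compare after case folding."""
    return s.casefold().encode("utf-8")


class CollatedString:
    """A string value paired with the collation key function it is compared under."""

    def __init__(self, value: str, key_fn=utf8_binary_key):
        self.value = value
        self.key_fn = key_fn

    def _key(self) -> bytes:
        return self.key_fn(self.value)

    # Equality and hashing must use the same key, otherwise hash-based
    # operators (joins, shuffles, aggregations) would disagree with filters.
    # Mixing different collations in one comparison is out of scope here.
    def __eq__(self, other):
        return isinstance(other, CollatedString) and self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

    def __lt__(self, other):
        return self._key() < other._key()

    # String predicates are evaluated on collation keys as well.
    def startswith(self, prefix: str) -> bool:
        return self._key().startswith(self.key_fn(prefix))

    def endswith(self, suffix: str) -> bool:
        return self._key().endswith(self.key_fn(suffix))

    def contains(self, needle: str) -> bool:
        return self.key_fn(needle) in self._key()


def group_counts(values, key_fn=utf8_binary_key):
    """Count values per group, where group identity is decided by the collation."""
    counts = defaultdict(int)
    for v in values:
        counts[CollatedString(v, key_fn)] += 1
    # Report each group under the first value that was seen for it.
    return {cs.value: n for cs, n in counts.items()}
```

Under the default key, "Spark" and "SPARK" land in separate groups; under the assumed case-insensitive key they collapse into one. The same key drives equality, hashing, and the string predicates, which is the consistency the list above asks of every operator.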
was:
This feature will introduce collation support to the Spark engine. This means
that:
# Every StringType will have an associated collation. Default remains UTF8
Binary, which will behave under the same rules as current UTF8 String
comparison.
# Collation will be respected in all collation sensitive operations -
comparisons, hashing, string operations (contains, startWith, endsWith etc.)
# Collation can be set through following ways:
## COLLATE expression. e.g. strExpr COLLATE collation_name
## In CREATE TABLE column definition
## By setting session collation.
# All the Spark operators need to respect collation settings (filters, joins,
shuffles, aggs etc.)
This is a high level description of the feature. You can find detailed design
under
[this|https://docs.google.com/document/d/1G3Xap-0Aj-QC6qoWZDDqO84IulHnogjD1REE3yh1_jk/edit?usp=sharing]
link (doc is in attachment as well).
> Introducing collation concept into Spark
> ----------------------------------------
>
> Key: SPARK-46830
> URL: https://issues.apache.org/jira/browse/SPARK-46830
> Project: Spark
> Issue Type: Epic
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Aleksandar Tomic
> Priority: Major
> Attachments: Collation Support in Spark.docx
>