[
https://issues.apache.org/jira/browse/SPARK-47353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840728#comment-17840728
]
Uroš Bojanić edited comment on SPARK-47353 at 4/28/24 4:06 PM:
---------------------------------------------------------------
[~panbingkun] if you're looking to make some contributions to the collation
effort, please check out this ticket and let me know if you want to claim it!
(edit: claimed by [~gpgp])
> Mode (all collations)
> ---------------------
>
> Key: SPARK-47353
> URL: https://issues.apache.org/jira/browse/SPARK-47353
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Uroš Bojanić
> Priority: Major
>
> Enable collation support for the *Mode* expression in Spark. First confirm
> what is the expected behaviour for this expression when given collated
> strings, then move on to the implementation that would enable handling
> strings of all collation types. Implement the corresponding unit tests and
> E2E SQL tests to reflect how this function should be used with collation in
> SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment
> with the existing functions to learn more about how they work. In addition,
> look into the possible use-cases and implementation of similar functions
> within other open-source DBMSs, such as
> [PostgreSQL|https://www.postgresql.org/docs/].
>
> The goal for this Jira ticket is to implement the *Mode* expression so it
> supports all collation types currently supported in Spark. To understand what
> changes were introduced in order to enable full collation support for other
> existing functions in Spark, take a look at the Spark PRs and Jira tickets
> for completed tasks in this parent (for example: Contains, StartsWith,
> EndsWith).
> Examples:
> With UTF8_BINARY collation, the query
> SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b')
> AS tab(col);
> should return 'a'.
> With UTF8_BINARY_LCASE collation, the query
> SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b')
> AS tab(col);
> should return either 'B' or 'b'.
>
> Read more about ICU [Collation Concepts|http://example.com/] and
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical
> Standard for
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
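
The expected results in the two examples from the ticket can be reproduced outside Spark with a small sketch. This is only an illustration in plain Python, not Spark's implementation: the helper names collation_key and collated_mode are hypothetical, and UTF8_BINARY_LCASE is approximated here with Python's str.lower(), which may not match Spark's lowercasing in every case. The idea is to count values by their collation key and return one of the original values from the most frequent group.

```python
from collections import Counter


def collation_key(value, collation):
    """Hypothetical helper: map a string to the key under which the
    collation treats strings as equal (assumption, not a Spark API)."""
    if collation == "UTF8_BINARY":
        return value  # byte-wise equality: 'B' and 'b' are distinct
    if collation == "UTF8_BINARY_LCASE":
        return value.lower()  # approximation of case-insensitive equality
    raise ValueError(f"collation not covered by this sketch: {collation}")


def collated_mode(values, collation):
    """Return a value from the most frequent group of collation-equal
    strings, or None if there are no values."""
    counts = Counter()
    representative = {}  # first original value seen for each key
    for v in values:
        key = collation_key(v, collation)
        counts[key] += 1
        representative.setdefault(key, v)
    if not counts:
        return None
    best_key, _ = counts.most_common(1)[0]
    return representative[best_key]


values = ["a", "a", "a", "B", "B", "b", "b"]
print(collated_mode(values, "UTF8_BINARY"))
print(collated_mode(values, "UTF8_BINARY_LCASE"))
```

Under UTF8_BINARY the groups are a (3), B (2) and b (2), so the result is 'a'. Under the lowercase approximation, 'B' and 'b' fall into one group of 4, which outnumbers 'a'. This sketch returns the first value seen in that group, but any member of the group would satisfy the ticket's "either 'B' or 'b'".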