[ 
https://issues.apache.org/jira/browse/SPARK-24469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503687#comment-16503687
 ] 

Eric Maynard commented on SPARK-24469:
--------------------------------------

Ah, I see. I was mistakenly thinking of the second case, where you use e.g. 
MIN to get some legitimate input value. But I can see how *min* would yield 
bad performance. 
Maybe try *first* instead?
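
To make the difference between the variants concrete, here is a minimal 
plain-Python sketch of their semantics. It is not Spark code, and the function 
names (group_upper, group_min, group_first) and sample rows are hypothetical, 
chosen only for illustration. It shows that grouping on UPPER(text) can emit 
values that never occurred in the input, while MIN and first both return a 
legitimate input value, and first does so without comparing strings.

```python
def group_upper(values):
    """Models SELECT UPPER(text) ... GROUP BY UPPER(text).

    The output is the normalized key itself, which may not be an input value.
    """
    return sorted({v.upper() for v in values})


def group_min(values):
    """Models SELECT MIN(text) ... GROUP BY UPPER(text).

    Keeps the smallest original value per group (one comparison per row).
    """
    groups = {}
    for v in values:
        key = v.upper()
        if key not in groups or v < groups[key]:
            groups[key] = v
    return groups


def group_first(values):
    """Models SELECT first(text) ... GROUP BY UPPER(text).

    Keeps whichever original value is seen first per group (no comparison).
    """
    groups = {}
    for v in values:
        groups.setdefault(v.upper(), v)
    return groups


# Hypothetical sample data with mixed-case duplicates.
rows = ["apple", "Apple", "BANANA", "banana"]

print(group_upper(rows))   # normalized keys, possibly not present in rows
print(group_min(rows))     # smallest original spelling per group
print(group_first(rows))   # first-seen original spelling per group
```

In Spark SQL the analogous query would presumably be 
SELECT first(text) ... GROUP BY UPPER(text). Two caveats: Spark documents 
first as non-deterministic, because the result depends on row order after a 
shuffle, and this thread does not establish whether it actually avoids the 
sort-based aggregate.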

> Support collations in Spark SQL
> -------------------------------
>
>                 Key: SPARK-24469
>                 URL: https://issues.apache.org/jira/browse/SPARK-24469
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Alexander Shkapsky
>            Priority: Major
>
> One of our use cases is to support case-insensitive comparison in operations, 
> including aggregation and text comparison filters.  Another use case is to 
> sort via collator.  Support for collations throughout the query processor 
> appears to be the proper way to support these needs.
> Language-based workarounds (for the aggregation case) are insufficient:
>  # SELECT UPPER(text)....GROUP BY UPPER(text) 
>    introduces invalid values into the output set
>  # SELECT MIN(text)...GROUP BY UPPER(text) 
>    results in poor performance in our case, in part due to use of sort-based 
>    aggregate
> Examples of collation support in RDBMS:
>  * [PostgreSQL|https://www.postgresql.org/docs/10/static/collation.html]
>  * [MySQL|https://dev.mysql.com/doc/refman/8.0/en/charset.html]
>  * [Oracle|https://docs.oracle.com/en/database/oracle/oracle-database/18/nlspg/linguistic-sorting-and-matching.html]
>  * [SQL Server|https://docs.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-2017]
>  * [DB2|https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.admin.nls.doc/com.ibm.db2.luw.admin.nls.doc-gentopic2.html]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

