[ https://issues.apache.org/jira/browse/SPARK-46536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-46536.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44538
[https://github.com/apache/spark/pull/44538]

> Support GROUP BY calendar_interval_type
> ---------------------------------------
>
>                 Key: SPARK-46536
>                 URL: https://issues.apache.org/jira/browse/SPARK-46536
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Currently, Spark GROUP BY only allows orderable data types; otherwise the
> plan analysis fails:
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala#L197-L203]
> However, this is too strict, as GROUP BY only cares about equality, not
> ordering. The CalendarInterval type is not orderable (given 1 month and 30
> days, we don't know which one is larger), but it has well-defined equality.
> In fact, we already support `SELECT DISTINCT calendar_interval_type` in some
> cases (when hash aggregate is picked by the planner).
> The proposal here is to officially support the calendar interval type in
> GROUP BY. We should relax the check inside `CheckAnalysis`, make
> `CalendarInterval` implement `Comparable` using natural ordering (compare
> months first, then days, then seconds), and test with both hash aggregate
> and sort aggregate.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
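The natural ordering the proposal describes can be sketched in Java (Spark's `CalendarInterval` is a Java class, `org.apache.spark.unsafe.types.CalendarInterval`). This is a simplified, self-contained stand-in, not Spark's actual implementation: the field names `months`/`days`/`microseconds` follow the real class's layout (which stores the sub-day component as microseconds rather than seconds), while the rest is illustrative.

```java
// Sketch of the proposed natural ordering for CalendarInterval:
// compare months first, then days, then the sub-day time component.
public class CalendarInterval implements Comparable<CalendarInterval> {
    public final int months;
    public final int days;
    public final long microseconds;

    public CalendarInterval(int months, int days, long microseconds) {
        this.months = months;
        this.days = days;
        this.microseconds = microseconds;
    }

    // Natural ordering: months, then days, then microseconds. This is NOT a
    // semantic "duration" ordering -- 1 month sorts after 30 days even though
    // neither is truly larger -- but it is a total order consistent with
    // equals(), which is all sort aggregate needs for grouping.
    @Override
    public int compareTo(CalendarInterval other) {
        if (months != other.months) return Integer.compare(months, other.months);
        if (days != other.days) return Integer.compare(days, other.days);
        return Long.compare(microseconds, other.microseconds);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CalendarInterval)) return false;
        CalendarInterval that = (CalendarInterval) o;
        return months == that.months && days == that.days
            && microseconds == that.microseconds;
    }

    @Override
    public int hashCode() {
        return java.util.Objects.hash(months, days, microseconds);
    }

    public static void main(String[] args) {
        CalendarInterval oneMonth = new CalendarInterval(1, 0, 0);
        CalendarInterval thirtyDays = new CalendarInterval(0, 30, 0);
        // Neither interval is semantically larger, but the natural ordering
        // ranks them deterministically because months are compared first.
        System.out.println(oneMonth.compareTo(thirtyDays) > 0);              // true
        System.out.println(oneMonth.equals(new CalendarInterval(1, 0, 0))); // true
    }
}
```

Because the ordering agrees with `equals`, equal intervals land adjacent in a sort, so sort aggregate groups them exactly as hash aggregate does via `equals`/`hashCode`.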