Max Gekk created SPARK-57557:
--------------------------------

             Summary: Support the TIME data type in quantile and sketch aggregates
                 Key: SPARK-57557
                 URL: https://issues.apache.org/jira/browse/SPARK-57557
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


h2. What

Allow {{TimeType}} as input to the quantile/sketch aggregate functions:
{{percentile}}, {{percentile_approx}} / {{approx_percentile}} ({{ApproximatePercentile}}),
{{median}}, {{histogram_numeric}} ({{HistogramNumeric}}), and the datasketches aggregates.

h2. Why

These functions currently accept {{NumericType}}, {{DateType}}, {{TimestampType}},
{{TimestampNTZType}} and interval types, but not {{TimeType}} (see
{{ApproximatePercentile.inputTypes}}). TIME is an ordered datetime type with a {{Long}}
internal value, so percentiles and medians are well defined and can be computed the same
way they already are for TIMESTAMP.
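
The ordering argument can be illustrated outside Spark. The following is a minimal Python sketch, not Spark code. It assumes the {{Long}} internal value is a count of nanoseconds since midnight (the unit is not stated in this ticket), and {{time_to_nanos}}, {{nanos_to_time}} and {{median_time}} are hypothetical helper names. It picks the lower middle element for even counts, which may differ from how Spark's {{median}} interpolates.

```python
from datetime import time

NANOS_PER_SECOND = 1_000_000_000


def time_to_nanos(t: time) -> int:
    """Hypothetical helper: encode a wall-clock time as nanoseconds since midnight."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds * NANOS_PER_SECOND + t.microsecond * 1_000


def nanos_to_time(n: int) -> time:
    """Inverse of time_to_nanos, truncated to microseconds (Python's time resolution)."""
    seconds, nanos = divmod(n, NANOS_PER_SECOND)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, nanos // 1_000)


def median_time(values):
    """Median via the integer encoding; lower middle element for even counts."""
    encoded = sorted(time_to_nanos(v) for v in values)
    return nanos_to_time(encoded[(len(encoded) - 1) // 2])


samples = [time(9, 30), time(23, 59, 59), time(0, 0, 1), time(12, 0), time(18, 45)]
print(median_time(samples))  # prints 12:00:00
```

Because the encoding is monotonic in the wall-clock value, sorting the integers and sorting the times give the same order, so the result is itself a valid time of day.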

h2. Scope

* Add {{TimeType}} to the {{inputTypes}}/{{TypeCollection}} of the affected aggregates.
* Add {{TimeType}} branches to the value<->double conversions (the internal value is a
  {{Long}}, the same as TIMESTAMP/DayTimeInterval).
* Datasketches aggregates: include {{TimeType}} in the supported-types list
  (note the existing "implement support for decimal/datetime/interval types" TODO).
* The return type for a TIME percentile/median is {{TimeType}} (matching the TIMESTAMP behavior).
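
The value<->double conversion in the second bullet can be sketched as follows. This is an illustrative Python model rather than the Spark implementation: {{to_double}}, {{from_double}} and {{percentile}} are hypothetical names, the nanoseconds-since-midnight unit is an assumption, and truncating the interpolated double back to an integer is one possible choice, not necessarily the one Spark makes.

```python
import math

# Assumed upper bound of a TIME value: nanoseconds in one day.
NANOS_PER_DAY = 24 * 60 * 60 * 1_000_000_000


def to_double(nanos: int) -> float:
    """Widen the integer internal value to a double for interpolation."""
    return float(nanos)


def from_double(value: float) -> int:
    """Narrow an interpolated double back to the integer internal value (truncates)."""
    return int(value)


def percentile(nanos_values, p: float) -> int:
    """Exact percentile with linear interpolation between neighbouring ranks."""
    ordered = sorted(to_double(v) for v in nanos_values)
    position = p * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return from_double(ordered[lower])
    fraction = position - lower
    return from_double(ordered[lower] + (ordered[upper] - ordered[lower]) * fraction)


one_hour = 3_600 * 1_000_000_000
print(percentile([0, one_hour], 0.5))  # prints 1800000000000 (00:30:00 under the assumed unit)
```

Under the assumed unit, every in-range value is below 2^53, so the widening to a double is lossless and the round trip returns the original integer; only interpolated results between two inputs are subject to truncation.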

h2. Acceptance criteria

* {{percentile}}, {{percentile_approx}}, {{median}}, {{histogram_numeric}} work on TIME
  columns and return TIME.
* Tests added alongside the existing datetime aggregate tests.



