Max Gekk created SPARK-57557:
--------------------------------
Summary: Support the TIME data type in quantile and sketch aggregates
Key: SPARK-57557
URL: https://issues.apache.org/jira/browse/SPARK-57557
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
h2. What
Allow {{TimeType}} as input to the quantile/sketch aggregate functions: {{percentile}}, {{percentile_approx}} / {{approx_percentile}} ({{ApproximatePercentile}}), {{median}}, {{histogram_numeric}} ({{HistogramNumeric}}), and the datasketches aggregates.
h2. Why
These functions currently accept {{NumericType}}, {{DateType}}, {{TimestampType}}, {{TimestampNTZType}} and intervals, but not {{TimeType}} (see {{ApproximatePercentile.inputTypes}}). TIME is an ordered datetime type with a {{Long}} internal value, so percentiles/medians are well defined and consistent with how TIMESTAMP is already handled.
h2. Scope
* Add {{TimeType}} to the {{inputTypes}}/{{TypeCollection}} of the affected aggregates.
* Add the {{TimeType}} branches in the value<->double conversions (the internal value is a {{Long}}, same as TIMESTAMP/DayTimeInterval).
* Datasketches aggregates: include {{TimeType}} in the supported-types list (note the existing "implement support for decimal/datetime/interval types" TODO).
* Return type for a TIME percentile/median is {{TimeType}} (matching the TIMESTAMP behavior).
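The value<->double round trip in the Scope list can be sketched as follows. This is a minimal illustration in Python, not Spark's Scala code. It assumes the {{Long}} internal value counts nanoseconds since midnight; the issue itself only says the value is a {{Long}}. The helper names {{time_to_nanos}}, {{nanos_to_time}} and {{percentile}} are hypothetical, and the linear interpolation is a generic textbook method chosen for illustration.

```python
import math
from datetime import time

NANOS_PER_SECOND = 1_000_000_000


def time_to_nanos(t: time) -> int:
    """Encode a time of day as an integer (assumed unit: nanoseconds since midnight)."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds * NANOS_PER_SECOND + t.microsecond * 1000


def nanos_to_time(n: int) -> time:
    """Decode the integer back to a time of day (truncates to microseconds)."""
    seconds, nanos = divmod(n, NANOS_PER_SECOND)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, nanos // 1000)


def percentile(values, p: float) -> time:
    """Hypothetical helper: percentile over TIME values via a double round trip.

    Each value is converted long -> double, the result is linearly
    interpolated between neighbouring ranks, then converted back to a
    time of day, so the return type matches the input type.
    """
    if not values:
        raise ValueError("percentile of an empty input is undefined")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(float(time_to_nanos(v)) for v in values)
    pos = p * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    interpolated = xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)
    return nanos_to_time(int(round(interpolated)))


# Median of 09:00 and 11:00 comes back as a time of day, not a number.
print(percentile([time(9, 0), time(11, 0)], 0.5))  # prints 10:00:00
```

Under the nanoseconds-since-midnight assumption every value is below 8.64e13, well under 2^53, so converting the {{Long}} to a double and back loses no precision.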
h2. Acceptance criteria
* {{percentile}}, {{percentile_approx}}, {{median}}, {{histogram_numeric}} work on TIME columns and return TIME.
* Tests added alongside the existing datetime aggregate tests.