Hi all,
I would like to start a discussion about *making type support for built-in
aggregate functions in Flink SQL more consistent and aligning it more
systematically with the ANSI SQL standard*.
Background
Flink SQL currently provides a rich set of built-in aggregate functions
(e.g., SUM, AVG, MIN, MAX, COUNT, STDDEV). However, the input types these
functions support are not fully documented in a structured way, and in some
cases they appear to be inconsistent with ANSI SQL expectations or with
common database systems.
For example:
- Some aggregate functions do not support certain string types such as CHAR.
- Numeric aggregates may have limitations or implicit behaviors around
  DECIMAL precision/scale inference.
- Support for INTERVAL, BOOLEAN, or time-related types is not always
  clearly defined or consistent.
- There is no centralized “type support matrix” describing which aggregate
  function supports which logical types.
This makes it harder for users to reason about SQL portability and standard
compliance.
Proposal
I propose the following steps:
1. Define a built-in aggregate function × data type support matrix.
   - Cover all built-in aggregate functions.
   - Cover major logical types (CHAR/VARCHAR, numeric types, DECIMAL,
     DATE/TIME/TIMESTAMP, INTERVAL, BOOLEAN, etc.).
   - Explicitly document current support status.
2. Compare the current behavior against:
   - ANSI SQL standard expectations
   - Widely adopted database behaviors (for reference)
3. Identify gaps and inconsistencies, and prioritize incremental
   improvements.
   - For example: enabling MIN/MAX on CHAR, clarifying DECIMAL inference
     rules for AVG, etc.
   - Ensure backward compatibility and avoid breaking changes.
4. Add corresponding validation tests and documentation updates to make the
   behavior explicit and predictable.
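To make steps 1 and 3 more concrete, here is a minimal sketch of how such a matrix could be represented and compared against a target baseline. It is only an illustration: the names `Support`, `TYPES`, `make_matrix`, `find_gaps`, and `render` are hypothetical, the type columns are a simplified subset, and the cell values are placeholders that have not been verified against the planner. The only cell taken from this proposal is the MIN-on-CHAR gap mentioned above.

```python
from enum import Enum


class Support(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "?"  # cell not yet verified against the implementation


# Hypothetical, simplified set of logical types used as matrix columns.
TYPES = ["CHAR", "VARCHAR", "INT", "DECIMAL", "TIMESTAMP", "INTERVAL", "BOOLEAN"]


def make_matrix(rows):
    """Build {function: {type: Support}}; cells not listed default to UNKNOWN."""
    return {
        fn: {t: cells.get(t, Support.UNKNOWN) for t in TYPES}
        for fn, cells in rows.items()
    }


# Placeholder entries for illustration only, NOT verified Flink behavior.
current = make_matrix({
    "MIN": {"CHAR": Support.NO, "VARCHAR": Support.YES, "INT": Support.YES},
    "SUM": {"INT": Support.YES, "DECIMAL": Support.YES, "BOOLEAN": Support.NO},
})

# Assumed target baseline (e.g. derived from the standard), also a placeholder.
target = make_matrix({
    "MIN": {"CHAR": Support.YES, "VARCHAR": Support.YES, "INT": Support.YES},
    "SUM": {"INT": Support.YES, "DECIMAL": Support.YES, "BOOLEAN": Support.NO},
})


def find_gaps(current, target):
    """Return sorted (function, type) pairs the target supports but current rejects."""
    gaps = []
    for fn, cells in target.items():
        for t, wanted in cells.items():
            have = current.get(fn, {}).get(t, Support.UNKNOWN)
            if wanted is Support.YES and have is Support.NO:
                gaps.append((fn, t))
    return sorted(gaps)


def render(matrix):
    """Render the matrix as a plain-text table, one row per function."""
    width = max(len(t) for t in TYPES)
    lines = ["function".ljust(10) + " ".join(t.ljust(width) for t in TYPES)]
    for fn in sorted(matrix):
        row = " ".join(matrix[fn][t].value.ljust(width) for t in TYPES)
        lines.append(fn.ljust(10) + row)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(current))
    print(find_gaps(current, target))
```

Keeping an explicit UNKNOWN state separates "not yet checked" from "not supported", so a first draft of the matrix can be shared before every cell has been confirmed. Each gap reported by `find_gaps` could then map to one improvement JIRA and one validation test.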
Scope
This discussion is limited to:
- Built-in aggregate functions in the Table/SQL planner.
- Type inference, validation, and return type determination.
- No changes to runtime semantics beyond enabling or clarifying type
  support.
Compatibility
All changes should:
- Preserve existing semantics where possible.
- Avoid breaking existing queries.
- Be introduced incrementally through small, reviewable improvements.
Questions for the community
1. Do we agree that defining a formal type support matrix for built-in
   aggregates would improve clarity and standard alignment?
2. Are there known historical design decisions or constraints around
   aggregate type support that we should consider?
3. Would this effort require a FLIP, or can we proceed incrementally under
   a series of improvement JIRAs?
If there is consensus, I can start by drafting an initial type support
matrix based on the current implementation and share it for review.
Looking forward to your feedback.
FLIP-XXX: Align Built-in Aggregate Function Type Support with ANSI SQL
<https://drive.google.com/open?id=1BWAU0ms6c5E1VkxplD9MwjOPL_V-ptexIu3CvWARa8g>
Best regards,
Feat Zhang