[ https://issues.apache.org/jira/browse/FLINK-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296731#comment-15296731 ]
ASF GitHub Bot commented on FLINK-3586:
---------------------------------------
GitHub user fhueske opened a pull request:
https://github.com/apache/flink/pull/2024
[FLINK-3586] Fix potential overflow of Long AVG aggregation.
Fixes a potential overflow of Long `AVG` aggregates in the Table API
(intermediate sum is computed using `BigInteger` instead of `Long`).
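To illustrate the idea behind the fix, here is a minimal Java sketch of an AVG aggregate over `long` inputs whose partial state is a `BigInteger` sum plus a `long` count. The class and method names (`LongAvgSketch`, `Partial`, `accumulate`, `merge`, `evaluate`) are hypothetical and do not mirror Flink's internal aggregate interface. The sketch also assumes that the final average is truncated back to a `long` and that empty input yields `null`, as SQL `AVG` does.

```java
import java.math.BigInteger;

/**
 * Hypothetical sketch of an AVG aggregate for long inputs whose intermediate
 * sum is a BigInteger, so it cannot overflow. Not Flink's actual API.
 */
public class LongAvgSketch {

    // Partial aggregate: unbounded sum plus the number of rows seen.
    static final class Partial {
        BigInteger sum = BigInteger.ZERO;
        long count = 0L;
    }

    // Adds one input value to a partial aggregate.
    static void accumulate(Partial p, long value) {
        p.sum = p.sum.add(BigInteger.valueOf(value));
        p.count++;
    }

    // Combines two partial aggregates, e.g. computed by different parallel tasks.
    static Partial merge(Partial a, Partial b) {
        Partial merged = new Partial();
        merged.sum = a.sum.add(b.sum);
        merged.count = a.count + b.count;
        return merged;
    }

    // Final result: the average truncated toward zero, or null for empty input.
    // The mean of long values lies between their min and max, so it fits in a long.
    static Long evaluate(Partial p) {
        if (p.count == 0) {
            return null;
        }
        return p.sum.divide(BigInteger.valueOf(p.count)).longValueExact();
    }

    public static void main(String[] args) {
        Partial left = new Partial();
        accumulate(left, Long.MAX_VALUE);
        Partial right = new Partial();
        accumulate(right, Long.MAX_VALUE);
        // The merged sum (2 * Long.MAX_VALUE) does not fit in a long,
        // but the BigInteger holds it and the average is exact.
        System.out.println(evaluate(merge(left, right))); // prints 9223372036854775807
    }
}
```

Because `merge` simply adds the two `BigInteger` sums, partial results from parallel tasks can be combined without any overflow check.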
Aggregates are refactored to specify their intermediate types as
`TypeInformation` instead of SQL types. Intermediate results are internal to Flink
and are never exposed to Calcite, so SQL types are not required; they would have to
be converted into `TypeInformation` in any case.
Adds unit tests for `MIN`, `MAX`, `COUNT`, `SUM`, and `AVG` aggregates.
- [X] General
- [X] Documentation
  - No functionality added
  - Some ScalaDocs extended
- [X] Tests & Build
  - Unit tests for existing Aggregates added
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/fhueske/flink tableLongAvgOverflow
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2024.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2024
----
commit a887d1d7edb2b1b96652ca5021beec123011e03a
Author: Fabian Hueske
Date: 2016-05-22T14:46:43Z
[FLINK-3586] Fix potential overflow of Long AVG aggregation.
- Add unit tests for Aggregates.
----
> Risk of data overflow while using sum/count to calculate AVG value
> ------------------------------------------------------------------
>
> Key: FLINK-3586
> URL: https://issues.apache.org/jira/browse/FLINK-3586
> Project: Flink
> Issue Type: Sub-task
> Components: Table API
> Reporter: Chengxiang Li
> Assignee: Fabian Hueske
> Priority: Minor
>
> Currently, we use {{(sum: Long, count: Long)}} to store AVG partial aggregate data,
> which risks data overflow. We should use an unbounded data type (such as
> BigInteger) to store them where necessary.
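The overflow risk described in the issue can be reproduced with a minimal Java sketch: with a `(sum: Long, count: Long)` partial aggregate, the sum silently wraps around and the average comes out negative, whereas `Math.addExact` at least turns the overflow into an `ArithmeticException`. The class and method names (`LongAvgOverflowDemo`, `naiveAvg`, `checkedAvg`) are hypothetical and only illustrate the arithmetic.

```java
/**
 * Hypothetical demo of the overflow risk of a (sum: Long, count: Long)
 * AVG partial aggregate. Both methods assume at least one input value.
 */
public class LongAvgOverflowDemo {

    // AVG with a plain long sum: overflow wraps around silently.
    static long naiveAvg(long... values) {
        long sum = 0L;
        long count = 0L;
        for (long v : values) {
            sum += v; // wraps around on overflow, no error is raised
            count++;
        }
        return sum / count;
    }

    // Same loop, but Math.addExact throws ArithmeticException instead of wrapping.
    static long checkedAvg(long... values) {
        long sum = 0L;
        for (long v : values) {
            sum = Math.addExact(sum, v);
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // The true average of these two values is 2^62 = 4611686018427387904.
        System.out.println(naiveAvg(Long.MAX_VALUE, 1L)); // prints -4611686018427387904
        try {
            checkedAvg(Long.MAX_VALUE, 1L);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

Detecting the overflow only turns a wrong result into a failed query; an unbounded intermediate sum avoids the failure altogether.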
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)