[
https://issues.apache.org/jira/browse/FLINK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854402#comment-15854402
]
Fabian Hueske edited comment on FLINK-5564 at 2/6/17 5:19 PM:
--------------------------------------------------------------
Hi [~Shaoxuan Wang], I think it should be possible to split the first three
steps as follows:
1. Add the new UDAGG interface and migrate existing aggregation functions to it.
We keep the current implementation and just add code and unit tests for the
implementations of the aggregation functions that implement the new interface.
2. Use the new aggregation function for batch tables.
We switch the implementation for the DataSet runtime code. We still keep the
old aggregation functions for the streaming code.
3. Use the new aggregation function for streaming tables.
We switch the implementation for the DataStream runtime code. In this step we
remove the old aggregation functions and clean up.
Addressing 1, 2, and 3 in a single issue will result in a huge PR that will be
hard to review. I'd prefer several smaller steps with a well-defined scope.
Thanks, Fabian
was (Author: fhueske):
Hi [~Shaoxuan Wang], I think it should be possible to split the first three
steps as follows:
1. Add the new UDAGG interface and migrate existing aggregation functions to it.
We keep the current implementation and just add code and unit tests for the
implementations of the aggregation functions that implement the new interface.
2. Use the new aggregation function for batch tables.
We switch the implementation for the DataSet runtime code. We still keep the
old aggregation functions for the streaming code.
3. Use the new aggregation function for streaming tables.
We switch the implementation for the DataStream runtime code. In this step we
remove the old aggregation functions and clean up.
Addressing 1, 2, and 3 in a single issue will result in a huge PR that will be
hard to review. I'd prefer several smaller steps with a well-defined scope.
Regarding the discussion of the window OVER functions: it would be great if you
could post your comment (with which I agree) to the corresponding JIRA issue
and to the discussion on the dev list; otherwise it might not be noticed.
Thanks, Fabian
> User Defined Aggregates
> -----------------------
>
> Key: FLINK-5564
> URL: https://issues.apache.org/jira/browse/FLINK-5564
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Reporter: Shaoxuan Wang
> Assignee: Shaoxuan Wang
>
> User-defined aggregates would be a great addition to the Table API / SQL.
> The current aggregate interface is not well suited for external users.
> This issue proposes to redesign the aggregate so that we can expose a
> better external UDAGG interface to users. The detailed design proposal
> can be found here:
> https://docs.google.com/document/d/19JXK8jLIi8IqV9yf7hOs_Oz67yXOypY7Uh5gIOK2r-U/edit
> Motivation:
> 1. The current aggregate interface is not very concise for users. One
> needs to know the design details of the intermediate Row buffer before
> implementing an Aggregate. Seven functions are needed even for a simple Count
> aggregate.
> 2. Another limitation of the current aggregate function is that it can only
> be applied to a single column. Many scenarios require an aggregate function
> that takes multiple columns as input.
> 3. “Retraction” is not considered and covered in the current Aggregate.
> 4. It would be valuable to have a local/global aggregate query plan
> optimization, which is promising for improving UDAGG performance in some
> scenarios.
> Proposed Changes:
> 1. Implement an aggregate dataStream API (Done by
> [FLINK-5582|https://issues.apache.org/jira/browse/FLINK-5582])
> 2. Update all the existing aggregates to use the new aggregate dataStream API
> 3. Provide a better User-Defined Aggregate interface
> 4. Add retraction support
> 5. Add local/global aggregate
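
The motivation points in the issue description above (multi-column input,
retraction, local/global aggregation) suggest an accumulator-style interface.
The following is only a minimal Java sketch under that assumption. The names
WeightedAvgAcc and WeightedAvgSketch, and all method names, are hypothetical
and are not taken from the design document or from Flink's API.

```java
// Hypothetical sketch: none of these names are Flink's actual API.

/** Intermediate state, held in a plain object instead of a Row buffer. */
final class WeightedAvgAcc {
    long weightedSum = 0L;
    long weightTotal = 0L;
}

/** Weighted average over two input columns (value, weight). */
final class WeightedAvgSketch {

    /** Creates the empty intermediate state for one group. */
    WeightedAvgAcc createAccumulator() {
        return new WeightedAvgAcc();
    }

    /** Adds one input row; note that it takes two columns as input. */
    void accumulate(WeightedAvgAcc acc, long value, long weight) {
        acc.weightedSum += value * weight;
        acc.weightTotal += weight;
    }

    /** Retraction: undoes the effect of a previously accumulated row. */
    void retract(WeightedAvgAcc acc, long value, long weight) {
        acc.weightedSum -= value * weight;
        acc.weightTotal -= weight;
    }

    /** Global step: folds a locally pre-aggregated state into the target. */
    void merge(WeightedAvgAcc target, WeightedAvgAcc other) {
        target.weightedSum += other.weightedSum;
        target.weightTotal += other.weightTotal;
    }

    /** Final result; null for an empty group (total weight of zero). */
    Double getValue(WeightedAvgAcc acc) {
        if (acc.weightTotal == 0L) {
            return null;
        }
        return (double) acc.weightedSum / acc.weightTotal;
    }
}
```

In this sketch a user writes five short methods and never touches a buffer
layout. A local/global plan would call accumulate on each partition and merge
on the partial results, while a streaming plan could call retract when an
earlier row is withdrawn.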
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)