[ 
https://issues.apache.org/jira/browse/FLINK-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947800#comment-15947800
 ] 

ASF GitHub Bot commented on FLINK-6216:
---------------------------------------

GitHub user shaoxuan-wang opened a pull request:

    https://github.com/apache/flink/pull/3646

    [FLINK-6216] [table] DataStream unbounded groupby aggregate with early 
firing

    1. Implemented an unbounded groupby aggregate with early firing (period is 
1, emit per every record)
    2. Refactored the DataStreamAggregate to  DataStreamGroupWindowAggregate
    
    Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
    If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
    In addition to going through the list, please provide a meaningful 
description of your changes.
    
    - [x] General
      - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
      - The pull request addresses only one issue
      - Each commit in the PR has a meaningful commit message (including the 
JIRA id)
    
    - [ ] Documentation
      - Documentation has been added for new functionality
      - Old documentation affected by the pull request has been updated
      - JavaDoc for public methods has been added
    
    - [x] Tests & Build
      - Functionality added by the pull request is covered by tests
      - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shaoxuan-wang/flink F6216-submit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3646.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3646
    
----
commit 1c634b9f72f23a1e1d9ce973391e8cb9232a1950
Author: shaoxuan-wang <[email protected]>
Date:   2017-03-29T19:57:58Z

    [FLINK-6216] [table] DataStream unbounded groupby aggregate with early 
firing

----


> DataStream unbounded groupby aggregate with early firing
> --------------------------------------------------------
>
>                 Key: FLINK-6216
>                 URL: https://issues.apache.org/jira/browse/FLINK-6216
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Shaoxuan Wang
>            Assignee: Shaoxuan Wang
>
> Groupby aggregate results in a replace table. For infinite groupby aggregate, 
> we need a mechanism to define when the data should be emitted (early-fired). 
> This task is aimed to implement the initial version of unbounded groupby 
> aggregate, where we update and emit aggregate value per each arrived record. 
> In the future, we will implement the mechanism and interface to let user 
> define the frequency/period of early-firing the unbounded groupby aggregation 
> results.
> The limit space of backend state is one of major obstacles for supporting 
> unbounded groupby aggregate in practical. Due to this reason, we suggest two 
> common (and very useful) use-cases of this unbounded groupby aggregate:
> 1. The range of grouping key is limit. In this case, a new arrival record 
> will either insert to state as new record or replace the existing record in 
> the backend state. The data in the backend state will not be evicted if the 
> resource is properly provisioned by the user, such that we can provision the 
> correctness on aggregation results.
> 2. When the grouping key is unlimited, we will not be able ensure the 100% 
> correctness of "unbounded groupby aggregate". In this case, we will reply on 
> the TTL mechanism of the RocksDB backend state to evicted old data such that 
> we can provision the correct results in a certain time range.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to