[ 
https://issues.apache.org/jira/browse/FLINK-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske updated FLINK-9673:
---------------------------------
    Description: 
Currently, the implementations of bounded OVER window aggregations store the 
complete input for the bound interval. For example for the query:
{code:java}
SELECT user_id, count(action) OVER (PARTITION BY user_id ORDER BY rowtime RANGE 
INTERVAL '14' DAY PRECEDING) action_count, rowtime
FROM 
    SELECT rowtime, user_id, action, val1, val2, val3, val4 FROM user
{code}
The whole records with schema {{(rowtime, user_id, action, val1, val2, val3, 
val4)}} are stored for 14 days in order to retract them after 14 days from the 
accumulators.

However, it would be sufficient to only store those fields that are required 
for the aggregtions, i.e., {{action}} in the example above. All other fields 
could be set to {{null}} and hence significantly reduce the amount of data that 
needs to be stored in state.

This improvement can be applied to all four combinations of bounded 
[rowtime|proctime] [range|rows] OVER windows.

  was:
Currently, the implementations of bounded OVER window aggregations store the 
complete input for the bound interval. For example for the query:

{code}
SELECT user_id, count(action) OVER (PARTITION BY user_id ORDER BY rowtime RANGE 
INTERVAL '14' DAY PRECEDING) action_count, rowtime
FROM 
    SELECT rowtime, user_id, action, val1, val2, val3, val4 FROM user
{code}

The whole records with schema {{(rowtime, user_id, action, val1, val2, val3, 
val4)}} are stored for 14 days in order to retract them after 14 days from the 
accumulators.

However, it would be sufficient to only store those fields that are required 
for the aggregtions, i.e., {{action}} in the example above. All other fields 
could be set to {{null}} and hence significantly reduce the amount of data that 
needs to be stored in state.




> Improve State efficiency of bounded OVER window operators
> ---------------------------------------------------------
>
>                 Key: FLINK-9673
>                 URL: https://issues.apache.org/jira/browse/FLINK-9673
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Priority: Major
>
> Currently, the implementations of bounded OVER window aggregations store the 
> complete input for the bound interval. For example for the query:
> {code:java}
> SELECT user_id, count(action) OVER (PARTITION BY user_id ORDER BY rowtime 
> RANGE INTERVAL '14' DAY PRECEDING) action_count, rowtime
> FROM 
>     SELECT rowtime, user_id, action, val1, val2, val3, val4 FROM user
> {code}
> The whole records with schema {{(rowtime, user_id, action, val1, val2, val3, 
> val4)}} are stored for 14 days in order to retract them after 14 days from 
> the accumulators.
> However, it would be sufficient to only store those fields that are required 
> for the aggregtions, i.e., {{action}} in the example above. All other fields 
> could be set to {{null}} and hence significantly reduce the amount of data 
> that needs to be stored in state.
> This improvement can be applied to all four combinations of bounded 
> [rowtime|proctime] [range|rows] OVER windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to