[jira] [Updated] (FLINK-3477) Add hash-based combine strategy for ReduceFunction

Fabian Hueske (JIRA) Tue, 23 Feb 2016 02:18:56 -0800

     [ 
https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Fabian Hueske updated FLINK-3477:
---------------------------------
    Description: 
This issue is about adding a hash-based combine strategy for ReduceFunctions.
The interface of the {{reduce()}} method is as follows:

{code}
public T reduce(T v1, T v2)
{code}

Input type and output type are identical and the function returns only a single 
value. A Reduce function is incrementally applied to compute a final aggregated 
value. This allows to hold the preaggregated value in a hash-table and update 
it with each function call. 

The hash-based strategy requires special implementation of an in-memory hash 
table. The hash table should support in place updates of elements (if the 
updated value has the same size as the new value) but also appending updates 
with invalidation of the old value (if the binary length of the new value 
differs). The hash table needs to be able to evict and emit all elements if it 
runs out-of-memory.

We should also add {{HASH}} and {{SORT}} compiler hints to {{DataSet.reduce()}} 
and {{Grouping.reduce()}} to allow users to pick the execution strategy.

  was:
This issue is about adding a hash-based combine strategy for ReduceFunctions.
The interface of the {{reduce()}} method is as follows:

{code}
public T reduce(T v1, T v2)
{code}

Input type and output type are identical and the function returns only a single 
value. A Reduce function is incrementally applied to compute a final aggregated 
value. This allows to hold the preaggregated value in a hash-table and update 
it with each function call. 

The hash-based strategy requires special implementation of an in-memory hash 
table. The hash table should support in place updates of elements (if the 
updated value has the same size as the new value) but also appending updates 
with invalidation of the old value (if the binary length of the new value 
differs). The hash table needs to be able to evict and emit all elements if it 
runs out-of-memory.


> Add hash-based combine strategy for ReduceFunction
> --------------------------------------------------
>
>                 Key: FLINK-3477
>                 URL: https://issues.apache.org/jira/browse/FLINK-3477
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Local Runtime
>            Reporter: Fabian Hueske
>
> This issue is about adding a hash-based combine strategy for ReduceFunctions.
> The interface of the {{reduce()}} method is as follows:
> {code}
> public T reduce(T v1, T v2)
> {code}
> Input type and output type are identical and the function returns only a 
> single value. A Reduce function is incrementally applied to compute a final 
> aggregated value. This allows to hold the preaggregated value in a hash-table 
> and update it with each function call. 
> The hash-based strategy requires special implementation of an in-memory hash 
> table. The hash table should support in place updates of elements (if the 
> updated value has the same size as the new value) but also appending updates 
> with invalidation of the old value (if the binary length of the new value 
> differs). The hash table needs to be able to evict and emit all elements if 
> it runs out-of-memory.
> We should also add {{HASH}} and {{SORT}} compiler hints to 
> {{DataSet.reduce()}} and {{Grouping.reduce()}} to allow users to pick the 
> execution strategy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (FLINK-3477) Add hash-based combine strategy for ReduceFunction

Reply via email to