Fabian Hueske created FLINK-3477:
------------------------------------
Summary: Add hash-based combine strategy for ReduceFunction
Key: FLINK-3477
URL: https://issues.apache.org/jira/browse/FLINK-3477
Project: Flink
Issue Type: Sub-task
Components: Local Runtime
Reporter: Fabian Hueske
This issue is about adding a hash-based combine strategy for ReduceFunctions.
The interface of the {{reduce()}} method is as follows:
{code}
public T reduce(T v1, T v2)
{code}
Input type and output type are identical and the function returns only a single
value. A Reduce function is incrementally applied to compute a final aggregated
value. This allows to hold the preaggregated value in a hash-table and update
it with each function call.
The hash-based strategy requires special implementation of an in-memory hash
table. The hash table should support in place updates of elements (if the
updated value has the same size as the new value) but also appending updates
with invalidation of the old value (if the binary length of the new value
differs). The hash table needs to be able to evict and emit all elements if it
runs out-of-memory.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)