[
https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301075#comment-15301075
]
ASF GitHub Bot commented on FLINK-3477:
---------------------------------------
Github user fhueske commented on a diff in the pull request:
https://github.com/apache/flink/pull/1517#discussion_r64668798
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/operators/ReduceCombineDriver.java ---
@@ -114,85 +118,133 @@ public void prepare() throws Exception {
 		MemoryManager memManager = this.taskContext.getMemoryManager();
 		final int numMemoryPages = memManager.computeNumberOfPages(
-				this.taskContext.getTaskConfig().getRelativeMemoryDriver());
+			this.taskContext.getTaskConfig().getRelativeMemoryDriver());
 		this.memory = memManager.allocatePages(this.taskContext.getOwningNepheleTask(), numMemoryPages);
 
-		// instantiate a fix-length in-place sorter, if possible, otherwise the out-of-place sorter
-		if (this.comparator.supportsSerializationWithKeyNormalization() &&
-			this.serializer.getLength() > 0 && this.serializer.getLength() <= THRESHOLD_FOR_IN_PLACE_SORTING)
-		{
-			this.sorter = new FixedLengthRecordSorter<T>(this.serializer, this.comparator, memory);
-		} else {
-			this.sorter = new NormalizedKeySorter<T>(this.serializer, this.comparator.duplicate(), memory);
-		}
-
 		ExecutionConfig executionConfig = taskContext.getExecutionConfig();
 		this.objectReuseEnabled = executionConfig.isObjectReuseEnabled();
 
 		if (LOG.isDebugEnabled()) {
 			LOG.debug("ReduceCombineDriver object reuse: " + (this.objectReuseEnabled ? "ENABLED" : "DISABLED") + ".");
 		}
+
+		switch (strategy) {
+			case SORTED_PARTIAL_REDUCE:
+				// instantiate a fix-length in-place sorter, if possible, otherwise the out-of-place sorter
+				if (this.comparator.supportsSerializationWithKeyNormalization() &&
+					this.serializer.getLength() > 0 && this.serializer.getLength() <= THRESHOLD_FOR_IN_PLACE_SORTING) {
+					this.sorter = new FixedLengthRecordSorter<T>(this.serializer, this.comparator, memory);
--- End diff --
Use a duplicated comparator.
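The point of the review comment is that a Flink `TypeComparator` carries mutable state (e.g. a reference record and normalized-key buffers), so handing the same instance to a second consumer such as a sorter lets the consumers clobber each other's state. The sketch below is not Flink code; it uses a hypothetical `StatefulComparator` stand-in purely to illustrate why `duplicate()` yields an instance with independent state:

```java
// Hypothetical stand-in for a stateful comparator (NOT Flink's TypeComparator).
// Shows why a comparator with mutable per-consumer state must be duplicated
// before being passed to another consumer such as a sorter.
class DuplicatedComparatorSketch {

	static class StatefulComparator {
		// mutable per-consumer state, analogous to the reference record /
		// normalized-key buffers a real comparator carries
		private int reference;

		void setReference(int r) {
			this.reference = r;
		}

		int compareAgainstReference(int candidate) {
			return Integer.compare(candidate, this.reference);
		}

		// a fresh instance with independent state, in the spirit of
		// TypeComparator#duplicate()
		StatefulComparator duplicate() {
			return new StatefulComparator();
		}
	}
}
```

If the sorter above received the shared `this.comparator` instead of a duplicate, any reference it sets would silently overwrite state the driver itself relies on.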
> Add hash-based combine strategy for ReduceFunction
> --------------------------------------------------
>
> Key: FLINK-3477
> URL: https://issues.apache.org/jira/browse/FLINK-3477
> Project: Flink
> Issue Type: Sub-task
> Components: Local Runtime
> Reporter: Fabian Hueske
> Assignee: Gabor Gevay
>
> This issue is about adding a hash-based combine strategy for ReduceFunctions.
> The interface of the {{reduce()}} method is as follows:
> {code}
> public T reduce(T v1, T v2)
> {code}
> Input type and output type are identical and the function returns only a
> single value. A reduce function is applied incrementally to compute a final
> aggregated value. This makes it possible to hold the pre-aggregated value in
> a hash table and update it with each function call.
> The hash-based strategy requires a special implementation of an in-memory
> hash table. The hash table should support in-place updates of elements (if
> the new value has the same binary length as the old value) but also appending
> updates with invalidation of the old value (if the binary length of the new
> value differs). The hash table needs to be able to evict and emit all
> elements if it runs out of memory.
> We should also add {{HASH}} and {{SORT}} compiler hints to
> {{DataSet.reduce()}} and {{Grouping.reduce()}} to allow users to pick the
> execution strategy.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)