[ https://issues.apache.org/jira/browse/FLINK-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154224#comment-15154224 ]
ASF GitHub Bot commented on FLINK-2237:
---------------------------------------

Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1517#issuecomment-186219363

    OK, then let's keep the data in one partition for now. With variable-length updates, this can default to a memory usage / combine behavior that is somewhat similar to the sort-based strategy: filling the memory with records and then emitting them (putting compaction aside).

    I'll review the PR once more and will run a few end-to-end benchmarks as well. What kind of benchmarks have you done so far?
    - Did you check the combine rate (input / output ratio) compared to the sort-based strategy?
    - How much memory did you use for the tests (upper bound)? Did you vary the memory?
    - Have you checked heap memory consumption / GC activity compared to the sort-based strategy?

    It might take a few more days before I actually get to this, but it is on my list.

    Thanks, Fabian


> Add hash-based Aggregation
> --------------------------
>
>                 Key: FLINK-2237
>                 URL: https://issues.apache.org/jira/browse/FLINK-2237
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Rafiullah Momand
>            Assignee: Gabor Gevay
>            Priority: Minor
>
> Aggregation functions at the moment are implemented in a sort-based way.
> How can we implement hash based Aggregation for Flink?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)