Github user greghogan commented on the issue:
https://github.com/apache/flink/pull/4372
@StephanEwen I like the new template. I much prefer free form over
checkboxes.
@fhueske I'm questioning my understanding of the heuristic for using a
hash-combine. For a fixed number of keys the hash-combine can be beneficial
independent of the size of the data set. If we instead base the decision on
the ratio of keys to values, then as the size of the data set increases, the
likelihood of values with matching keys occurring in the same combine
operation (before the table fills and is flushed to the reducer) decreases.
This is often the case for graphs. I'm thinking that the improvement from
using hash-combine on larger data sets may have been due to hashing performing
better than sorting in cases where we would have wanted to disable the combiner.
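
To make the argument above concrete, here is a rough sketch of a toy model, not Flink's actual combiner. It assumes a hash table that holds at most `capacity` distinct keys and is flushed in full whenever a new key arrives while it is full. The names `combine_emitted` and `emit_ratio`, the flush policy, and the uniformly random key distribution are all assumptions made for illustration.

```python
import random


def combine_emitted(records, capacity):
    """Count records a capacity-bounded hash combiner would emit.

    Hypothetical model: the table holds at most `capacity` distinct keys.
    A record whose key is already in the table is combined in place; a new
    key arriving at a full table flushes the whole table downstream first.
    """
    table = set()
    emitted = 0
    for key in records:
        if key not in table:
            if len(table) >= capacity:
                emitted += len(table)  # flush every partial aggregate
                table.clear()
            table.add(key)
    return emitted + len(table)  # final flush at end of input


def emit_ratio(num_records, num_keys, capacity, seed=0):
    """Fraction of input records that survive the combiner (1.0 = no benefit)."""
    rng = random.Random(seed)
    records = [rng.randrange(num_keys) for _ in range(num_records)]
    return combine_emitted(records, capacity) / num_records


if __name__ == "__main__":
    capacity = 1000
    # Fixed number of keys: the table never overflows, so the benefit
    # holds regardless of the data set size.
    for n in (10_000, 100_000):
        print("fixed keys ", n, emit_ratio(n, 100, capacity))
    # Fixed keys-to-values ratio (1:10): the key space grows with the data
    # set and soon exceeds the table, so fewer records get combined.
    for n in (10_000, 200_000):
        print("fixed ratio", n, emit_ratio(n, n // 10, capacity))
```

Under these assumptions, with a fixed number of keys that fits in the table the emitted count is bounded by the number of keys, so the ratio shrinks as the input grows. With a fixed 1:10 keys-to-values ratio, the combiner emits at most a tenth of the input while the key space still fits in the table, and most of the input once the key space far exceeds the table capacity.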