[ 
https://issues.apache.org/jira/browse/BEAM-4546?focusedWorklogId=111731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111731
 ]

ASF GitHub Bot logged work on BEAM-4546:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/18 00:34
            Start Date: 14/Jun/18 00:34
    Worklog Time Spent: 10m 
      Work Description: aaltay commented on a change in pull request #5639: 
[BEAM-4546] Multi level combine
URL: https://github.com/apache/beam/pull/5639#discussion_r195274812
 
 

 ##########
 File path: sdks/python/apache_beam/transforms/core.py
 ##########
 @@ -1191,6 +1198,34 @@ class CombinePerKey(PTransformWithSideInputs):
   Returns:
     A PObject holding the result of the combine operation.
   """
+  def with_hot_key_fanout(self, fanout):
+    """A per-key combine operation like self but with two levels of aggregation.
+
+    If a given key is produced by too many upstream bundles, the final
+    reduction can become a bottleneck despite partial combining being lifted
+    pre-GroupByKey.  In these cases it can be helpful to perform intermediate
+    partial aggregations in parallel and then re-group to peform a final
+    (per-key) combine.  This is also useful for high-volume keys in streaming
+    where combiners are not generally lifted for latency reasons.
+
+    Note that a fanout greater than 1 requires the data to be sent through
+    two GroupByKeys, and a high fanout can also result in more shuffle data
+    due to less per-bundle combining. Setting the fanout for a key at 1 or less
+    places values on the "cold key" path that skip the intermeidate level of
 
 Review comment:
   nit: intermeidate -> intermediate
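
   For readers of the docstring, the two-level aggregation it describes can be
   sketched in plain Python. This is only an illustration under stated
   assumptions, not the code in this pull request: the helper name
   `two_level_sum`, the round-robin shard assignment, and the use of a sum as
   the combine function are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import count


def two_level_sum(pairs, fanout):
    """Sum values per key, using up to `fanout` intermediate shards per key.

    Hypothetical helper for illustration; it is not part of apache_beam.
    """
    # Level 1: spread the values over (key, shard) sub-keys and combine
    # each sub-key partially. In a real pipeline these partial combines
    # could run in parallel. With fanout <= 1 every value uses shard 0,
    # so no extra parallelism is introduced.
    counter = count()
    partial = defaultdict(int)
    for key, value in pairs:
        shard = next(counter) % fanout if fanout > 1 else 0
        partial[(key, shard)] += value

    # Level 2: drop the shard, re-group by the original key, and finish
    # the combine over the (at most `fanout`) partial results per key.
    final = defaultdict(int)
    for (key, _shard), subtotal in partial.items():
        final[key] += subtotal
    return dict(final)


pairs = [("hot", 1)] * 1000 + [("cold", 5)]
result = two_level_sum(pairs, fanout=10)
```

   The final step for the "hot" key only has to merge at most 10 partial sums
   rather than 1000 raw values, which is the bottleneck the docstring is
   concerned with.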

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 111731)
    Time Spent: 1h  (was: 50m)

> Implement with hot key fanout for combiners
> -------------------------------------------
>
>                 Key: BEAM-4546
>                 URL: https://issues.apache.org/jira/browse/BEAM-4546
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Ahmet Altay
>            Assignee: Robert Bradshaw
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
