Yifan Mai created BEAM-11644:
--------------------------------

             Summary: translations.pack_combiners optimizer causes breaking 
change to metrics API
                 Key: BEAM-11644
                 URL: https://issues.apache.org/jira/browse/BEAM-11644
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
    Affects Versions: 2.27.0
            Reporter: Yifan Mai


The translations.pack_combiners optimizer causes a breaking change in the
public metrics API. The issue arises because metrics are keyed and queryable by
step name, and the step name can change after combiner packing. Suppose we have
a pipeline that looks like `pipeline | CombinePerKey(combinefn_1); pipeline |
CombinePerKey(combinefn_2)` and both combinefn_1 and combinefn_2 increment the
same counter once per input element. Previously, the result would have two
counters, one each for steps combinefn_1 and combinefn_2, each with value
num_input_elements. After combiner packing, the result has a single counter
for step Packed[combinefn_1, combinefn_2] with value 2 * num_input_elements.
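A minimal toy model of the counting difference, in plain Python rather than the
Beam API. It assumes metrics are stored in a dict keyed by (step_name,
counter_name). The counter name "elements" and the run() helper are
hypothetical, and the "Packed[...]" step name simply follows the format used in
this report; the real packed step name produced by Beam may differ.

```python
from collections import Counter


def run(steps, inputs, pack=False):
    """Toy model: every combiner step increments the same counter once per
    input element. Metrics are keyed by (step_name, counter_name), as in a
    step-scoped metrics API. "elements" is a hypothetical counter name."""
    metrics = Counter()
    inputs = list(inputs)
    if pack:
        # After packing, all combiners run inside one packed step, so their
        # increments land under a single step name and are summed.
        packed_name = "Packed[" + ", ".join(steps) + "]"
        for _step in steps:
            for _ in inputs:
                metrics[(packed_name, "elements")] += 1
    else:
        # Without packing, each combiner keeps its own step-scoped counter.
        for step in steps:
            for _ in inputs:
                metrics[(step, "elements")] += 1
    return dict(metrics)


steps = ["combinefn_1", "combinefn_2"]
before = run(steps, range(5))
after = run(steps, range(5), pack=True)
```

A query that filters on step "combinefn_1" finds a counter in `before` but
nothing in `after`, which is the user-visible break described above.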

Unfortunately, there is no easy fix for this, because the runner has to somehow
be aware that a step is a packed step and use the appropriate metrics container
for each sub-step.
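A rough sketch of what "use the appropriate metrics container for each
sub-step" could mean, again as a plain-Python toy model and not Beam's runner
internals. MetricsContainer, PackedStep and query() are hypothetical names.
The assumption is that the packed step knows its original sub-step names and
switches to that sub-step's container before running each combiner, so queries
by the original step name keep working.

```python
from collections import defaultdict


class MetricsContainer:
    """Hypothetical per-step container holding named counters."""

    def __init__(self):
        self.counters = defaultdict(int)

    def inc(self, name, n=1):
        self.counters[name] += n


class PackedStep:
    """Hypothetical packed step that remembers its original sub-steps."""

    def __init__(self, sub_steps):
        self.sub_steps = list(sub_steps)

    def process(self, element, containers):
        for sub_step in self.sub_steps:
            # Route this sub-combiner's metrics to the container of the
            # original (unpacked) step instead of the packed step's container.
            containers[sub_step].inc("elements")


def query(containers, step, name):
    """Look up a counter by the original step name."""
    return containers[step].counters[name]


containers = defaultdict(MetricsContainer)
packed = PackedStep(["combinefn_1", "combinefn_2"])
for element in range(5):
    packed.process(element, containers)
```

With this routing, each original step again reports num_input_elements (here,
5) rather than one combined counter under the packed step name.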

The short-term workaround is to (1) add a note for 2.27 under known issues and
(2) make the pack_combiners phase opt-in in 2.28.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
