Yifan Mai created BEAM-11644:
--------------------------------
Summary: translations.pack_combiners optimizer causes breaking change to metrics API
Key: BEAM-11644
URL: https://issues.apache.org/jira/browse/BEAM-11644
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Affects Versions: 2.27.0
Reporter: Yifan Mai
The translations.pack_combiners optimizer causes a breaking change in the
public metrics API. The issue arises because metrics are keyed and queryable by
step name, and the step name can change after combiner packing. Suppose we have
a pipeline that looks like `pipeline | CombinePerKey(combinefn_1); pipeline |
CombinePerKey(combinefn_2)`, and both combinefn_1 and combinefn_2 increment the
same counter once per input element. Previously, the result would have two
counters, one for step combinefn_1 and one for step combinefn_2, each with value
num_input_elements. After combiner packing, the result has a single counter for
step Packed[combinefn_1, combinefn_2] with value 2 * num_input_elements.
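To make the counting concrete, here is a minimal pure-Python model (not Beam
code; `run_steps`, the `inc` callback and the counter name `elements_seen` are
hypothetical) in which counters are keyed by (step name, counter name). Keys of
CombinePerKey are ignored for simplicity. Running five elements through both
layouts shows the two per-step counters collapsing into one packed counter with
double the value.

```python
from collections import defaultdict


def combinefn_1(values, inc):
    for _ in values:
        inc("elements_seen")  # one increment per input element
    return sum(values)


def combinefn_2(values, inc):
    for _ in values:
        inc("elements_seen")  # same counter name as combinefn_1
    return max(values)


def run_steps(steps, values):
    """Run each step's combine functions; counters are keyed by (step, counter)."""
    counters = defaultdict(int)
    for step_name, fns in steps.items():
        # Every increment is attributed to the step the function runs in.
        def inc(name, step_name=step_name):
            counters[(step_name, name)] += 1

        for fn in fns:
            fn(values, inc)
    return dict(counters)


values = [3, 1, 4, 1, 5]

# Before packing: one step per CombinePerKey.
unpacked = run_steps(
    {"combinefn_1": [combinefn_1], "combinefn_2": [combinefn_2]}, values
)
# After packing: both functions execute inside a single packed step.
packed = run_steps(
    {"Packed[combinefn_1, combinefn_2]": [combinefn_1, combinefn_2]}, values
)
```

In the packed layout, a query filtered on step combinefn_1 finds no counter at
all, which is what breaks existing metrics queries.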
Unfortunately, there is no easy fix for this breaking change: the runner would
have to know that a step is a packed step and, while executing each sub-step,
use the metrics container that belongs to that sub-step.
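One possible shape of such a fix, sketched as a pure-Python model under the
assumption that the runner knows the original sub-step names inside a packed
step (`run_packed_step` and its arguments are hypothetical, not Beam APIs):
before invoking each packed CombineFn, the runner switches the metrics scope to
that function's original step name, so counters stay attributed per sub-step.

```python
from collections import defaultdict


def combinefn_1(values, inc):
    for _ in values:
        inc("elements_seen")  # one increment per input element
    return sum(values)


def combinefn_2(values, inc):
    for _ in values:
        inc("elements_seen")  # same counter name as combinefn_1
    return max(values)


def run_packed_step(sub_steps, values, counters):
    """Hypothetical runner logic for a packed step.

    sub_steps maps each original step name to its combine function. Instead of
    attributing every increment to the packed step, the active metrics scope is
    switched to the original sub-step name before each function is invoked.
    """
    outputs = {}
    for sub_step_name, fn in sub_steps.items():
        # Per-sub-step "metrics container": increments are keyed by sub-step.
        def inc(name, scope=sub_step_name):
            counters[(scope, name)] += 1

        outputs[sub_step_name] = fn(values, inc)
    return outputs


counters = defaultdict(int)
outputs = run_packed_step(
    {"combinefn_1": combinefn_1, "combinefn_2": combinefn_2},
    [3, 1, 4, 1, 5],
    counters,
)
```

The resulting counters match the unpacked pipeline (5 for each sub-step), at the
cost of the runner tracking the packed-to-original step mapping.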
The short-term workaround is to (1) add a note for 2.27 under known issues and
(2) make this optimizer phase opt-in in 2.28.