Luke Cwik created BEAM-9934:
-------------------------------
Summary: Resolve differences in beam:metric:element_count:v1 implementations
Key: BEAM-9934
URL: https://issues.apache.org/jira/browse/BEAM-9934
Project: Beam
Issue Type: Bug
Components: sdk-go, sdk-java-harness, sdk-py-harness
Reporter: Luke Cwik
Assignee: Luke Cwik
The [element count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206]
metric represents the number of elements within a PCollection, but it is
interpreted differently across the Beam SDKs.
In the [Java
SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207]
this counts each element once for every window it is in. The metric is
incremented as soon as the element has been output.
In the [Python
SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247]
this counts each element once, regardless of how many windows it is in. The
metric is incremented only after the element has finished processing.
The [Go
SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260]
does the same thing as Python.
In Dataflow this has traditionally been the exploded window element count.
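
To make the difference concrete, here is a minimal Python sketch of the two
counting conventions and the two increment points described above. It does not
use any Beam API: WindowedElement, ELEMENTS and the function names are
hypothetical and exist only for illustration, and the "seen" list is an assumed
stand-in for what a progress report taken mid-processing would observe.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass(frozen=True)
class WindowedElement:
    """Hypothetical stand-in for an element and the windows it belongs to."""
    value: Any
    windows: Tuple[str, ...]


def exploded_count(elements: Sequence[WindowedElement]) -> int:
    """Count each element once per window it is in."""
    return sum(len(e.windows) for e in elements)


def unexploded_count(elements: Sequence[WindowedElement]) -> int:
    """Count each element once, ignoring how many windows it is in."""
    return len(elements)


def counts_seen_during_processing(
    elements: Sequence[WindowedElement],
    increment_before_processing: bool,
) -> List[int]:
    """Return the counter value visible while each element is processed."""
    count = 0
    seen = []
    for _ in elements:
        if increment_before_processing:
            count += 1  # incremented as soon as the element is output
        seen.append(count)  # value observed while the element is in flight
        if not increment_before_processing:
            count += 1  # incremented only after processing has finished
    return seen


# Illustrative input: three elements in one, two and three windows.
ELEMENTS = [
    WindowedElement("a", ("w1",)),
    WindowedElement("b", ("w1", "w2")),
    WindowedElement("c", ("w1", "w2", "w3")),
]
```

For this input the exploded count is 6 and the unexploded count is 3. While an
element is in flight, the increment-on-output counter already includes it, and
the increment-after-processing counter does not.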