[
https://issues.apache.org/jira/browse/BEAM-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke Cwik updated BEAM-9934:
----------------------------
Description:
The [element
count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206]
metric represents the number of elements within a PCollection, but it is
interpreted differently by each of the Beam SDKs.
In the [Java
SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207]
this counts each element once per window it is in (the exploded window
count). The metric is incremented as soon as the element has been output.
In the [Python
SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247]
this counts each element once, regardless of how many windows it is in. The
metric is also incremented only after the element has finished processing.
The [Go
SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260]
does the same thing as Python.
Traditionally in Dataflow this has always been the exploded window element
count, and the counter is incremented as soon as the element is output.
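The two differences described above (whether windows are exploded, and
whether the counter is bumped on output or after processing) can be
illustrated with a minimal Python sketch. This is only a model of the
described behavior, not code from any Beam SDK: WindowedValue, run, and the
window labels "w1"/"w2" are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WindowedValue:
    """Hypothetical stand-in for an element assigned to one or more windows."""
    value: object
    windows: Tuple[str, ...]


def run(elements: List[WindowedValue],
        explode_windows: bool,
        count_on_output: bool) -> Tuple[int, List[int]]:
    """Model one element_count policy.

    Returns the final count and the count that downstream processing
    would observe while each element is being processed.
    """
    count = 0
    seen_during_processing = []
    for element in elements:
        # Exploded: one increment per window; otherwise one per element.
        increment = len(element.windows) if explode_windows else 1
        if count_on_output:
            count += increment  # counted as soon as the element is output
        # Downstream processing of the element happens at this point.
        seen_during_processing.append(count)
        if not count_on_output:
            count += increment  # counted only after processing finishes
    return count, seen_during_processing


elements = [
    WindowedValue("a", ("w1", "w2")),  # one element in two windows
    WindowedValue("b", ("w1",)),       # one element in one window
]

# Behavior described for the Java SDK and for Dataflow.
java_like = run(elements, explode_windows=True, count_on_output=True)
# Behavior described for the Python and Go SDKs.
python_like = run(elements, explode_windows=False, count_on_output=False)

print(java_like)    # (3, [2, 3])
print(python_like)  # (2, [0, 1])
```

For the same two elements the first policy reports 3 and the second reports
2, and a reader sampling the counter mid-bundle sees different intermediate
values as well.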
was:
The [element
count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206]
metric represents the number of elements within a PCollection and is
interpreted differently across the Beam SDK versions.
In the [Java
SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207]
this represents the number of elements and includes how many windows those
elements are in. This metric is incremented as soon as the element has been
output.
In the [Python
SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247]
this represents the number of elements and doesn't include how many windows
those elements are in. The metric is also only incremented after the element
has finished processing.
The [Go
SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260]
does the same thing as Python.
Traditionally in Dataflow this has always been the exploded window element
count and the counter is updated on output and not when the processing is
finished, as can be seen
[here|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowOutputCounter.java#L63]
and
[here|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/common/worker/OutputReceiver.java#L41].
> Resolve differences in beam:metric:element_count:v1 implementations
> -------------------------------------------------------------------
>
> Key: BEAM-9934
> URL: https://issues.apache.org/jira/browse/BEAM-9934
> Project: Beam
> Issue Type: Bug
> Components: sdk-go, sdk-java-harness, sdk-py-harness
> Reporter: Luke Cwik
> Assignee: Luke Cwik
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)