Romain Yon created BEAM-11578:
---------------------------------
Summary: `dataflow_metrics` (python) fails with TypeError (when
int overflowing?)
Key: BEAM-11578
URL: https://issues.apache.org/jira/browse/BEAM-11578
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Affects Versions: 2.25.0
Reporter: Romain Yon
Hi all,
The Python Beam job I'm running appears to be failing because of a bug in
Beam's metrics.
The job's logic appears to work and the final output is successfully written
to GCS, but the Dataflow job throws an error and ends with a failed status:
```
Traceback (most recent call last):
  File "path/to/my/code.py", line 11, in <module>
    MyJob().run()
  File "/path/to/my/lib.py", line 173, in run
    for c in result.metrics().query()["counters"]
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/dataflow/dataflow_metrics.py", line 261, in query
    self._populate_metrics(response, metric_results, user_metrics=True)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/dataflow/dataflow_metrics.py", line 188, in _populate_metrics
    attempted = self._get_metric_value(metric['tentative'])
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/dataflow/dataflow_metrics.py", line 224, in _get_metric_value
    lambda x: x.key == 'sum').value.double_value)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
```
Note that prior to this stacktrace, there is a logging entry:
```
{"severity": "INFO", "message": "Distribution metric sum value seems to have
overflowed integer_value range, the correctness of sum or mean value may not be
guaranteed: <JsonValue\\n object_value: <JsonObject\\n properties:
[<Property\\n key: \'count\'\\n value: <JsonValue\\n integer_value: 96>>,
<Property\\n key: \'mean\'\\n value: <JsonValue\\n integer_value: 0>>,
<Property\\n key: \'max\'\\n value: <JsonValue\\n integer_value: 0>>,
<Property\\n key: \'min\'\\n value: <JsonValue\\n integer_value: 0>>,
<Property\\n key: \'sum\'\\n value: <JsonValue\\n integer_value: 0>>]>>"}
```
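My guess is that the problem is in the fallback that casts the overflowing int
to a double: the traceback shows `int()` receiving `None` from `double_value`,
while the log entry shows the sum stored as `integer_value: 0`.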
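To make the suspected failure concrete, here is a minimal sketch. It does not use Beam or the real `JsonValue` class; `FakeJsonValue`, `read_sum_unsafe`, and `read_sum_safe` are hypothetical names made up for illustration, and the "safe" variant is only one possible way to guard the read, not necessarily how Beam should fix it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeJsonValue:
    """Hypothetical stand-in for the JsonValue shown in the log entry."""
    integer_value: Optional[int] = None
    double_value: Optional[float] = None


def read_sum_unsafe(value: FakeJsonValue) -> int:
    # Mirrors the pattern in the traceback: int(<...>.value.double_value).
    # Raises TypeError when double_value is unset (None).
    return int(value.double_value)


def read_sum_safe(value: FakeJsonValue) -> Optional[int]:
    # Prefer integer_value, fall back to double_value, tolerate both unset.
    if value.integer_value is not None:
        return value.integer_value
    if value.double_value is not None:
        return int(value.double_value)
    return None


# Same shape as the 'sum' property in the log entry: integer_value is 0
# and double_value is unset.
logged_sum = FakeJsonValue(integer_value=0)

try:
    read_sum_unsafe(logged_sum)
    unsafe_error = None
except TypeError as exc:
    unsafe_error = exc

safe_result = read_sum_safe(logged_sum)
```

With the value from the log, the unsafe read raises the same kind of `TypeError` as in the traceback, while the guarded read returns the integer sum.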
(Note: We don't really have control over the number of events being fired since
the metrics are emitted by `tensorflow_transform.beam.TransformDataset`)
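Since the pipeline output itself is written correctly, a possible stopgap on the caller side is to tolerate the metrics failure. This is only a sketch: `query_counters_safely` is a hypothetical helper, the `_Fake*` classes exist just to exercise it without a Dataflow job, and it hides the symptom rather than fixing the underlying bug.

```python
import logging


def query_counters_safely(result):
    """Return the job's counters, or an empty list if metric parsing fails."""
    try:
        # Same call as in the traceback above.
        return result.metrics().query()["counters"]
    except TypeError:
        logging.warning("Could not read job metrics", exc_info=True)
        return []


class _FakeMetrics:
    """Hypothetical stand-in that fails the way the traceback does."""

    def query(self):
        return {"counters": [int(None)]}  # raises TypeError


class _FakeResult:
    """Hypothetical stand-in for the pipeline result object."""

    def metrics(self):
        return _FakeMetrics()


counters = query_counters_safely(_FakeResult())
```

The trade-off is that the counters are silently missing whenever the query fails, so this only makes sense if the caller can live without them.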