Andrei-Strenkovskii opened a new issue, #35850:
URL: https://github.com/apache/beam/issues/35850
### What happened?
I am building an application with Apache Beam and Python that runs on Google Cloud Dataflow. I am using the `ReadFromSpanner` transform from `apache_beam.io.gcp.experimental.spannerio`.
During the `ReadFromSpanner` operation, I get the following error:
```
Python sdk harness failed:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/apache_beam/metrics/monitoring_infos.py", line 366, in create_monitoring_info
    return metrics_pb2.MonitoringInfo(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen _collections_abc>", line 949, in update
TypeError: bad argument type for built-in operation

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 212, in main
    sdk_harness.run()
  File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py", line 283, in run
    getattr(self, SdkHarness.REQUEST_METHOD_PREFIX + request_type)(
  File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py", line 361, in _request_harness_monitoring_infos
    ).to_runner_api_monitoring_infos(None).values()
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "apache_beam/metrics/execution.py", line 334, in apache_beam.metrics.execution.MetricsContainer.to_runner_api_monitoring_infos
  File "apache_beam/metrics/cells.py", line 76, in apache_beam.metrics.cells.MetricCell.to_runner_api_monitoring_info
  File "apache_beam/metrics/cells.py", line 158, in apache_beam.metrics.cells.CounterCell.to_runner_api_monitoring_info_impl
  File "/usr/local/lib/python3.11/site-packages/apache_beam/metrics/monitoring_infos.py", line 233, in int64_counter
    return create_monitoring_info(urn, SUM_INT64_TYPE, metric, labels)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/apache_beam/metrics/monitoring_infos.py", line 369, in create_monitoring_info
    raise RuntimeError(
RuntimeError: Failed to create MonitoringInfo for urn beam:metric:io:api_request_count:v1 type <class 'type'> labels {labels} and payload {payload}
    return metrics_pb2.MonitoringInfo(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen _collections_abc>", line 949, in update
TypeError: bad argument type for built-in operation

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 367, in <module>
    main(sys.argv)
  File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 212, in main
    sdk_harness.run()
  File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py", line 283, in run
    getattr(self, SdkHarness.REQUEST_METHOD_PREFIX + request_type)(
  File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker.py", line 361, in _request_harness_monitoring_infos
    ).to_runner_api_monitoring_infos(None).values()
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "apache_beam/metrics/execution.py", line 334, in apache_beam.metrics.execution.MetricsContainer.to_runner_api_monitoring_infos
  File "apache_beam/metrics/cells.py", line 76, in apache_beam.metrics.cells.MetricCell.to_runner_api_monitoring_info
  File "apache_beam/metrics/cells.py", line 158, in apache_beam.metrics.cells.CounterCell.to_runner_api_monitoring_info_impl
  File "/usr/local/lib/python3.11/site-packages/apache_beam/metrics/monitoring_infos.py", line 233, in int64_counter
    return create_monitoring_info(urn, SUM_INT64_TYPE, metric, labels)
```
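This is speculation on my part, but the chained `TypeError` raised from `<frozen _collections_abc>` in `update` matches what happens when a protobuf `map<string, string>` field (such as `MonitoringInfo.labels`) is updated with a non-string key or value. Below is a minimal pure-Python stand-in that reproduces the shape of the failure; `StrOnlyMap` is a toy class written for this illustration, not a Beam or protobuf class:

```python
from collections.abc import MutableMapping


class StrOnlyMap(MutableMapping):
    """Toy stand-in for a protobuf map<string, string> field: the real
    C-extension map raises 'TypeError: bad argument type for built-in
    operation' when handed a non-string key or value."""

    def __init__(self):
        self._d = {}

    def __setitem__(self, key, value):
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError('bad argument type for built-in operation')
        self._d[key] = value

    def __getitem__(self, key):
        return self._d[key]

    def __delitem__(self, key):
        del self._d[key]

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)


labels_map = StrOnlyMap()
labels_map.update({'TABLE_ID': 'users'})  # all-str labels: fine

try:
    # A single non-str value (here bytes) fails inside MutableMapping.update,
    # mirroring the '<frozen _collections_abc>, line 949, in update' frame.
    labels_map.update({'TABLE_ID': b'users'})
    update_failed = False
except TypeError:
    update_failed = True
```

If that guess is right, some label attached to the `beam:metric:io:api_request_count:v1` counter by the Spanner source is not a plain `str`.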
I get this error ONLY with tables that have more than a few hundred rows; on small tables the error does not occur.
I tried increasing the number of workers and changing the machine type, but it didn't help.
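As a side note, the `RuntimeError` text in the log contains the literal placeholders `{labels}` and `{payload}`, which suggests the message in `monitoring_infos.py` is built from a plain string rather than an f-string, so the actual offending label values never reach the log. A minimal illustration of that Python pitfall (the variable names here are just for demonstration):

```python
labels = {'TABLE_ID': 'users'}
payload = b'\x00'

# Without the f-prefix, the braces survive literally -- this is what the
# error message in the log above looks like.
plain = 'labels {labels} and payload {payload}'

# With the f-prefix, the actual values are interpolated into the message.
interpolated = f'labels {labels} and payload {payload}'
```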
Example of my pipeline (with imports; `table` and `column` are defined elsewhere):
```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.experimental.spannerio import ReadFromSpanner, ReadOperation

options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=pp-import-staging',
    '--region=us-east4',
    '--temp_location=gs-path',
    '--staging_location=gs-path',
    '--experiments=shuffle_mode=service',
    '--job_name=beam-test',
])

read_operations = [
    ReadOperation.table(table=table, columns=[column]),
]

with beam.Pipeline(options=options) as pipeline:
    all_users = pipeline | ReadFromSpanner(
        'abc', 'instance', 'database',
        read_operations=read_operations,
    )
```
Versions:
- `apache_beam[gcp]==2.66.0`
- Python 3.11.8
### Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
### Issue Components
- [x] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner