damccorm opened a new issue, #21080:
URL: https://github.com/apache/beam/issues/21080
On a streaming pipeline running `2.31.0`, `insertAll` retries forever even
with `insert_retry_strategy=RetryStrategy.RETRY_NEVER` and
`create_disposition=BigQueryDisposition.CREATE_NEVER`.
Found while testing a pipeline's error handling by writing to a table that
doesn't exist: no elements reach `BigQueryWriteFn.FAILED_ROWS`, and these
errors repeat in the logs:
```
Error message from worker: generic::unknown: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1257, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
  File "apache_beam/runners/common.py", line 510, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "apache_beam/runners/common.py", line 516, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1268, in finish_bundle
    return self._flush_all_batches()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1278, in _flush_all_batches
    for destination in list(self._rows_buffer.keys())
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1279, in <listcomp>
    if self._rows_buffer[destination]
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1312, in _flush_batch
    skip_invalid_rows=True)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 1125, in insert_rows
    project_id, dataset_id, table_id, final_rows, skip_invalid_rows)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 637, in _insert_all_rows
    response = self.client.tabledata.InsertAll(request)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 795, in InsertAll
    config, request, global_params=global_params)
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
    http_response, method_config=method_config, request=request)
apitools.base.py.exceptions.HttpNotFoundError: HttpError accessing <https://bigquery.googleapis.com/bigquery/v2/projects/<REDACTED>/datasets/testdb__dbo__raw/tables/customers/insertAll?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Sat, 21 Aug 2021 10:00:13 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'transfer-encoding': 'chunked', 'status': '404', 'content-length': '344', '-content-encoding': 'gzip'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
    "errors": [
      {
        "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
        "domain": "global",
        "reason": "notFound"
      }
    ],
    "status": "NOT_FOUND"
  }
}
...
```
Possibly related to BEAM-12362. The pipeline previously ran on `2.29.0`,
which repeatedly logged the following message with an empty error list:
```
There were errors inserting to BigQuery. Will not retry. Errors were []
```
`2.31.0` logs the errors but ignores the retry strategy, so they cannot be
handled through the `FailedRows` tag.
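
To make the expected behaviour concrete, here is a minimal pure-Python model of it. This is not Beam's implementation: `flush_batch`, `TableNotFound`, and `insert_into_missing_table` are hypothetical names made up for illustration. It only sketches the assumption that, with a never-retry strategy, a failed insert should hand the rows back as failed rows instead of letting the exception escape the flush (which, judging by the traceback above, is what fails the bundle).

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]


class TableNotFound(Exception):
    """Hypothetical stand-in for the HTTP 404 raised by the insert call."""


def flush_batch(
    rows: List[Row],
    insert_fn: Callable[[List[Row]], None],
    retry_never: bool = True,
) -> Tuple[List[Row], List[Row]]:
    """Try to insert a batch and return (inserted_rows, failed_rows).

    With retry_never=True the error is swallowed and every row in the
    batch is reported as failed, so a caller could route it to a
    dead-letter output. Otherwise the error propagates to the caller.
    """
    try:
        insert_fn(rows)
    except TableNotFound:
        if retry_never:
            return [], list(rows)
        raise
    return list(rows), []


def insert_into_missing_table(rows: List[Row]) -> None:
    # Simulates writing to a table that does not exist.
    raise TableNotFound("Not found: Table customers")


inserted, failed = flush_batch([{"id": 1}, {"id": 2}], insert_into_missing_table)
print("inserted:", inserted)
print("failed:", failed)
```

In this model the `retry_never=False` branch re-raises, which corresponds to the behaviour reported here: the exception leaves the flush and nothing is emitted as a failed row.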
Imported from Jira
[BEAM-12783](https://issues.apache.org/jira/browse/BEAM-12783). Original Jira
may contain additional context.
Reported by: ajdub980a.