TheNeuralBit opened a new issue, #22440: URL: https://github.com/apache/beam/issues/22440
### What happened?

See https://ci-beam.apache.org/job/beam_LoadTests_Python_SideInput_Dataflow_Batch/654/console

Job is failing with

```
13:00:34 INFO:apache_beam.runners.dataflow.dataflow_runner:2022-07-25T20:00:32.836Z: JOB_MESSAGE_BASIC: Stopping **** pool...
13:00:34 INFO:apache_beam.runners.dataflow.dataflow_runner:Job 2022-07-25_09_00_14-1969075893707144978 is in state JOB_STATE_CANCELLING
13:00:35 INFO:apache_beam.runners.dataflow.dataflow_runner:2022-07-25T20:01:07.531Z: JOB_MESSAGE_DETAILED: Autoscaling: Resized **** pool from 10 to 0.
13:01:08 INFO:apache_beam.runners.dataflow.dataflow_runner:2022-07-25T20:01:07.588Z: JOB_MESSAGE_BASIC: Worker pool stopped.
13:01:08 INFO:apache_beam.runners.dataflow.dataflow_runner:2022-07-25T20:01:07.612Z: JOB_MESSAGE_DEBUG: Tearing down pending resources...
13:01:08 INFO:apache_beam.runners.dataflow.dataflow_runner:Job 2022-07-25_09_00_14-1969075893707144978 is in state JOB_STATE_CANCELLED
13:01:15 ERROR:apache_beam.runners.dataflow.dataflow_runner:Console URL: https://console.cloud.google.com/dataflow/jobs/<RegionId>/2022-07-25_09_00_14-1969075893707144978?project=<ProjectId>
13:01:16 Traceback (most recent call last):
13:01:16   File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
13:01:16     "__main__", mod_spec)
13:01:16   File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
13:01:16     exec(code, run_globals)
13:01:16   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_SideInput_Dataflow_Batch/src/sdks/python/apache_beam/testing/load_tests/sideinput_test.py", line 216, in <module>
13:01:16     SideInputTest().run()
13:01:16   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_SideInput_Dataflow_Batch/src/sdks/python/apache_beam/testing/load_tests/load_test.py", line 151, in run
13:01:16     self.result.wait_until_finish(duration=self.timeout_ms)
13:01:16   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_SideInput_Dataflow_Batch/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", line 1676, in wait_until_finish
13:01:16     self)
13:01:16 apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: CANCELLED, Error:
13:01:16 None
13:01:16
13:01:17 > Task :sdks:python:apache_beam:testing:load_tests:run FAILED
```

Opening up the Dataflow console we see the following errors in the worker logs:

```
An exception was raised when trying to execute the workitem 8003945814083367836 : Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 837, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 983, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_SideInput_Dataflow_Batch/src/sdks/python/apache_beam/testing/load_tests/sideinput_test.py", line 123, in process
  File "/usr/local/lib/python3.7/site-packages/apache_beam/transforms/sideinputs.py", line 114, in __iter__
    for wv in self._iterable:
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sideinputs.py", line 180, in __iter__
    raise self.reader_exceptions.get()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sideinputs.py", line 130, in _reader_thread
    for value in reader:
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/nativefileio.py", line 204, in __iter__
    for record in self.read_next_block():
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/nativeavroio.py", line 362, in read_next_block
    fastavro_block = next(self._block_iterator)
  File "fastavro/_read.pyx", line 1051, in fastavro._read.file_reader.__next__
  File "fastavro/_read.pyx", line 953, in _iter_avro_blocks
  File "fastavro/_read.pyx", line 854, in fastavro._read.snappy_read_block
  File "fastavro/_read.pyx", line 856, in fastavro._read.snappy_read_block
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filesystemio.py", line 112, in readinto
    data = self._downloader.get_range(start, end)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/gcsio.py", line 701, in get_range
    self._downloader.GetRange(start, end - 1)
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/transfer.py", line 486, in GetRange
    response = self.__ProcessResponse(response)
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/transfer.py", line 424, in __ProcessResponse
    raise exceptions.HttpError.FromResponse(response)
apitools.base.py.exceptions.HttpNotFoundError: HttpError accessing <https://www.googleapis.com/storage/v1/b/temp-storage-for-perf-tests/o/loadtests%2Fload-tests-python-dataflow-batch-sideinput-9-0725100319.1658764814.369494%2Ftmp-e6377a3786602f76-00003-of-00028.avro?alt=media&generation=1658765290656871>: response: <{'x-guploader-uploadid': 'ADPycds9lcszewL5vWbRPedbhn7ewfjlPPHltdw2rk8IwZa4aEcwdveOb1sNgE5JHOXCGd166jZ-q0raGSyH1mAAOj9ZEA', 'content-type': 'text/html; charset=UTF-8', 'date': 'Mon, 25 Jul 2022 20:00:32 GMT', 'vary': 'Origin, X-Origin', 'expires': 'Mon, 25 Jul 2022 20:00:32 GMT', 'cache-control': 'private, max-age=0', 'content-length': '168', 'server': 'UploadServer', 'status': '404'}>, content <No such object: temp-storage-for-perf-tests/loadtests/load-tests-python-dataflow-batch-sideinput-9-0725100319.1658764814.369494/tmp-e6377a3786602f76-00003-of-00028.avro>
```

### Issue Priority

Priority: 1

### Issue Component

Component: test-failures

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
