I didn't have any other changes. I ran the tests with a clean virtualenv as you suggested and it works now. :)
Thanks Ahmet and Chamikara!

On Tue, Jun 4, 2019 at 6:36 AM Chamikara Jayalath <chamik...@google.com> wrote:

> Sounds like your input job was somehow incompatible with the Dataflow
> worker. Running using a clean virtual env should help verify as Ahmet
> mentioned.
>
> On Mon, Jun 3, 2019 at 5:44 PM Ahmet Altay <al...@google.com> wrote:
>
>> Do you have any other changes? Are you trying from head with a clean
>> virtual environment?
>>
>> If you can share a link to dataflow job (in the apache-beam-testing GCP
>> project), we can try to look at additional logs as well.
>>
>> On Mon, Jun 3, 2019 at 1:42 PM Tanay Tummalapalli <ttanay...@gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I ran the Integration Tests -
>>> BigQueryStreamingInsertTransformIntegrationTests[1] and
>>> BigQueryFileLoadsIT[2] on the master branch locally, with the following
>>> command:
>>>
>>> ./scripts/run_integration_test.sh --test_opts
>>> --tests=apache_beam.io.gcp.bigquery_test:BigQueryStreamingInsertTransformIntegrationTests
>>>
>>> The Dataflow jobs for the tests failed with the following error:
>>>
>>> root: INFO: 2019-06-03T18:36:53.021Z: JOB_MESSAGE_ERROR: Traceback (most recent call last):
>>>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 649, in do_work
>>>     work_executor.execute()
>>>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 150, in execute
>>>     test_shuffle_sink=self._test_shuffle_sink)
>>>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 116, in create_operation
>>>     is_streaming=False)
>>>   File "apache_beam/runners/worker/operations.py", line 962, in apache_beam.runners.worker.operations.create_operation
>>>     op = BatchGroupAlsoByWindowsOperation(
>>>   File "dataflow_worker/shuffle_operations.py", line 219, in dataflow_worker.shuffle_operations.BatchGroupAlsoByWindowsOperation.__init__
>>>     self.windowing = deserialize_windowing_strategy(self.spec.window_fn)
>>>   File "dataflow_worker/shuffle_operations.py", line 207, in dataflow_worker.shuffle_operations.deserialize_windowing_strategy
>>>     return pickler.loads(serialized_data)
>>>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 248, in loads
>>>     c = base64.b64decode(encoded)
>>>   File "/usr/lib/python2.7/base64.py", line 78, in b64decode
>>>     raise TypeError(msg)
>>> TypeError: Incorrect padding
>>>
>>> I tested the same tests on the 2.13.0-RC#2 branch as well and they
>>> passed. These tests also don't fail in the most recent Python post-commit
>>> tests[3-5].
>>>
>>> Keeping in mind the recent b64 changes in BQ, none of the tests in the
>>> test classes mentioned above makes use of a "BYTES" type field.
>>> Would love to get pointers to possible reasons.
>>>
>>> Thank You
>>> - TT
>>>
>>> [1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_test.py#L479-L630
>>> [2] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py#L358-L528
>>> [3] https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/
>>> [4] https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/
>>> [5] https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/
>>>
>>
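
For anyone reading this in the archive: the sketch below is not a diagnosis of the failure quoted above. The thread only establishes that a clean virtualenv made it go away. It just illustrates, assuming a standalone Python 3 interpreter, how base64.b64decode ends up reporting a padding error when an encoded string has lost characters. The helper name decode_or_explain and the payload are hypothetical and are not taken from Beam or the Dataflow worker.

```python
import base64
import binascii


def decode_or_explain(encoded):
    """Return the decoded bytes, or a short string describing the failure."""
    try:
        return base64.b64decode(encoded)
    except (binascii.Error, TypeError) as exc:
        # Python 3 raises binascii.Error here; Python 2.7 raised TypeError,
        # which is the exception type shown in the quoted traceback.
        return "decode failed: %s" % exc


# Hypothetical stand-in for a serialized payload (18 bytes -> 24 base64 chars).
payload = base64.b64encode(b"windowing strategy")

# Dropping one character leaves 23 characters, which is no longer a
# multiple of 4, so the decoder cannot reconstruct the final group.
truncated = payload[:-1]

print(decode_or_explain(payload))
print(decode_or_explain(truncated))
```

On Python 2.7, as on the worker in the traceback, the same kind of failure surfaces as TypeError rather than binascii.Error, which is why the sketch catches both.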