Hi there! I wonder: why do tests that use TestDataflowRunner fail when the Dataflow pipeline runs into transient problems?
Consider the JDBC performance test: its pipelines sometimes have trouble connecting to the Postgres instance. When that happens, processing of the bundle is retried, as described in the Dataflow FAQ [1]. The PSQLExceptions raised on Dataflow (due to the connection problems) are collected by TestDataflowRunner's messageHandler. Once all the data processing is done, TestDataflowRunner "rethrows" the gathered exceptions, if there are any ([2], [3]).

IMO, this results in a spurious test failure: Maven fails because of the rethrown exceptions, even though the job actually succeeded on Dataflow (State.DONE).

I think we should "rethrow" those exceptions only if the final job state is something other than DONE. AFAIK, DONE means that the job succeeded on Dataflow, and if Dataflow managed to handle the errors, I don't see any reason for the test to fail.

Am I missing something here? WDYT?

[1] https://cloud.google.com/dataflow/faq#how-are-java-exceptions-handled-in-dataflow
[2] https://github.com/apache/beam/blob/a3e262b96be5e6507f3c38413341b4ab607ade41/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TestDataflowRunner.java#L197
[3] https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_JDBC/291/console
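
P.S. To make the proposal above more concrete, here is a rough, self-contained sketch of the decision I have in mind. It is not the actual TestDataflowRunner code: the class RethrowPolicy, the enum JobState (a simplified stand-in for the job state that includes DONE), the method shouldRethrow, and the sample error message are all hypothetical names I made up for illustration.

```java
import java.util.List;

// Hypothetical sketch only; this does not mirror the real TestDataflowRunner internals.
class RethrowPolicy {

    // Simplified, made-up stand-in for the terminal job state reported by the runner.
    enum JobState { DONE, FAILED, CANCELLED, UNKNOWN }

    // Rethrow collected worker errors only when the job did NOT finish as DONE.
    // If the job is DONE, the runner already recovered from them by retrying bundles.
    static boolean shouldRethrow(JobState finalState, List<String> collectedErrors) {
        return finalState != JobState.DONE && !collectedErrors.isEmpty();
    }

    public static void main(String[] args) {
        // Made-up message standing in for a transient connection error seen on a worker.
        List<String> errors = List.of("simulated PSQLException: connection attempt failed");

        System.out.println(shouldRethrow(JobState.DONE, errors));      // prints false
        System.out.println(shouldRethrow(JobState.FAILED, errors));    // prints true
        System.out.println(shouldRethrow(JobState.FAILED, List.of())); // prints false
    }
}
```

With a rule like this, a job that ends in DONE would pass even if transient exceptions were logged along the way, while a job that ends in any other state would still surface the collected exceptions.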
