Re: Should tests fail due to transient errors on Dataflow Runner?

Łukasz Gajowy Wed, 07 Mar 2018 07:24:14 -0800

Thank you. I did a quick check based on what you said, and it
confirmed that the streaming scenario is trickier. Nevertheless, this
seems to be the problem that makes the JDBC IOIT flaky, so I created a Jira
for it: https://issues.apache.org/jira/browse/BEAM-3798
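
For reference, the check proposed in the quoted message below could be
sketched roughly as follows. This is only an illustration under stated
assumptions: RethrowSketch, JobState and shouldRethrow are hypothetical names
and not part of TestDataflowRunner or any Beam API; only the DONE state and
the idea of collecting worker error messages come from the thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, self-contained sketch; it does not use any Beam classes.
final class RethrowSketch {

    // Assumed stand-in for the job's terminal state. Only DONE is named in the thread.
    enum JobState { DONE, FAILED, CANCELLED, RUNNING }

    // Decide whether error messages collected during the run should fail the test.
    // Proposal from the thread: rethrow only when the job did not finish as DONE.
    static boolean shouldRethrow(JobState finalState, List<String> collectedErrors) {
        if (finalState == JobState.DONE) {
            // The job succeeded, so any errors seen were transient and already retried.
            return false;
        }
        return !collectedErrors.isEmpty();
    }

    public static void main(String[] args) {
        List<String> errors = Arrays.asList("PSQLException: connection refused");
        System.out.println(shouldRethrow(JobState.DONE, errors));
        System.out.println(shouldRethrow(JobState.FAILED, errors));
        System.out.println(shouldRethrow(JobState.FAILED, Collections.<String>emptyList()));
    }
}
```

Note that under this sketch a streaming job that never leaves RUNNING would
still rethrow on any collected error, which matches the historical behavior
described in the reply below.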



2018-03-06 1:52 GMT+01:00 Lukasz Cwik <lc...@google.com>:

> That makes sense but you'll want to make sure that no test + runner is
> relying on this behavior by making your change and running all the
> validates runner tests.
>
> Historically, what you say was not always the case because Dataflow
> streaming jobs were never "DONE"; they stayed in the "RUNNING" state
> forever and had to be cancelled if an error message was ever seen.
>
> On Mon, Mar 5, 2018 at 6:30 AM, Łukasz Gajowy <lukasz.gaj...@gmail.com>
> wrote:
>
>> Hi there!
>>
>> I wonder: why do tests that use TestDataflowRunner fail when the
>> Dataflow pipeline runs into transient difficulties?
>>
>> Let's consider the JDBC Performance test case: its pipelines
>> sometimes have trouble connecting to a Postgres instance. If this
>> happens, they retry processing the bundle as described in the Dataflow
>> FAQ [1]. The PSQLExceptions that happen on Dataflow (due to connection
>> problems) are collected by TestDataflowRunner's messageHandler. After the
>> whole data processing is done, TestDataflowRunner "rethrows" the gathered
>> exceptions if there are any ([2], [3]). IMO, this results in a
>> "false-negative": Maven fails due to the exceptions being thrown, even
>> though the job actually succeeded on Dataflow (State.DONE).
>>
>> I think we should "rethrow" those exceptions only if the job status is
>> something other than DONE, since DONE, AFAIK, means that the job
>> succeeded on Dataflow. If Dataflow managed to handle them, I don't see
>> any reason for the test to fail. Am I missing something here? WDYT?
>>
>> [1] https://cloud.google.com/dataflow/faq#how-are-java-exceptions-handled-in-dataflow
>> [2] https://github.com/apache/beam/blob/a3e262b96be5e6507f3c38413341b4ab607ade41/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TestDataflowRunner.java#L197
>> [3] https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_JDBC/291/console
>>
>>
>
