The only signal we have is that the runner terminates the control channel. It might make sense to make this more explicit. (This'd be especially nice in batch, where we could (hypothetically at least) know we'll never run a given stage again.)
On Wed, May 15, 2019 at 3:58 PM Robert Burke <rob...@frantil.com> wrote: > > What is the runner supposed to be doing to trigger the teardown of given > bundle descriptors in an SDK harness? > > Is there a fn API call I'm not interpreting correctly that should reliably > trigger DoFn teardown, or generally that bundle processing is done? > > > > On Wed, May 15, 2019, 6:51 AM Robert Bradshaw <rober...@google.com> wrote: >> >> This does bring up an interesting question though. Are runners >> violating (the intent of) the spec if they simply abandon/kill workers >> rather than gracefully bringing them down (e.g. so that these >> callbacks can be invoked)? >> >> On Tue, May 7, 2019 at 3:55 PM Michael Luckey <adude3...@gmail.com> wrote: >> > >> > Thanks Kenn and Reuven. Based on your feedback, I amended to the PR [1] >> > implementing the missing calls to teardown. >> > >> > Best, >> > >> > michel >> > >> > [1] https://github.com/apache/beam/pull/8495 >> > >> > On Tue, May 7, 2019 at 6:09 AM Kenneth Knowles <k...@apache.org> wrote: >> >> >> >> >> >> >> >> On Mon, May 6, 2019 at 2:19 PM Reuven Lax <re...@google.com> wrote: >> >>> >> >>> >> >>> >> >>> On Mon, May 6, 2019 at 2:06 PM Kenneth Knowles <k...@apache.org> wrote: >> >>>> >> >>>> The specification of TearDown is that it is best effort, certainly. >> >>> >> >>> >> >>> Though I believe the intent of that specification was that a runner will >> >>> call it as long as the process itself has not crashed. >> >> >> >> >> >> Yea, exactly. Or more abstractly that a runner will call it unless it is >> >> impossible. If the hardware fails, a meteor strikes, etc, then teardown >> >> will not be called. But in normal operation, particularly when the user >> >> code throws a recoverable exception, it should be called. >> >> >> >> Kenn >> >> >> >>> >> >>> >> >>>> >> >>>> If your runner supports it, then the test is good to make sure there is >> >>>> not a regression. If your runner has partial support, that is within >> >>>> spec. But the idea of the spec is more than you might have such a >> >>>> failure that it is impossible to call the method, not simply never >> >>>> trying to call it. >> >>>> >> >>>> I think it seems to match what we do elsewhere to leave the test, add >> >>>> an annotation, make a note in the capability matrix about the >> >>>> limitation on ParDo. >> >>>> >> >>>> Kenn >> >>>> >> >>>> On Mon, May 6, 2019 at 5:45 AM Michael Luckey <adude3...@gmail.com> >> >>>> wrote: >> >>>>> >> >>>>> Hi, >> >>>>> >> >>>>> after stumbling upon [1] and trying to implement a fix [2], >> >>>>> ParDoLifeCycleTest are failing for >> >>>>> direct runner, spark validatesRunnerBatch and flink >> >>>>> validatesRunnerBatch fail as DoFns teardown is not invoked, if DoFns >> >>>>> setup throw an exceptions. >> >>>>> >> >>>>> This seems to be in line with the specification [3], as this >> >>>>> explicitly states that 'teardown might not be called if unnecessary as >> >>>>> processed will be killed anyway'. >> >>>>> >> >>>>> No I am a bit lost on how to resolve this situation. Currently, we >> >>>>> seem to have following options >> >>>>> - remove the test, although it seems valuable in different (e.g. >> >>>>> streaming?) cases >> >>>>> - to satisfy the test implement the call to teardown in runners >> >>>>> although it seems unnecessary >> >>>>> - add another annotation @CallsTeardownAfterFailingSetup, >> >>>>> @UsesFullParDoLifeCycle or such (would love to get suggestions for >> >>>>> better name here) >> >>>>> - ? >> >>>>> >> >>>>> Thoughts? >> >>>>> >> >>>>> Best, >> >>>>> >> >>>>> michel >> >>>>> >> >>>>> >> >>>>> >> >>>>> [1] https://issues.apache.org/jira/browse/BEAM-7197 >> >>>>> [2] https://github.com/apache/beam/pull/8495 >> >>>>> [3] >> >>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L676-L680