Interesting problem.

First, I do think that PAssert is defined entirely in terms of the model so
this is a real problem with the results that are produced.

Then, I think the answer might lie with triggering. When PAssert rewindows
into the global window and triggers "never" this actually means that its
internal GBK emits all of its output exactly once, when the window is
expiring. So producing more than one output - even if spread across
microbatches - is actually the trouble.

On Wed, Nov 2, 2016 at 10:20 AM Amit Sela <[email protected]> wrote:

> Spark won't evaluate empty PCollections as well, and I've handled it by
> asserting the SUCCESS/FAIL aggregators in PAssert in the TestSparkRunner.
>
> Main problem for Spark is when running gracefully it might take sometime
> to terminate and execute another batch which is an empty iterable, then
> fail on AssertionError <0> but expected <100>.
> Running ungracefully seemed to come back and bite me in the past, so I'm
> trying to avoid it.
>
> Thoughts ?
>
> On Wed, Nov 2, 2016 at 6:00 PM Dan Halperin <[email protected]>
> wrote:
>
> (Also: I meant "tests will [incorrectly] pass silently")
>
> On Wed, Nov 2, 2016 at 8:57 AM, Kenneth Knowles <[email protected]> wrote:
>
> > FWIW if the runner is set up properly the tests will still fail with a
> > timeout waiting for the assertion aggregators to reach expected values.
> > Unfortunately we haven't yet centralized this functionality into
> > TestPipeline or thereabouts.
> >
> > On Wed, Nov 2, 2016 at 8:56 AM Dan Halperin <[email protected]> wrote:
> >
> >> +Kenn
> >>
> >> I believe this is done because if there is no output, no assertions will
> >> be run and tests will fail silently. (This was a side effect of
> switching
> >> from side inputs to groupbykey for this flow, which enabled testing of
> >> triggers/panes/etc.)
> >>
> >> On Wed, Nov 2, 2016 at 5:19 AM, Amit Sela <[email protected]> wrote:
> >>
> >> I've proposed https://github.com/apache/incubator-beam/pull/1257 (also
> >> opened a ticket).
> >> Tested locally Direct/Flink/Spark runners, and it looks fine, I've
> issued
> >> a
> >> PR to see if it affects Dataflow runner.
> >>
> >> Amit.
> >>
> >> On Wed, Nov 2, 2016 at 11:56 AM Jean-Baptiste Onofré <[email protected]>
> >> wrote:
> >>
> >> > Agree, this element should be removed.
> >> >
> >> > Regards
> >> > JB
> >> >
> >> > On 11/02/2016 10:53 AM, Amit Sela wrote:
> >> > > Hi all,
> >> > >
> >> > > I've been looking at PAssert and I've notice that
> >> PAssert.GroupedGlobally
> >> > > points
> >> > > <
> >> > https://github.com/apache/incubator-beam/blob/master/sdks/
> >> java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java#L825
> >> > >
> >> > > that it will result in a singe empty iterable even if the input
> >> > PCollection
> >> > > is empty.
> >> > > This is a weird behaviour as it may cause following assertions to
> >> fail.
> >> > >
> >> > > Wouldn't it be more correct to remove (filter out ?) this element ?
> >> > >
> >> > > Thanks,
> >> > > Amit
> >> > >
> >> >
> >> > --
> >> > Jean-Baptiste Onofré
> >> > [email protected]
> >> > http://blog.nanthrax.net
> >> > Talend - http://www.talend.com
> >> >
> >>
> >>
> >>
>
>

Reply via email to