I have the impression this conversation went into a different sub-discussion, ignoring the core subject: whether it makes sense to do the PAssert implementation as we are doing it right now (1), or in a runner-agnostic way (2).
Big +1 for (2). And I also think this is critical enough to be part of the First Stable Release (FSR) issues. Again, I have the impression that removing aggregators is a top priority, but we cannot remove them if we don't cover the exact same use cases with metrics and have an acceptable level of support from the runners.

Regards,
Ismaël

On Sat, Apr 8, 2017 at 4:00 PM, Aljoscha Krettek <[email protected]> wrote:
> @kenn What's the design you're mentioning? (I probably missed it because
> I'm not completely up to date on the JIRAs and ML because of Flink Forward
> preparations)
>
>> On 7. Apr 2017, at 12:42, Kenneth Knowles <[email protected]> wrote:
>>
>> We also have a design that improves the signal even without metrics, so
>> I'm pretty happy with this.
>>
>> On Fri, Apr 7, 2017 at 12:12 PM, Lukasz Cwik <[email protected]> wrote:
>>
>>> I like the usage of metrics since it doesn't depend on external resources.
>>> I believe there could be some small amount of code shared between runners
>>> for the PAssert metric verification.
>>>
>>> I would say that PAssert by itself and PAssert with metrics are two levels
>>> of testing available. For runners that don't support metrics, PAssert
>>> gives a signal (albeit a weaker one), and runners that do support metrics
>>> will have a stronger signal for execution correctness.
>>>
>>> On Fri, Apr 7, 2017 at 11:59 AM, Aviem Zur <[email protected]> wrote:
>>>
>>>> Currently, PAssert assertions may not happen and tests will pass while
>>>> silently hiding issues.
>>>>
>>>> Up until now, several runners have implemented an assertion that the
>>>> expected number of successful assertions has actually happened, and that
>>>> no failed assertions have happened. (Runners which check this are the
>>>> Dataflow runner and the Spark runner.)
>>>>
>>>> This has been valuable in the past to find bugs which were hidden by
>>>> passing tests.
>>>>
>>>> The work to repeat this in https://issues.apache.org/jira/browse/BEAM-1726
>>>> has surfaced bugs in the Flink runner that were also hidden by passing
>>>> tests. However, with the removal of aggregators in
>>>> https://issues.apache.org/jira/browse/BEAM-1148 this ticket will be harder
>>>> to implement, since the Flink runner does not support metrics.
>>>>
>>>> I believe that validating that runners do in fact support the Beam model
>>>> is a blocker for the first stable release. (BEAM-1726 was also marked as
>>>> a blocker for the Flink runner.)
>>>>
>>>> I think we have one of two choices here:
>>>> 1. Keep implementing this for each runner separately.
>>>> 2. Implement this in a runner-agnostic way (for runners which support
>>>> metrics, use metrics; for those that do not, use a fallback
>>>> implementation, perhaps using files or some other method). This should be
>>>> covered by the following ticket: https://issues.apache.org/jira/browse/BEAM-1763
>>>>
>>>> Thoughts?
>>>>
>>>
>
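For concreteness, here is a minimal sketch (in Java, against the Beam metrics API) of what the runner-agnostic verification in option (2) could look like for runners that do support metrics. The counter names, their namespace, and the way the expected assertion count is supplied are illustrative assumptions, not the actual PAssert implementation:

import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.MetricNameFilter;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.MetricsFilter;

/** Sketch: verify after the run that every assertion executed and none failed. */
public class PAssertMetricVerification {

  // Assumed counter names/namespace for illustration; the real ones would be
  // whatever PAssert registers its success/failure counters under.
  private static final String NAMESPACE = "PAssert";
  private static final String SUCCESS_COUNTER = "PAssertSuccess";
  private static final String FAILURE_COUNTER = "PAssertFailure";

  public static void verify(PipelineResult result, long expectedAssertions) {
    // Query only the assertion counters from the finished pipeline.
    MetricQueryResults metrics = result.metrics().queryMetrics(
        MetricsFilter.builder()
            .addNameFilter(MetricNameFilter.named(NAMESPACE, SUCCESS_COUNTER))
            .addNameFilter(MetricNameFilter.named(NAMESPACE, FAILURE_COUNTER))
            .build());

    long successes = 0;
    long failures = 0;
    for (MetricResult<Long> counter : metrics.counters()) {
      // attempted() is used because not all runners report committed metrics.
      if (SUCCESS_COUNTER.equals(counter.name().name())) {
        successes += counter.attempted();
      } else if (FAILURE_COUNTER.equals(counter.name().name())) {
        failures += counter.attempted();
      }
    }

    if (failures > 0) {
      throw new AssertionError(failures + " PAssert assertion(s) failed");
    }
    if (successes < expectedAssertions) {
      throw new AssertionError("Expected " + expectedAssertions
          + " successful PAssert assertion(s) but only " + successes
          + " were recorded; some assertions may never have run");
    }
  }
}

A runner that does not yet support metrics could keep the same verify() contract but back it with the fallback mechanism mentioned in the thread (e.g. files), which is what would keep the check runner-agnostic.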
