On Thu, Apr 4, 2019 at 6:38 PM Lukasz Cwik <lc...@google.com> wrote:
>
> The issue with unbounded tests that rely on triggers/late data/early
> firings/processing time is that these are all sources of non-determinism.
> Unbounded sources make non-deterministic decisions around when to produce
> data, checkpoint, and resume, and runners make non-deterministic decisions
> around when to output elements, in which order, and when to evaluate
> triggers. UsesTestStream is the best set of tests we currently have for
> making those non-deterministic processing decisions deterministic, but they
> are more difficult to write than the other ValidatesRunner tests and are
> also not well supported, because UsesTestStream needs special hooks within
> the runner to control when to output and when to advance time.
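>
> For example, a TestStream pipeline pins those decisions down explicitly (a
> minimal sketch; the windowing/triggering under test and the rest of the
> pipeline are elided):
>
>   import org.apache.beam.sdk.coders.StringUtf8Coder;
>   import org.apache.beam.sdk.testing.TestStream;
>   import org.apache.beam.sdk.values.TimestampedValue;
>   import org.joda.time.Duration;
>   import org.joda.time.Instant;
>
>   TestStream<String> events =
>       TestStream.create(StringUtf8Coder.of())
>           // element arrives while the watermark is still behind it: on time
>           .addElements(TimestampedValue.of("on-time", new Instant(0)))
>           // deterministically advance event time past the element
>           .advanceWatermarkTo(new Instant(0).plus(Duration.standardMinutes(10)))
>           // by construction, this element is now late
>           .addElements(TimestampedValue.of("late", new Instant(1)))
>           // processing-time triggers fire at a known point
>           .advanceProcessingTime(Duration.standardMinutes(1))
>           .advanceWatermarkToInfinity();
>
> The runner has to quiesce and honor each of these steps in order, which is
> where the special hooks come in.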
>
> I'm not aware of any tests we currently have that run a non-deterministic
> pipeline, evaluate it against all the outcomes that could have been
> produced, and check that the output was valid. We would welcome ideas on
> how to improve this space and get more runners tested against
> non-deterministic pipelines.

Python has some tests of this nature, e.g.

https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L308

I'd imagine we could do something similar in Java.
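
A rough sketch of what that could look like (the helper name and the
accumulating-panes assumption are mine, not an existing test):
PAssert.satisfies() lets an assertion accept the whole family of valid
outcomes rather than one exact output.

  import org.apache.beam.sdk.testing.PAssert;
  import org.apache.beam.sdk.transforms.SerializableFunction;
  import org.apache.beam.sdk.values.KV;
  import org.apache.beam.sdk.values.PCollection;

  // Hypothetical helper: 'counts' is the trigger-driven output of the
  // pipeline under test, with accumulating panes and no late data, so any
  // speculative pane may under-count but must never over-count.
  static void assertValidCountOutcomes(
      PCollection<KV<String, Long>> counts, long expectedFinal) {
    PAssert.that(counts)
        .satisfies(
            (SerializableFunction<Iterable<KV<String, Long>>, Void>)
                panes -> {
                  boolean sawFinal = false;
                  for (KV<String, Long> kv : panes) {
                    // any valid outcome stays at or below the final count
                    if (kv.getValue() > expectedFinal) {
                      throw new AssertionError("pane over-counts: " + kv);
                    }
                    sawFinal |= kv.getValue() == expectedFinal;
                  }
                  // whatever the firing order, the final pane must appear
                  if (!sawFinal) {
                    throw new AssertionError("final pane never observed");
                  }
                  return null;
                });
  }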

> On Thu, Apr 4, 2019 at 3:36 AM Jozsef Bartok <jo...@hazelcast.com> wrote:
>>
>> Hi.
>>
>> My name is Jozsef; I've been working on Runners based on Hazelcast Jet.
>> Plural, because we have both an "old-style" and a "portable" Runner in
>> development (https://github.com/hazelcast/hazelcast-jet-beam-runner).
>>
>> While our portable one isn't functional yet, the "old-style" Runner is a
>> bit more mature. It handles only bounded data, but for that case it passes
>> all Beam tests in the ValidatesRunner category and runs the Nexmark suite
>> successfully too (I'm referring only to correctness; performance is not
>> yet where it could be, since we aren't doing any Pipeline surgery or other
>> optimizations yet).
>>
>> A few days ago we started extending it to unbounded data, adding support
>> for things like triggers and watermarks, and we are wondering why we
>> can't find any ValidatesRunner tests specific to unbounded data. Tests in
>> the UsesTestStream category seem to be a candidate for this, but they have
>> nowhere near the coverage and completeness of the ValidatesRunner ones.
>>
>> I think we are missing something, but I don't know what... Could you
>> please advise?
>>
>> Rgds,
>> Jozsef
