On Thu, Apr 4, 2019 at 6:38 PM Lukasz Cwik <lc...@google.com> wrote:
>
> The issue with unbounded tests that rely on triggers/late data/early
> firings/processing time is that they involve several sources of
> non-determinism. The sources make non-deterministic decisions around when
> to produce data, checkpoint, and resume, and runners make
> non-deterministic decisions around when to output elements, in which
> order, and when to evaluate triggers. UsesTestStream is the best set of
> tests we currently have for making non-deterministic processing decisions
> deterministic, but they are more difficult to write than the other
> ValidatesRunner tests and also not well supported, because UsesTestStream
> needs special hooks within the runner to control when to output and when
> to advance time.
>
> I'm not aware of any tests we currently have that run a non-deterministic
> pipeline, evaluate it against all possible outcomes that could have been
> produced, and check that the output was valid. We would welcome ideas on
> how to improve this space and get more runners tested against
> non-deterministic pipelines.
Python has some tests of this nature, e.g.
https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L308
I'd imagine we could do something similar for Java.

> On Thu, Apr 4, 2019 at 3:36 AM Jozsef Bartok <jo...@hazelcast.com> wrote:
>>
>> Hi.
>>
>> My name is Jozsef; I've been working on Runners based on Hazelcast Jet.
>> Plural, because we have both an "old-style" and a "portable" Runner in
>> development (https://github.com/hazelcast/hazelcast-jet-beam-runner).
>>
>> While our portable one isn't functional yet, the "old-style" Runner is a
>> bit more mature. It handles only bounded data, but for that case it
>> passes all Beam tests in the ValidatesRunner category and runs the
>> Nexmark suite successfully too. (I'm referring only to correctness;
>> performance is not yet where it could be, since we aren't doing any
>> pipeline surgery or other optimizations yet.)
>>
>> A few days ago we started extending it for unbounded data, adding support
>> for things like triggers, watermarks and such, and we are wondering why
>> we can't find ValidatesRunner tests specific to unbounded data. Tests in
>> the UsesTestStream category seem to be a candidate for this, but they
>> have nowhere near the coverage and completeness of the ValidatesRunner
>> ones.
>>
>> I think we are missing something and I don't know what... Could you
>> please advise?
>>
>> Regards,
>> Jozsef
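To make the "evaluate it against all possible outcomes" idea concrete, here is a minimal, Beam-free sketch in plain Python (all names are hypothetical, not Beam APIs). It models a sum-per-window aggregation where a runner may or may not emit one early (partial) pane before the on-time pane: instead of asserting a single deterministic result, the test enumerates every legal output sequence and asserts membership.

```python
def valid_outputs(elements):
    """Enumerate every output sequence a runner could legally produce
    for a sum aggregation with at most one optional early firing.

    The runner may nondeterministically emit an early pane holding the
    sum of any nonempty proper prefix of the input, but the on-time
    pane must always hold the total sum.
    """
    total = sum(elements)
    outputs = {(total,)}  # firing no early pane at all is legal
    for k in range(1, len(elements)):
        # an early pane over the first k elements, then the final pane
        outputs.add((sum(elements[:k]), total))
    return outputs


def check_runner_output(observed, elements):
    """Assert the observed pane sums are one of the legal outcomes,
    rather than comparing against a single expected sequence."""
    legal = valid_outputs(elements)
    assert tuple(observed) in legal, (
        f"runner produced {observed}, expected one of {sorted(legal)}")
```

For example, with input [1, 2, 3, 4] both [10] (no early firing) and [3, 10] (early pane after the first two elements) pass the check, while [4, 10] fails because no prefix sums to 4. Real trigger semantics (accumulating vs. discarding panes, multiple early firings, late data) would enlarge the legal set, but the testing pattern is the same: the oracle describes the set of valid behaviors, not one linearization.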