Below I outline a different approach from the DirectRunner's. The
DirectRunner doesn't require an override for Create because it knows there
is no data remaining and can correctly shut the pipeline down by pushing
the watermark all the way through the pipeline. That watermark-based
approach is superior, but I believe it is more difficult to get right.

PAssert emits an aggregator with a specific name that records whether the
assertion succeeded or failed:
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java#L110
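
To make that concrete, here is a minimal sketch of the counting idea,
assuming only that there is a success counter and a failure counter with
well-known names. It uses the Metrics counter API rather than the
aggregator API the link points to, and the class and counter names are
mine, not PAssert's:

import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.transforms.DoFn;

// Hypothetical assertion DoFn: each element carries the result of one
// assertion check, and we bump a named counter so a test runner can later
// decide whether every assertion in the pipeline passed.
class RecordAssertionResultFn extends DoFn<Boolean, Void> {
  private final Counter successes = Metrics.counter("PAssert", "successes");
  private final Counter failures = Metrics.counter("PAssert", "failures");

  @ProcessElement
  public void processElement(ProcessContext c) {
    if (c.element()) {
      successes.inc();  // this assertion held
    } else {
      failures.inc();   // any nonzero failure count should fail the test
    }
  }
}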

The TestDataflowRunner counts how many PAsserts were applied and then, for
streaming pipelines, polls every 10 seconds to check whether the aggregator
reports any failures or all of the expected successes.
Polling logic:
https://github.com/apache/incubator-beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/testing/TestDataflowRunner.java#L114
Check logic:
https://github.com/apache/incubator-beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/testing/TestDataflowRunner.java#L177
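
A hedged sketch of what that polling loop amounts to (names like
AssertionWatcher and counterValue are invented for illustration; the real
logic lives at the links above):

import java.util.concurrent.TimeUnit;

// Illustrative-only sketch of the TestDataflowRunner idea: we know how many
// PAsserts were applied, so we poll the running job's counters until either
// a failure shows up or every assertion has reported success.
class AssertionWatcher {
  private final int expectedAssertions;   // counted while transforms were applied
  private final RunningJob job;           // hypothetical handle to the streaming job

  AssertionWatcher(int expectedAssertions, RunningJob job) {
    this.expectedAssertions = expectedAssertions;
    this.job = job;
  }

  boolean waitForAssertions() throws InterruptedException {
    while (true) {
      long failures = job.counterValue("PAssertFailure");   // hypothetical accessor
      long successes = job.counterValue("PAssertSuccess");
      if (failures > 0) {
        return false;                       // at least one assertion failed
      }
      if (successes >= expectedAssertions) {
        return true;                        // every assertion reported success
      }
      TimeUnit.SECONDS.sleep(10);           // same cadence the test runner uses
    }
  }

  // Hypothetical job handle; the real runner reads Dataflow job metrics.
  interface RunningJob {
    long counterValue(String name);
  }
}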

As for overriding a transform, the runner is currently invoked during
application of a transform and is able to inject, replace, or modify the
transform being applied. The TestDataflowRunner uses this a little to do
the PAssert counting, while the regular DataflowRunner uses it heavily for
its own specific needs.
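
As a rough sketch of that hook, under the assumption that the runner gets a
callback for every apply() (the exact runner-side API differs across Beam
versions, and everything named below is illustrative rather than the real
TestDataflowRunner code):

import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PInput;
import org.apache.beam.sdk.values.POutput;

// Illustrative-only: the shape of the "runner sees every apply()" hook.
// A test runner can count or replace transforms here before handing them
// to the underlying runner. The delegation is hypothetical.
class InterceptingRunner {
  private int assertionCount = 0;

  <InputT extends PInput, OutputT extends POutput> OutputT apply(
      PTransform<InputT, OutputT> transform, InputT input) {
    if (transform.getName().contains("PAssert")) {
      assertionCount++;                 // remember how many assertions to expect
    }
    // A real runner could also return a replacement transform's output here
    // instead of applying the original one.
    return delegateApply(transform, input);
  }

  <InputT extends PInput, OutputT extends POutput> OutputT delegateApply(
      PTransform<InputT, OutputT> transform, InputT input) {
    // Hypothetical: forward to the actual runner / default expansion.
    throw new UnsupportedOperationException("sketch only");
  }

  int getAssertionCount() {
    return assertionCount;
  }
}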

Finally, I believe Kenneth just made some changes that removed the
requirement to support View.YYY and replaced it with GroupByKey, so the
"no translator registered for View..." error may go away.


On Fri, Jun 24, 2016 at 4:52 PM, Thomas Weise <[email protected]>
wrote:

> Kenneth and Lukasz, thanks for the direction.
>
> Is there any information about other requirements to run the cross-runner
> tests, and hints to troubleshoot? On first attempt they mostly fail due to
> a missing translator:
>
> PAssertTest.testIsEqualTo:219 » IllegalState no translator registered for
> View...
>
> Also, for run() to be synchronous or wait, there needs to be an exit
> condition. I know how to solve this for the Apex runner-specific tests. But
> for the cross-runner tests, what is the recommended way to do this? Kenneth
> mentioned that Create could signal end of stream. Should I look to override
> the Create transformation to configure that behavior (just for this test
> suite), and if so, is there an example of how to do this cleanly?
>
> Thanks,
> Thomas
>
>
>
>
> On Tue, Jun 21, 2016 at 7:32 PM, Kenneth Knowles <[email protected]>
> wrote:
>
> > To expand on the RunnableOnService test suggestion, here [1] is the
> > commit from the Spark runner. You will get a lot more information if you
> > can port this for your runner than you would from an example end-to-end
> > test.
> >
> > Note that this just pulls in the tests from the core SDK. For testing
> > with other I/O connectors, you'll add them to the dependenciesToScan.
> >
> > [1]
> > https://github.com/apache/incubator-beam/commit/4254749bf103c4bb6f68e316768c0aa46d9f7df0
> >
> > On Tue, Jun 21, 2016 at 4:06 PM, Lukasz Cwik <[email protected]>
> > wrote:
> >
> > > There is a start to getting more e2e-like integration tests going, with
> > > the first being WordCount:
> > > https://github.com/apache/incubator-beam/blob/master/examples/java/src/test/java/org/apache/beam/examples/WordCountIT.java
> > > You could add WindowedWordCountIT.java, which will be launched with the
> > > proper configuration of the Apex runner pom.xml.
> > >
> > > I would also suggest that you take a look at the @RunnableOnService
> > > tests, which are a comprehensive validation suite of ~200 tests covering
> > > everything from triggers to side inputs. It requires some pom changes
> > > and creating a test runner which is able to set up an Apex environment.
> > >
> > > Furthermore, we could really use an addition to the Beam wiki about
> > > testing and how runners write tests/execute tests/...
> > >
> > > Some relevant links:
> > >
> > > Older presentation about getting cross runner tests going:
> > > https://docs.google.com/presentation/d/1uTb7dx4-Y2OM_B0_3XF_whwAL2FlDTTuq2QzP9sJ4Mg/edit#slide=id.g127d614316_19_39
> > >
> > > Examples of test runners:
> > > https://github.com/apache/incubator-beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/testing/TestDataflowRunner.java
> > > https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/TestFlinkRunner.java
> > > https://github.com/apache/incubator-beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
> > >
> > > Section of pom dedicated to enabling runnable on service tests:
> > > https://github.com/apache/incubator-beam/blob/master/runners/spark/pom.xml#L54
> > >
> > > On Tue, Jun 21, 2016 at 2:21 PM, Thomas Weise <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > As part of the Apex runner, we have a few unit tests for the
> > > > supported transformations. Next, I would like to test the
> > > > WindowedWordCount example.
> > > >
> > > > Is there an example of configuring this pipeline for another runner?
> > > > Is it recommended to supply such configuration as a JUnit test? What
> > > > is the general (repeatable?) approach to exercise different runners
> > > > with the set of example pipelines?
> > > >
> > > > Thanks,
> > > > Thomas
> > > >
> > >
> >
>
