On Fri, Oct 27, 2017 at 12:40 AM Kenneth Knowles <[email protected]>
wrote:

> I see where you are coming from. It is truly a marginal feature of Beam at
> the moment, but really *really* useful in debugging, when a runner takes
> advantage of it. More inline - FWIW it may seem like I'm defending the
> tests, but I have been in the same situation and felt the same way. I'm
> just trying to ground things in specifics.
>
> On Thu, Oct 26, 2017 at 2:24 PM, Eugene Kirpichov <
> [email protected]> wrote:
>
> > Hello,
> >
> > DisplayData is currently in an odd position:
> > - Dataflow is the only runner that supports it at all [1]
> > - Dataflow supports it only for primitive transforms, but discards it for
> > composite transforms [2](which is quite problematic for very complex but
> > commonly used composite transforms, such as TextIO.write() and
> > BigQueryIO.write())
> >
>
> As of today, the Java and Python Dataflow runners actually transmit the
> entire original pipeline, which has slots for display data on composites.
> So the future in which this is actually used - at least for Dataflow -
> might be very near.
>
>
> - There are a lot of tests testing display data of composite transforms -
> > these tests are currently testing something that is a no-op in
> production;
> > and these tests often are a maintenance burden when refactoring a
> > transform.
>
>
> Which composites' tests in particular? Are they legitimate tests?
>
Pretty much all of them.
For example:
https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOWriteTest.java#L585

https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-
<https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOTest.java#L98>
platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOTest.java#L98
<https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOTest.java#L98>
 and
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOTest.java#L222


https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlatMapElementsTest.java#L169


and so on. I think nearly all transforms in the SDK are composite, and have
unit tests that test DisplayData of the transform itself, which is
currently moot. Some have tests of "display data of primitive transforms"
which is currently more useful in practice.


>
> What should we do about this? I see two options:
> > - Publish guidance for library authors that Beam is not currently ready
> for
> > supporting HasDisplayData on composite transforms, and you shouldn't
> bother
> > implementing or testing it (though you can implement it on primitive
> > transforms of course - which is basically just Read.from(Source) and
> ParDo)
> > - Say that people should be implementing and testing this feature
> > nevertheless, and promise that Beam runners are going to make it "worth
> it"
> > (not be a no-op) in the foreseeable future.
> >
>
> Option 3: Don't say anyone _should_ do any particular thing, but promise
> that [some] Beam runners are [probably] going to make it "worth it" pretty
> soon. I expect third-party transform authors are already not using this,
> are they?
>
Most aren't, but some are [looking at sdks/java/io]:
https://github.com/apache/beam/blob/master/sdks/java/io/hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOTest.java#L328

https://github.com/apache/beam/blob/master/sdks/java/io/tika/src/test/java/org/apache/beam/sdk/io/tika/TikaIOTest.java#L128

https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1Test.java#L223

I guess if Dataflow's (or another runner's) proper support of DisplayData
on composite transforms is sufficiently near, or at least planned, then we
should keep existing tests, and add more support and more tests once the
support lands; but meanwhile not require or encourage new transform authors
to implement it (but it's fine if they do). How does this sound?


>
> Kenn
>
>
>
> > [1]
> > https://lists.apache.org/thread.html/53b33f17d981054ed198af03511969
> > 07bf147b6acf16160dbf395b62@%3Cdev.beam.apache.org%3E
> > [2] https://issues.apache.org/jira/browse/BEAM-366
> >
>

Reply via email to