I think we should keep the tests around and encourage people to use it.

With the work that Ken put in to get the Pipeline proto to passed to the
Dataflow workers means that these composites will be flowing through
Dataflow to the execution side. I could see that this will eventually show
up in the Dataflow UI as well.

On Mon, Oct 30, 2017 at 1:45 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:

> On Fri, Oct 27, 2017 at 12:40 AM Kenneth Knowles <k...@google.com.invalid>
> wrote:
>
> > I see where you are coming from. It is truly a marginal feature of Beam
> at
> > the moment, but really *really* useful in debugging, when a runner takes
> > advantage of it. More inline - FWIW it may seem like I'm defending the
> > tests, but I have been in the same situation and felt the same way. I'm
> > just trying to ground things in specifics.
> >
> > On Thu, Oct 26, 2017 at 2:24 PM, Eugene Kirpichov <
> > kirpic...@google.com.invalid> wrote:
> >
> > > Hello,
> > >
> > > DisplayData is currently in an odd position:
> > > - Dataflow is the only runner that supports it at all [1]
> > > - Dataflow supports it only for primitive transforms, but discards it
> for
> > > composite transforms [2](which is quite problematic for very complex
> but
> > > commonly used composite transforms, such as TextIO.write() and
> > > BigQueryIO.write())
> > >
> >
> > As of today, the Java and Python Dataflow runners actually transmit the
> > entire original pipeline, which has slots for display data on composites.
> > So the future in which this is actually used - at least for Dataflow -
> > might be very near.
> >
> >
> > - There are a lot of tests testing display data of composite transforms -
> > > these tests are currently testing something that is a no-op in
> > production;
> > > and these tests often are a maintenance burden when refactoring a
> > > transform.
> >
> >
> > Which composites' tests in particular? Are they legitimate tests?
> >
> Pretty much all of them.
> For example:
> https://github.com/apache/beam/blob/master/sdks/java/
> core/src/test/java/org/apache/beam/sdk/io/TextIOWriteTest.java#L585
>
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-
> <https://github.com/apache/beam/blob/master/sdks/java/io/
> google-cloud-platform/src/test/java/org/apache/beam/sdk/
> io/gcp/pubsub/PubsubIOTest.java#L98>
> platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/
> PubsubIOTest.java#L98
> <https://github.com/apache/beam/blob/master/sdks/java/io/
> google-cloud-platform/src/test/java/org/apache/beam/sdk/
> io/gcp/pubsub/PubsubIOTest.java#L98>
>  and
> https://github.com/apache/beam/blob/master/sdks/java/io/
> google-cloud-platform/src/test/java/org/apache/beam/sdk/
> io/gcp/pubsub/PubsubIOTest.java#L222
>
>
> https://github.com/apache/beam/blob/master/sdks/java/
> core/src/test/java/org/apache/beam/sdk/transforms/
> FlatMapElementsTest.java#L169
>
>
> and so on. I think nearly all transforms in the SDK are composite, and have
> unit tests that test DisplayData of the transform itself, which is
> currently moot. Some have tests of "display data of primitive transforms"
> which is currently more useful in practice.
>
>
> >
> > What should we do about this? I see two options:
> > > - Publish guidance for library authors that Beam is not currently ready
> > for
> > > supporting HasDisplayData on composite transforms, and you shouldn't
> > bother
> > > implementing or testing it (though you can implement it on primitive
> > > transforms of course - which is basically just Read.from(Source) and
> > ParDo)
> > > - Say that people should be implementing and testing this feature
> > > nevertheless, and promise that Beam runners are going to make it "worth
> > it"
> > > (not be a no-op) in the foreseeable future.
> > >
> >
> > Option 3: Don't say anyone _should_ do any particular thing, but promise
> > that [some] Beam runners are [probably] going to make it "worth it"
> pretty
> > soon. I expect third-party transform authors are already not using this,
> > are they?
> >
> Most aren't, but some are [looking at sdks/java/io]:
> https://github.com/apache/beam/blob/master/sdks/java/io/
> hbase/src/test/java/org/apache/beam/sdk/io/hbase/HBaseIOTest.java#L328
>
> https://github.com/apache/beam/blob/master/sdks/java/io/
> tika/src/test/java/org/apache/beam/sdk/io/tika/TikaIOTest.java#L128
>
> https://github.com/apache/beam/blob/master/sdks/java/io/
> google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/
> DatastoreV1Test.java#L223
>
> I guess if Dataflow's (or another runner's) proper support of DisplayData
> on composite transforms is sufficiently near, or at least planned, then we
> should keep existing tests, and add more support and more tests once the
> support lands; but meanwhile not require or encourage new transform authors
> to implement it (but it's fine if they do). How does this sound?
>
>
> >
> > Kenn
> >
> >
> >
> > > [1]
> > > https://lists.apache.org/thread.html/53b33f17d981054ed198af03511969
> > > 07bf147b6acf16160dbf395b62@%3Cdev.beam.apache.org%3E
> > > [2] https://issues.apache.org/jira/browse/BEAM-366
> > >
> >
>

Reply via email to