Re: Specifying type arguments for generic PTransform builders

Robert Bradshaw Thu, 06 Oct 2016 14:38:17 -0700

Thanks for bringing this up. This inconsistency is something that has
bothered me as well.


Personally, I'm a fan of #3 when there's an obvious, required,
type-providing argument, and #1 otherwise. Note that these two are not
necessarily mutually exclusive.

Another thing to keep in mind is that Java8 is a bit smarter about
inferring types, and can do so from the context, e.g. one can write
pc.apply(GroupByKey.<>create()) and the types of pc will populate the
types of GroupByKey.create(). It might be able to infer from later
arguments as well.

- Robert



On Thu, Oct 6, 2016 at 2:09 PM, Eugene Kirpichov
<kirpic...@google.com.invalid> wrote:
> Quite a few transforms in the SDK are generic (i.e. have type parameters),
> e.g. ParDo, GroupByKey, Keys / WithKeys, many connectors (TextIO, KafkaIO,
> JdbcIO, MongoDbGridFSIO etc - both read and write). They use different
> styles of binding the type parameters to concrete types in caller code.
>
> I would like us to make a decision which of those styles to recommend for
> new transform and connectors writers. This question is coming up rather
> frequently, e.g. it came up in JdbcIO and MongoDbGridFSIO.
>
> For the purpose of this discussion, imagine a hypothetical builder class
> that looks like this:
>
> class Foo<T> {
> private Bar<T> bar;
> private int blah;
>
> Foo<T> withBlah(int blah);
> }
>
> So far I've seen several styles of binding the type argument in a withBar()
> method vs. a creation method:
>
> 1. Binding at the creation method: e.g.:
>
> class Foo<T> {
> ...
> public static <T> Foo<T> create();
> public FooBuilder<T> withBar(Bar<T> bar);
> }
>
> Foo<String> foo = Foo.<String>create().withBlah(42).withBar(new
> StringBar());
>
> Example: GroupByKey does this. As well as other transforms that don't have
> a withBar()-like method, but still need a type argument, e.g. Keys.
>
> Pros: completely unambiguous, easy to code, interacts well with @AutoValue
> Cons: need to specify type once at call site.
>
> 2. Binding at a method that takes an argument of the given type (let us
> call it "a constraint argument"), e.g.:
>
> class Foo<T> {
> ...
> public static FooBuilder<?> create();
> public <U> FooBuilder<U> withBar(Bar<U> bar);
> }
>
> Foo<String> foo = Foo.create().withBlah(42).withBar(new StringBar());
>
> Example: KafkaIO
> https://github.com/apache/incubator-beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L280
>
> Pros: don't need to specify type at call site.
> Cons: doesn't interact well with @AutoValue (it doesn't support builder
> methods that change type) - requires unchecked conversions.
>
> 3. Forcing to provide a "constraint argument" in the creation method:
>
> class Foo<T> {
> ...
> public static <T> FooBuilder<T> create(Bar<T> bar);
> // (do not provide withBar)
> }
>
> Foo<String> foo = Foo.create(new StringBar()).withBlah(42);
>
> Example: WithKeys
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/WithKeys.java,
> ParDo
>
> Pros: easy to code, interacts ok with @AutoValue, don't need to specify
> type at call site.
> Cons: need to supply all constraint arguments in the create method, so they
> are treated differently from other arguments.
>
> 4. Splitting the builder into a "bound" and "unbound" class:
>
> class Foo {
> Unbound create();
>
> class Unbound {
> public Unbound withBlah(int blah);
> public <T> Bound<T> withBar(Bar<T> bar);
> }
>
> class Bound<T> {
> public Bound<T> withBlah(int blah);
> }
> }
>
> Foo.Bound<String> foo = Foo.create().withBlah(42).withBar(new StringBar());
>
> Example: TextIO.Read
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java
> Pros: even more type-safe at call site than the others (using an incomplete
> builder is a compile error)
> Cons: very cumbersome to implement, lots of confusion between "(un)bound"
> and "(un)bounded", tempting for clients to use ugly variable names such as
> "Foo.Bound<String> bound = ..." (rather than "foo")
>
> ****
>
> I'd like to argue in favor of #1, because:
> - It makes sense for transforms like Keys.create() which do not have a
> "constraint argument", so we can have consistency between such transforms
> and the others.
> - It is the simplest to implement, and causes the fewest amount of
> generics-related confusion when reading the implementation code.
> - It interacts well with @AutoValue builders.
>
> The only downside is that you have to specify the generic argument at call
> site, but I think this is acceptable given the benefits of consistency,
> unambiguity and providing a pattern that's easy to follow for future
> transform writers.
>
> Of course, there should be an exception for cases when there is a very
> small and fixed number of arguments, or when it's clear that the
> "constraint argument" is the most important one - e.g. ParDo.of(DoFn<A, B>)
> should *not* be changed to ParDo.<A, B>create().withFn(DoFn<A, B>). Also,
> I'm not suggesting making changes to existing transforms, only deciding
> which pattern to recommend for new transforms.
>
> WDYT?

Re: Specifying type arguments for generic PTransform builders

Reply via email to