In my original email, all FooBuilder's should be simply Foo. Sorry for the confusion.
On Thu, Oct 6, 2016 at 3:08 PM Kenneth Knowles <[email protected]> wrote: > Mostly my thoughts are the same as Robert's. Use #3 whenever possible, > fallback to #1 otherwise, but please consider using informative names for > your methods in all cases. > > #1 GBK.<T>create(): IMO this pattern is best only for transforms where > withBar is optional or there is no such method, as in GBK. If it is > mandatory, it should just be required in the first method, eliding the > issue, as in ParDo.of(DoFn<I, O>), MapElements.via(...), etc, like you say > in your concluding remark. > > #2 FooBuilder<?> FooBuilder.create(): this too - if you are going to fix > the type, fix it first. If it is optional and Foo<?> is usable as a > transform, then sure. (it would have be something weird like Foo<InputT, > OutputT, ?> extends PTransform<InputT, OutputT>) > > #3 Foo.create(Bar<T>): this is best. Do this whenever possible. From my > perspective, instead of "move the param to create(...)" I would describe > this as "delete create() then rename withBar to create". Just skip the > second step and you are in an even better design, withBar being the > starting point. Just like ParDo.of and MapElements.via. > > #4 Dislike this, too, for the same reasons as #2 plus code bloat plus user > confusion. > > Side note since you use this method in all your examples: This kind of use > of "create" is a bad method name. There may be no new object "created". > Sometimes we have no better idea, but create() is a poor default. For GBK > both are bad: create() (we really only need one instance so why create > anything?) and create(<boolean>) (what is the unlabeled boolean?). They > would be improved by GBK.standard() and GBK.fewKeys() or some such. I tend > to think that focusing on this fine polish eliminates a lot of cases for > the generalized question. > > Kenn > > On Thu, Oct 6, 2016 at 2:10 PM Eugene Kirpichov > <[email protected]> wrote: > > > Quite a few transforms in the SDK are generic (i.e. have type > parameters), > > e.g. ParDo, GroupByKey, Keys / WithKeys, many connectors (TextIO, > KafkaIO, > > JdbcIO, MongoDbGridFSIO etc - both read and write). They use different > > styles of binding the type parameters to concrete types in caller code. > > > > I would like us to make a decision which of those styles to recommend for > > new transform and connectors writers. This question is coming up rather > > frequently, e.g. it came up in JdbcIO and MongoDbGridFSIO. > > > > For the purpose of this discussion, imagine a hypothetical builder class > > that looks like this: > > > > class Foo<T> { > > private Bar<T> bar; > > private int blah; > > > > Foo<T> withBlah(int blah); > > } > > > > So far I've seen several styles of binding the type argument in a > withBar() > > method vs. a creation method: > > > > 1. Binding at the creation method: e.g.: > > > > class Foo<T> { > > ... > > public static <T> Foo<T> create(); > > public FooBuilder<T> withBar(Bar<T> bar); > > } > > > > Foo<String> foo = Foo.<String>create().withBlah(42).withBar(new > > StringBar()); > > > > Example: GroupByKey does this. As well as other transforms that don't > have > > a withBar()-like method, but still need a type argument, e.g. Keys. > > > > Pros: completely unambiguous, easy to code, interacts well with > @AutoValue > > Cons: need to specify type once at call site. > > > > 2. Binding at a method that takes an argument of the given type (let us > > call it "a constraint argument"), e.g.: > > > > class Foo<T> { > > ... > > public static FooBuilder<?> create(); > > public <U> FooBuilder<U> withBar(Bar<U> bar); > > } > > > > Foo<String> foo = Foo.create().withBlah(42).withBar(new StringBar()); > > > > Example: KafkaIO > > > > > https://github.com/apache/incubator-beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L280 > > > > Pros: don't need to specify type at call site. > > Cons: doesn't interact well with @AutoValue (it doesn't support builder > > methods that change type) - requires unchecked conversions. > > > > 3. Forcing to provide a "constraint argument" in the creation method: > > > > class Foo<T> { > > ... > > public static <T> FooBuilder<T> create(Bar<T> bar); > > // (do not provide withBar) > > } > > > > Foo<String> foo = Foo.create(new StringBar()).withBlah(42); > > > > Example: WithKeys > > > > > https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/WithKeys.java > > , > > ParDo > > > > Pros: easy to code, interacts ok with @AutoValue, don't need to specify > > type at call site. > > Cons: need to supply all constraint arguments in the create method, so > they > > are treated differently from other arguments. > > > > 4. Splitting the builder into a "bound" and "unbound" class: > > > > class Foo { > > Unbound create(); > > > > class Unbound { > > public Unbound withBlah(int blah); > > public <T> Bound<T> withBar(Bar<T> bar); > > } > > > > class Bound<T> { > > public Bound<T> withBlah(int blah); > > } > > } > > > > Foo.Bound<String> foo = Foo.create().withBlah(42).withBar(new > StringBar()); > > > > Example: TextIO.Read > > > > > https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java > > Pros: even more type-safe at call site than the others (using an > incomplete > > builder is a compile error) > > Cons: very cumbersome to implement, lots of confusion between "(un)bound" > > and "(un)bounded", tempting for clients to use ugly variable names such > as > > "Foo.Bound<String> bound = ..." (rather than "foo") > > > > **** > > > > I'd like to argue in favor of #1, because: > > - It makes sense for transforms like Keys.create() which do not have a > > "constraint argument", so we can have consistency between such transforms > > and the others. > > - It is the simplest to implement, and causes the fewest amount of > > generics-related confusion when reading the implementation code. > > - It interacts well with @AutoValue builders. > > > > The only downside is that you have to specify the generic argument at > call > > site, but I think this is acceptable given the benefits of consistency, > > unambiguity and providing a pattern that's easy to follow for future > > transform writers. > > > > Of course, there should be an exception for cases when there is a very > > small and fixed number of arguments, or when it's clear that the > > "constraint argument" is the most important one - e.g. ParDo.of(DoFn<A, > B>) > > should *not* be changed to ParDo.<A, B>create().withFn(DoFn<A, B>). Also, > > I'm not suggesting making changes to existing transforms, only deciding > > which pattern to recommend for new transforms. > > > > WDYT? > > >
