This is also a good reason to avoid overly general names like "from", "create" and "of". Instead, the option should be ".fromQuery(String query)", so we can add ".fromTable(...)".
On Thu, Oct 13, 2016 at 4:55 PM Dan Halperin <dhalp...@google.com.invalid> wrote: > For #3 -- I think we should be VERY careful there. You need to be > absolutely certain that there will never, ever be another alternative to > your mandatory argument. For example, you build an option to read from a > DB, so you supply a .from(String query). Then later, you want to add > reading just a table directly, so you add fromTable(Table). In this case, > it's much better to use .read().fromQuery() or .read().fromTable() -- > having ".read()" be the "standard builder a'la #1". > > The other major drawback of #3 is the inability to provide generic > configuration. For example, a utility function that gives you a database > reader ready-to-go with all the default credentials and options you need -- > but independent of the type you want the result to end up as. You can't do > that if you must bind the type first. > > I think that in general all of these patterns are significantly worse in > the long run than the existing standards, e.g., for BigtableIO. These > suggestions are motivated by making things easier on transform writers, but > IMO we need to be optimizing for transform users. > > On Fri, Oct 7, 2016 at 4:48 PM, Eugene Kirpichov < > kirpic...@google.com.invalid> wrote: > > > In my original email, all FooBuilder's should be simply Foo. Sorry for > the > > confusion. > > > > On Thu, Oct 6, 2016 at 3:08 PM Kenneth Knowles <k...@google.com.invalid> > > wrote: > > > > > Mostly my thoughts are the same as Robert's. Use #3 whenever possible, > > > fallback to #1 otherwise, but please consider using informative names > for > > > your methods in all cases. > > > > > > #1 GBK.<T>create(): IMO this pattern is best only for transforms where > > > withBar is optional or there is no such method, as in GBK. If it is > > > mandatory, it should just be required in the first method, eliding the > > > issue, as in ParDo.of(DoFn<I, O>), MapElements.via(...), etc, like you > > say > > > in your concluding remark. > > > > > > #2 FooBuilder<?> FooBuilder.create(): this too - if you are going to > fix > > > the type, fix it first. If it is optional and Foo<?> is usable as a > > > transform, then sure. (it would have be something weird like > Foo<InputT, > > > OutputT, ?> extends PTransform<InputT, OutputT>) > > > > > > #3 Foo.create(Bar<T>): this is best. Do this whenever possible. From my > > > perspective, instead of "move the param to create(...)" I would > describe > > > this as "delete create() then rename withBar to create". Just skip the > > > second step and you are in an even better design, withBar being the > > > starting point. Just like ParDo.of and MapElements.via. > > > > > > #4 Dislike this, too, for the same reasons as #2 plus code bloat plus > > user > > > confusion. > > > > > > Side note since you use this method in all your examples: This kind of > > use > > > of "create" is a bad method name. There may be no new object "created". > > > Sometimes we have no better idea, but create() is a poor default. For > GBK > > > both are bad: create() (we really only need one instance so why create > > > anything?) and create(<boolean>) (what is the unlabeled boolean?). They > > > would be improved by GBK.standard() and GBK.fewKeys() or some such. I > > tend > > > to think that focusing on this fine polish eliminates a lot of cases > for > > > the generalized question. > > > > > > Kenn > > > > > > On Thu, Oct 6, 2016 at 2:10 PM Eugene Kirpichov > > > <kirpic...@google.com.invalid> wrote: > > > > > > > Quite a few transforms in the SDK are generic (i.e. have type > > > parameters), > > > > e.g. ParDo, GroupByKey, Keys / WithKeys, many connectors (TextIO, > > > KafkaIO, > > > > JdbcIO, MongoDbGridFSIO etc - both read and write). They use > different > > > > styles of binding the type parameters to concrete types in caller > code. > > > > > > > > I would like us to make a decision which of those styles to recommend > > for > > > > new transform and connectors writers. This question is coming up > rather > > > > frequently, e.g. it came up in JdbcIO and MongoDbGridFSIO. > > > > > > > > For the purpose of this discussion, imagine a hypothetical builder > > class > > > > that looks like this: > > > > > > > > class Foo<T> { > > > > private Bar<T> bar; > > > > private int blah; > > > > > > > > Foo<T> withBlah(int blah); > > > > } > > > > > > > > So far I've seen several styles of binding the type argument in a > > > withBar() > > > > method vs. a creation method: > > > > > > > > 1. Binding at the creation method: e.g.: > > > > > > > > class Foo<T> { > > > > ... > > > > public static <T> Foo<T> create(); > > > > public FooBuilder<T> withBar(Bar<T> bar); > > > > } > > > > > > > > Foo<String> foo = Foo.<String>create().withBlah(42).withBar(new > > > > StringBar()); > > > > > > > > Example: GroupByKey does this. As well as other transforms that don't > > > have > > > > a withBar()-like method, but still need a type argument, e.g. Keys. > > > > > > > > Pros: completely unambiguous, easy to code, interacts well with > > > @AutoValue > > > > Cons: need to specify type once at call site. > > > > > > > > 2. Binding at a method that takes an argument of the given type (let > us > > > > call it "a constraint argument"), e.g.: > > > > > > > > class Foo<T> { > > > > ... > > > > public static FooBuilder<?> create(); > > > > public <U> FooBuilder<U> withBar(Bar<U> bar); > > > > } > > > > > > > > Foo<String> foo = Foo.create().withBlah(42).withBar(new StringBar()); > > > > > > > > Example: KafkaIO > > > > > > > > > > > https://github.com/apache/incubator-beam/blob/master/sdks/ > > > java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L280 > > > > > > > > Pros: don't need to specify type at call site. > > > > Cons: doesn't interact well with @AutoValue (it doesn't support > builder > > > > methods that change type) - requires unchecked conversions. > > > > > > > > 3. Forcing to provide a "constraint argument" in the creation method: > > > > > > > > class Foo<T> { > > > > ... > > > > public static <T> FooBuilder<T> create(Bar<T> bar); > > > > // (do not provide withBar) > > > > } > > > > > > > > Foo<String> foo = Foo.create(new StringBar()).withBlah(42); > > > > > > > > Example: WithKeys > > > > > > > > > > > https://github.com/apache/incubator-beam/blob/master/sdks/ > > java/core/src/main/java/org/apache/beam/sdk/transforms/WithKeys.java > > > > , > > > > ParDo > > > > > > > > Pros: easy to code, interacts ok with @AutoValue, don't need to > specify > > > > type at call site. > > > > Cons: need to supply all constraint arguments in the create method, > so > > > they > > > > are treated differently from other arguments. > > > > > > > > 4. Splitting the builder into a "bound" and "unbound" class: > > > > > > > > class Foo { > > > > Unbound create(); > > > > > > > > class Unbound { > > > > public Unbound withBlah(int blah); > > > > public <T> Bound<T> withBar(Bar<T> bar); > > > > } > > > > > > > > class Bound<T> { > > > > public Bound<T> withBlah(int blah); > > > > } > > > > } > > > > > > > > Foo.Bound<String> foo = Foo.create().withBlah(42).withBar(new > > > StringBar()); > > > > > > > > Example: TextIO.Read > > > > > > > > > > > https://github.com/apache/incubator-beam/blob/master/sdks/ > > java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java > > > > Pros: even more type-safe at call site than the others (using an > > > incomplete > > > > builder is a compile error) > > > > Cons: very cumbersome to implement, lots of confusion between > > "(un)bound" > > > > and "(un)bounded", tempting for clients to use ugly variable names > such > > > as > > > > "Foo.Bound<String> bound = ..." (rather than "foo") > > > > > > > > **** > > > > > > > > I'd like to argue in favor of #1, because: > > > > - It makes sense for transforms like Keys.create() which do not have > a > > > > "constraint argument", so we can have consistency between such > > transforms > > > > and the others. > > > > - It is the simplest to implement, and causes the fewest amount of > > > > generics-related confusion when reading the implementation code. > > > > - It interacts well with @AutoValue builders. > > > > > > > > The only downside is that you have to specify the generic argument at > > > call > > > > site, but I think this is acceptable given the benefits of > consistency, > > > > unambiguity and providing a pattern that's easy to follow for future > > > > transform writers. > > > > > > > > Of course, there should be an exception for cases when there is a > very > > > > small and fixed number of arguments, or when it's clear that the > > > > "constraint argument" is the most important one - e.g. > ParDo.of(DoFn<A, > > > B>) > > > > should *not* be changed to ParDo.<A, B>create().withFn(DoFn<A, B>). > > Also, > > > > I'm not suggesting making changes to existing transforms, only > deciding > > > > which pattern to recommend for new transforms. > > > > > > > > WDYT? > > > > > > > > > >