For #3 -- I think we should be VERY careful there. You need to be
absolutely certain that there will never, ever be another alternative to
your mandatory argument. For example, you build an option to read from a
DB, so you supply a .from(String query). Then later, you want to add
reading just a table directly, so you add fromTable(Table). In this case,
it's much better to use .read().fromQuery() or .read().fromTable() --
having ".read()" be the "standard builder a'la #1".

The other major drawback of #3 is the inability to provide generic
configuration. For example, a utility function that gives you a database
reader ready-to-go with all the default credentials and options you need --
but independent of the type you want the result to end up as. You can't do
that if you must bind the type first.

I think that in general all of these patterns are significantly worse in
the long run than the existing standards, e.g., for BigtableIO. These
suggestions are motivated by making things easier on transform writers, but
IMO we need to be optimizing for transform users.

On Fri, Oct 7, 2016 at 4:48 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:

> In my original email, all FooBuilder's should be simply Foo. Sorry for the
> confusion.
>
> On Thu, Oct 6, 2016 at 3:08 PM Kenneth Knowles <k...@google.com.invalid>
> wrote:
>
> > Mostly my thoughts are the same as Robert's. Use #3 whenever possible,
> > fallback to #1 otherwise, but please consider using informative names for
> > your methods in all cases.
> >
> > #1 GBK.<T>create(): IMO this pattern is best only for transforms where
> > withBar is optional or there is no such method, as in GBK. If it is
> > mandatory, it should just be required in the first method, eliding the
> > issue, as in ParDo.of(DoFn<I, O>), MapElements.via(...), etc, like you
> say
> > in your concluding remark.
> >
> > #2 FooBuilder<?> FooBuilder.create(): this too - if you are going to fix
> > the type, fix it first. If it is optional and Foo<?> is usable as a
> > transform, then sure. (it would have be something weird like Foo<InputT,
> > OutputT, ?> extends PTransform<InputT, OutputT>)
> >
> > #3 Foo.create(Bar<T>): this is best. Do this whenever possible. From my
> > perspective, instead of "move the param to create(...)" I would describe
> > this as "delete create() then rename withBar to create". Just skip the
> > second step and you are in an even better design, withBar being the
> > starting point. Just like ParDo.of and MapElements.via.
> >
> > #4 Dislike this, too, for the same reasons as #2 plus code bloat plus
> user
> > confusion.
> >
> > Side note since you use this method in all your examples: This kind of
> use
> > of "create" is a bad method name. There may be no new object "created".
> > Sometimes we have no better idea, but create() is a poor default. For GBK
> > both are bad: create() (we really only need one instance so why create
> > anything?) and create(<boolean>) (what is the unlabeled boolean?). They
> > would be improved by GBK.standard() and GBK.fewKeys() or some such. I
> tend
> > to think that focusing on this fine polish eliminates a lot of cases for
> > the generalized question.
> >
> > Kenn
> >
> > On Thu, Oct 6, 2016 at 2:10 PM Eugene Kirpichov
> > <kirpic...@google.com.invalid> wrote:
> >
> > > Quite a few transforms in the SDK are generic (i.e. have type
> > parameters),
> > > e.g. ParDo, GroupByKey, Keys / WithKeys, many connectors (TextIO,
> > KafkaIO,
> > > JdbcIO, MongoDbGridFSIO etc - both read and write). They use different
> > > styles of binding the type parameters to concrete types in caller code.
> > >
> > > I would like us to make a decision which of those styles to recommend
> for
> > > new transform and connectors writers. This question is coming up rather
> > > frequently, e.g. it came up in JdbcIO and MongoDbGridFSIO.
> > >
> > > For the purpose of this discussion, imagine a hypothetical builder
> class
> > > that looks like this:
> > >
> > > class Foo<T> {
> > > private Bar<T> bar;
> > > private int blah;
> > >
> > > Foo<T> withBlah(int blah);
> > > }
> > >
> > > So far I've seen several styles of binding the type argument in a
> > withBar()
> > > method vs. a creation method:
> > >
> > > 1. Binding at the creation method: e.g.:
> > >
> > > class Foo<T> {
> > > ...
> > > public static <T> Foo<T> create();
> > > public FooBuilder<T> withBar(Bar<T> bar);
> > > }
> > >
> > > Foo<String> foo = Foo.<String>create().withBlah(42).withBar(new
> > > StringBar());
> > >
> > > Example: GroupByKey does this. As well as other transforms that don't
> > have
> > > a withBar()-like method, but still need a type argument, e.g. Keys.
> > >
> > > Pros: completely unambiguous, easy to code, interacts well with
> > @AutoValue
> > > Cons: need to specify type once at call site.
> > >
> > > 2. Binding at a method that takes an argument of the given type (let us
> > > call it "a constraint argument"), e.g.:
> > >
> > > class Foo<T> {
> > > ...
> > > public static FooBuilder<?> create();
> > > public <U> FooBuilder<U> withBar(Bar<U> bar);
> > > }
> > >
> > > Foo<String> foo = Foo.create().withBlah(42).withBar(new StringBar());
> > >
> > > Example: KafkaIO
> > >
> > >
> > https://github.com/apache/incubator-beam/blob/master/sdks/
> java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L280
> > >
> > > Pros: don't need to specify type at call site.
> > > Cons: doesn't interact well with @AutoValue (it doesn't support builder
> > > methods that change type) - requires unchecked conversions.
> > >
> > > 3. Forcing to provide a "constraint argument" in the creation method:
> > >
> > > class Foo<T> {
> > > ...
> > > public static <T> FooBuilder<T> create(Bar<T> bar);
> > > // (do not provide withBar)
> > > }
> > >
> > > Foo<String> foo = Foo.create(new StringBar()).withBlah(42);
> > >
> > > Example: WithKeys
> > >
> > >
> > https://github.com/apache/incubator-beam/blob/master/sdks/
> java/core/src/main/java/org/apache/beam/sdk/transforms/WithKeys.java
> > > ,
> > > ParDo
> > >
> > > Pros: easy to code, interacts ok with @AutoValue, don't need to specify
> > > type at call site.
> > > Cons: need to supply all constraint arguments in the create method, so
> > they
> > > are treated differently from other arguments.
> > >
> > > 4. Splitting the builder into a "bound" and "unbound" class:
> > >
> > > class Foo {
> > > Unbound create();
> > >
> > > class Unbound {
> > > public Unbound withBlah(int blah);
> > > public <T> Bound<T> withBar(Bar<T> bar);
> > > }
> > >
> > > class Bound<T> {
> > > public Bound<T> withBlah(int blah);
> > > }
> > > }
> > >
> > > Foo.Bound<String> foo = Foo.create().withBlah(42).withBar(new
> > StringBar());
> > >
> > > Example: TextIO.Read
> > >
> > >
> > https://github.com/apache/incubator-beam/blob/master/sdks/
> java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java
> > > Pros: even more type-safe at call site than the others (using an
> > incomplete
> > > builder is a compile error)
> > > Cons: very cumbersome to implement, lots of confusion between
> "(un)bound"
> > > and "(un)bounded", tempting for clients to use ugly variable names such
> > as
> > > "Foo.Bound<String> bound = ..." (rather than "foo")
> > >
> > > ****
> > >
> > > I'd like to argue in favor of #1, because:
> > > - It makes sense for transforms like Keys.create() which do not have a
> > > "constraint argument", so we can have consistency between such
> transforms
> > > and the others.
> > > - It is the simplest to implement, and causes the fewest amount of
> > > generics-related confusion when reading the implementation code.
> > > - It interacts well with @AutoValue builders.
> > >
> > > The only downside is that you have to specify the generic argument at
> > call
> > > site, but I think this is acceptable given the benefits of consistency,
> > > unambiguity and providing a pattern that's easy to follow for future
> > > transform writers.
> > >
> > > Of course, there should be an exception for cases when there is a very
> > > small and fixed number of arguments, or when it's clear that the
> > > "constraint argument" is the most important one - e.g. ParDo.of(DoFn<A,
> > B>)
> > > should *not* be changed to ParDo.<A, B>create().withFn(DoFn<A, B>).
> Also,
> > > I'm not suggesting making changes to existing transforms, only deciding
> > > which pattern to recommend for new transforms.
> > >
> > > WDYT?
> > >
> >
>

Reply via email to