This is also a good reason to avoid overly general names like "from",
"create" and "of". Instead, the option should be ".fromQuery(String
query)", so we can add ".fromTable(...)".

On Thu, Oct 13, 2016 at 4:55 PM Dan Halperin <dhalp...@google.com.invalid>
wrote:

> For #3 -- I think we should be VERY careful there. You need to be
> absolutely certain that there will never, ever be another alternative to
> your mandatory argument. For example, you build an option to read from a
> DB, so you supply a .from(String query). Then later, you want to add
> reading just a table directly, so you add fromTable(Table). In this case,
> it's much better to use .read().fromQuery() or .read().fromTable() --
> having ".read()" be the "standard builder a'la #1".
>
> The other major drawback of #3 is the inability to provide generic
> configuration. For example, a utility function that gives you a database
> reader ready-to-go with all the default credentials and options you need --
> but independent of the type you want the result to end up as. You can't do
> that if you must bind the type first.
>
> I think that in general all of these patterns are significantly worse in
> the long run than the existing standards, e.g., for BigtableIO. These
> suggestions are motivated by making things easier on transform writers, but
> IMO we need to be optimizing for transform users.
>
> On Fri, Oct 7, 2016 at 4:48 PM, Eugene Kirpichov <
> kirpic...@google.com.invalid> wrote:
>
> > In my original email, all FooBuilder's should be simply Foo. Sorry for
> the
> > confusion.
> >
> > On Thu, Oct 6, 2016 at 3:08 PM Kenneth Knowles <k...@google.com.invalid>
> > wrote:
> >
> > > Mostly my thoughts are the same as Robert's. Use #3 whenever possible,
> > > fallback to #1 otherwise, but please consider using informative names
> for
> > > your methods in all cases.
> > >
> > > #1 GBK.<T>create(): IMO this pattern is best only for transforms where
> > > withBar is optional or there is no such method, as in GBK. If it is
> > > mandatory, it should just be required in the first method, eliding the
> > > issue, as in ParDo.of(DoFn<I, O>), MapElements.via(...), etc, like you
> > say
> > > in your concluding remark.
> > >
> > > #2 FooBuilder<?> FooBuilder.create(): this too - if you are going to
> fix
> > > the type, fix it first. If it is optional and Foo<?> is usable as a
> > > transform, then sure. (it would have be something weird like
> Foo<InputT,
> > > OutputT, ?> extends PTransform<InputT, OutputT>)
> > >
> > > #3 Foo.create(Bar<T>): this is best. Do this whenever possible. From my
> > > perspective, instead of "move the param to create(...)" I would
> describe
> > > this as "delete create() then rename withBar to create". Just skip the
> > > second step and you are in an even better design, withBar being the
> > > starting point. Just like ParDo.of and MapElements.via.
> > >
> > > #4 Dislike this, too, for the same reasons as #2 plus code bloat plus
> > user
> > > confusion.
> > >
> > > Side note since you use this method in all your examples: This kind of
> > use
> > > of "create" is a bad method name. There may be no new object "created".
> > > Sometimes we have no better idea, but create() is a poor default. For
> GBK
> > > both are bad: create() (we really only need one instance so why create
> > > anything?) and create(<boolean>) (what is the unlabeled boolean?). They
> > > would be improved by GBK.standard() and GBK.fewKeys() or some such. I
> > tend
> > > to think that focusing on this fine polish eliminates a lot of cases
> for
> > > the generalized question.
> > >
> > > Kenn
> > >
> > > On Thu, Oct 6, 2016 at 2:10 PM Eugene Kirpichov
> > > <kirpic...@google.com.invalid> wrote:
> > >
> > > > Quite a few transforms in the SDK are generic (i.e. have type
> > > parameters),
> > > > e.g. ParDo, GroupByKey, Keys / WithKeys, many connectors (TextIO,
> > > KafkaIO,
> > > > JdbcIO, MongoDbGridFSIO etc - both read and write). They use
> different
> > > > styles of binding the type parameters to concrete types in caller
> code.
> > > >
> > > > I would like us to make a decision which of those styles to recommend
> > for
> > > > new transform and connectors writers. This question is coming up
> rather
> > > > frequently, e.g. it came up in JdbcIO and MongoDbGridFSIO.
> > > >
> > > > For the purpose of this discussion, imagine a hypothetical builder
> > class
> > > > that looks like this:
> > > >
> > > > class Foo<T> {
> > > > private Bar<T> bar;
> > > > private int blah;
> > > >
> > > > Foo<T> withBlah(int blah);
> > > > }
> > > >
> > > > So far I've seen several styles of binding the type argument in a
> > > withBar()
> > > > method vs. a creation method:
> > > >
> > > > 1. Binding at the creation method: e.g.:
> > > >
> > > > class Foo<T> {
> > > > ...
> > > > public static <T> Foo<T> create();
> > > > public FooBuilder<T> withBar(Bar<T> bar);
> > > > }
> > > >
> > > > Foo<String> foo = Foo.<String>create().withBlah(42).withBar(new
> > > > StringBar());
> > > >
> > > > Example: GroupByKey does this. As well as other transforms that don't
> > > have
> > > > a withBar()-like method, but still need a type argument, e.g. Keys.
> > > >
> > > > Pros: completely unambiguous, easy to code, interacts well with
> > > @AutoValue
> > > > Cons: need to specify type once at call site.
> > > >
> > > > 2. Binding at a method that takes an argument of the given type (let
> us
> > > > call it "a constraint argument"), e.g.:
> > > >
> > > > class Foo<T> {
> > > > ...
> > > > public static FooBuilder<?> create();
> > > > public <U> FooBuilder<U> withBar(Bar<U> bar);
> > > > }
> > > >
> > > > Foo<String> foo = Foo.create().withBlah(42).withBar(new StringBar());
> > > >
> > > > Example: KafkaIO
> > > >
> > > >
> > > https://github.com/apache/incubator-beam/blob/master/sdks/
> >
> java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L280
> > > >
> > > > Pros: don't need to specify type at call site.
> > > > Cons: doesn't interact well with @AutoValue (it doesn't support
> builder
> > > > methods that change type) - requires unchecked conversions.
> > > >
> > > > 3. Forcing to provide a "constraint argument" in the creation method:
> > > >
> > > > class Foo<T> {
> > > > ...
> > > > public static <T> FooBuilder<T> create(Bar<T> bar);
> > > > // (do not provide withBar)
> > > > }
> > > >
> > > > Foo<String> foo = Foo.create(new StringBar()).withBlah(42);
> > > >
> > > > Example: WithKeys
> > > >
> > > >
> > > https://github.com/apache/incubator-beam/blob/master/sdks/
> > java/core/src/main/java/org/apache/beam/sdk/transforms/WithKeys.java
> > > > ,
> > > > ParDo
> > > >
> > > > Pros: easy to code, interacts ok with @AutoValue, don't need to
> specify
> > > > type at call site.
> > > > Cons: need to supply all constraint arguments in the create method,
> so
> > > they
> > > > are treated differently from other arguments.
> > > >
> > > > 4. Splitting the builder into a "bound" and "unbound" class:
> > > >
> > > > class Foo {
> > > > Unbound create();
> > > >
> > > > class Unbound {
> > > > public Unbound withBlah(int blah);
> > > > public <T> Bound<T> withBar(Bar<T> bar);
> > > > }
> > > >
> > > > class Bound<T> {
> > > > public Bound<T> withBlah(int blah);
> > > > }
> > > > }
> > > >
> > > > Foo.Bound<String> foo = Foo.create().withBlah(42).withBar(new
> > > StringBar());
> > > >
> > > > Example: TextIO.Read
> > > >
> > > >
> > > https://github.com/apache/incubator-beam/blob/master/sdks/
> > java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java
> > > > Pros: even more type-safe at call site than the others (using an
> > > incomplete
> > > > builder is a compile error)
> > > > Cons: very cumbersome to implement, lots of confusion between
> > "(un)bound"
> > > > and "(un)bounded", tempting for clients to use ugly variable names
> such
> > > as
> > > > "Foo.Bound<String> bound = ..." (rather than "foo")
> > > >
> > > > ****
> > > >
> > > > I'd like to argue in favor of #1, because:
> > > > - It makes sense for transforms like Keys.create() which do not have
> a
> > > > "constraint argument", so we can have consistency between such
> > transforms
> > > > and the others.
> > > > - It is the simplest to implement, and causes the fewest amount of
> > > > generics-related confusion when reading the implementation code.
> > > > - It interacts well with @AutoValue builders.
> > > >
> > > > The only downside is that you have to specify the generic argument at
> > > call
> > > > site, but I think this is acceptable given the benefits of
> consistency,
> > > > unambiguity and providing a pattern that's easy to follow for future
> > > > transform writers.
> > > >
> > > > Of course, there should be an exception for cases when there is a
> very
> > > > small and fixed number of arguments, or when it's clear that the
> > > > "constraint argument" is the most important one - e.g.
> ParDo.of(DoFn<A,
> > > B>)
> > > > should *not* be changed to ParDo.<A, B>create().withFn(DoFn<A, B>).
> > Also,
> > > > I'm not suggesting making changes to existing transforms, only
> deciding
> > > > which pattern to recommend for new transforms.
> > > >
> > > > WDYT?
> > > >
> > >
> >
>

Reply via email to