A little bit more inline: On Tue, Feb 7, 2017 at 5:15 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote:
> Hello, > > I was auditing Beam for violations of PTransform style guide > https://beam.apache.org/contribute/ptransform-style-guide/ and came across > another style point that deserves discussion. > > Look at Count transform: > > public static <T> Combine.Globally<T, Long> globally() { > return Combine.globally(new CountFn<T>()); > } > > public static <K, V> Combine.PerKey<K, V, Long> perKey() { > return Combine.<K, V, Long>perKey(new CountFn<V>()); > } > > public static <T> PerElement<T> perElement() { > return new PerElement<>(); > } > > I asked myself: should globally() and perKey() also define wrapper classes > - e.g. should it be "public static <T> Globally<T, Long> globally()" where > "Globally" is a new inner class of Count? > > I argue that the answer is yes, but it's not clear-cut. > Cons: > - If we return a Combine.Globally, the user can use the features provided > by Combine.Globally - e.g. .withDefaults(), .withFanout(), > .asSingletonView(). > +1 to this point, which was a conscious decision in the pre-Beam days (which of course means it IS worth revisiting ;). Trying to replay the reasoning: * If wrapping, the author of a new Count.Globally can now only make the extra functionality in Combine available by similarly exposing all such functions. * Conversely, the current implementation makes new functionality in Combine available "for free" to users of Count.globally(). Whereas new functionality on Combine would otherwise mandate that *all wrappers* change to actually be exposed. * There's almost no data here, but: we have added new functionality to Combine (withSideInputs) and have not added new functionality to Count. > Pros: > - Style consistency with other transforms. Almost all transforms have their > own class, and their factory functions return that class. > IMO this should only happen if the user needs that class. For all examples I'm aware of, the returned class has stuff you need to do, like configuring coders or side inputs or other parameters. IMO if the user need not configure, return the least constraining thing you can. > - Implementation can evolve. However, in case of Count, that is unlikely. > +1 to "unlikely" > - ...Or is it? If the transform has a concrete class, then the runner can > intercept that class and e.g. provide an (even) more efficient > implementation of Count.Globally. This gets much more awkward if the runner > has to intercept every Combine and check whether it's combining using a > CountFn. > Post-runner API, this argument goes away. Instead, the runner will need to inspect generically the attributes of the transform to do the replacement. Summarizing: I currently am a -0.8 on this proposal. > > So, I propose to add this as a style guide rule, and solve the problem in > "Cons" by saying "Yeah, if you want the extra features of the transform > you're expanding into, you have to propagate them through the API of your > transform and delegate to the underlying transform manually". > > Thoughts? >