Thanks for considering the alternative so thoroughly! And thanks for that 
reference. I can certainly get behind aggressively simple API design instead of 
flexible API design.

For background (as you noted), the reason this one method is different is just 
that its implementation was buggy: it took a Transformer instead of some kind 
of supplier. We needed to replace that with either a 0-arity function or the 
TransformerSupplier interface.

I was actually the one arguing to use the Supplier interface at the time, and I 
was hoping over time to deprecate *all* the function-supplier methods and make 
them *all* take the appropriate Supplier. However, I hold this opinion rather 
tentatively, so I welcome the opportunity to discuss it with you.

### For TransformerSupplier:

My basic reservation is about programmability and self-documentation.

The interfaces themselves give us an opportunity to annotate the argument with 
its purpose, both via the type itself and via the opportunity to attach 
documentation to the interface and its method. I view the TransformerSupplier 
as a specific type of function.

For example, the Java API itself could have come with a `Function0` interface 
instead of having separate interfaces for TransformerSupplier, 
ProcessorSupplier, etc. But thinking about how this would look, it just doesn't 
feel right, even though it's functionally (hah!) equivalent.

Often, when I'm programming against APIs like this, I'll actually start typing 
`.transform(new TransformerSupplier<tab>` and let the intellisense completion 
stub out everything with the right parameter and return types. After I fill it 
in, then I accept the prompt to convert the interfaces into functions whenever 
possible. If we just present an API of raw functions, I don't think you get 
autocomplete behavior that's as nice. This is a handy little productivity bump, 
because consulting the IDE prompts is much more efficient than keeping the 
documentation open to the side and scrolling around to find each component I'm 
trying to use.
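
To make that concrete, here's roughly what the IDE stubs out for me (a sketch 
against the Java interfaces; the key/value types and the incrementing logic are 
just placeholders):

```scala
import org.apache.kafka.streams.KeyValue
import org.apache.kafka.streams.kstream.{Transformer, TransformerSupplier}
import org.apache.kafka.streams.processor.ProcessorContext

object StubbedOutExample {
  // Every type parameter and abstract method arrives pre-filled by the IDE;
  // I only have to write the bodies.
  val incrementSupplier = new TransformerSupplier[String, Long, KeyValue[String, Long]] {
    override def get(): Transformer[String, Long, KeyValue[String, Long]] =
      new Transformer[String, Long, KeyValue[String, Long]] {
        override def init(context: ProcessorContext): Unit = ()

        override def transform(key: String, value: Long): KeyValue[String, Long] =
          KeyValue.pair(key, value + 1) // placeholder transformation

        override def close(): Unit = ()
      }
  }
}
```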

A reductio-ad-absurdum parody of this is that the Transformer itself is 
basically a function as well. So why not have the API be: `def 
transform[K,V,K1,V1](() => (K, V) => (K1, V1)): KStream[K1, V1]`? It's very 
parsimonious, in that you don't need any extra types beyond what comes with 
Scala, but it also offers little opportunity to convey extra information about 
what the two levels are for, what's going to happen with the result, etc. This 
approach would become super ridiculous with aggregate, where we have three 
different arguments that would just be different combinations of functions of 
different arities.
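
To see why this bothers me, here's a sketch of that extreme (the `RawKStream` 
trait and `example` method are hypothetical stand-ins): at the call site, 
nothing names the supplier level or the transformer level, so the reader has to 
infer both roles from shape alone.

```scala
// Hypothetical stand-in for the "raw functions only" API shape.
trait RawKStream[K, V] {
  def transform[K1, V1](supplier: () => (K, V) => (K1, V1)): RawKStream[K1, V1]
}

object RawCallSite {
  // A call site: two anonymous levels, with no types hinting at what each means.
  def example(stream: RawKStream[String, Long]): RawKStream[String, Long] =
    stream.transform(() => (key: String, value: Long) => (key, value + 1))
}
```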

So we back off from that and replace the inner function with the Transformer 
interface. But my question is, why stop there? Doesn't the same argument 
suggest that we should use the TransformerSupplier interface at the top level? 
That way there's somewhere to put documentation: why we need a supplier instead 
of just an instance, in what situations we'll create a new Transformer vs. 
cache the instance, etc.
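
Concretely, I'm imagining something like the following sketch (the trait name, 
the exact signature, and the doc wording are just illustrations, not a proposal 
for the final API):

```scala
import org.apache.kafka.streams.KeyValue
import org.apache.kafka.streams.kstream.TransformerSupplier

trait DocumentedKStream[K, V] {
  /**
   * Transform each input record into a new record, with access to the
   * ProcessorContext and to any connected state stores.
   *
   * @param transformerSupplier a supplier rather than a Transformer instance,
   *                            because the library may call it once per task
   *                            and each call should return a fresh Transformer
   * @param stateStoreNames     names of state stores the Transformer uses
   */
  def transform[K1, V1](
      transformerSupplier: TransformerSupplier[K, V, KeyValue[K1, V1]],
      stateStoreNames: String*
  ): DocumentedKStream[K1, V1]
}
```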

The fact that 2.12+ supports SAM conversion is actually an argument in favor of 
this approach. We get the best of both worlds: the API can be self-documenting 
and strongly typed, while user code can still be parsimonious (passing plain 
functions instead of anonymous subclasses). You'd only need the implicits in 
scope if you're on 2.11.
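
For example, something like this should work on 2.12+, since TransformerSupplier 
has a single abstract method (`IncrementTransformer` is just a placeholder 
implementation):

```scala
import org.apache.kafka.streams.KeyValue
import org.apache.kafka.streams.kstream.{Transformer, TransformerSupplier}
import org.apache.kafka.streams.processor.ProcessorContext

// A placeholder Transformer implementation.
class IncrementTransformer extends Transformer[String, Long, KeyValue[String, Long]] {
  override def init(context: ProcessorContext): Unit = ()
  override def transform(key: String, value: Long): KeyValue[String, Long] =
    KeyValue.pair(key, value + 1)
  override def close(): Unit = ()
}

object SamConversionExample {
  // SAM conversion: a bare function literal satisfies the single-method
  // TransformerSupplier interface, so the call site stays parsimonious.
  val supplier: TransformerSupplier[String, Long, KeyValue[String, Long]] =
    () => new IncrementTransformer
}
```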

### On to KeyValue...

I think `KeyValue` vs. `(k, v)` is actually an orthogonal decision: it's 
possible to take or leave the Supplier interface and take or leave the KeyValue 
type independently. I think the Java type exists purely to make up for the lack 
of a 2-tuple type in the Java standard library, so it might be reasonable to 
just be opinionated and use Scala tuples throughout the interface.

My only argument in favor of using KeyValue is that I suspect there's a healthy 
contingent of folks out there using the Java API from Scala who would like to 
swap over to the Scala API to gain better typing, implicit serdes, etc. without 
having to rewrite all their code. Is this a strong argument? I don't know. I 
guess that if I already have a bunch of Transformers returning KeyValue and I 
need them to return `(K, V)` instead, I can write a one-liner implicit 
conversion between `(K, V)` and `KeyValue[K, V]` at the top of my class and be 
done with it.
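
Something like this (the object and method names are hypothetical, and 
`scala.language.implicitConversions` is imported to keep the compiler quiet):

```scala
import scala.language.implicitConversions
import org.apache.kafka.streams.KeyValue

object KeyValueConversions {
  // One-liner bridges between the Java KeyValue type and Scala tuples;
  // either direction is a single line at the top of a class.
  implicit def keyValueToTuple[K, V](kv: KeyValue[K, V]): (K, V) = (kv.key, kv.value)
  implicit def tupleToKeyValue[K, V](pair: (K, V)): KeyValue[K, V] =
    KeyValue.pair(pair._1, pair._2)
}
```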

### the end ;)

So what do you think about this? Are these strong arguments, or am I just 
over-engineering?

