[ 
https://issues.apache.org/jira/browse/BEAM-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146657#comment-15146657
 ] 

Daniel Halperin commented on BEAM-12:
-------------------------------------

Hi,

I'm assuming that `ExtractFn` is some sort of function that turns a `T` into a
`KV<K, V>`.

Can you clarify what's awkward about Frances' idea of using a `ParDo`, or the 
more-Java-8-y

```java
p.apply(MapElements.via(new ExtractFn())) // or .via(lambda T: KV).withOutputTypeDescriptor(K,V)
   .apply(GroupByKey.create())
```

to do the first step?

In general, we prefer modularity in the SDK. We want you to be able to make 
little reusable bits of code here and there. Also, given that there is already 
a "1-liner" to turn a `T` into a `KV` using a `lambda`, we don't really want to 
add a second way. It only complicates SDKs to have many ways of doing the exact 
same thing.

Note that if we added such a shortcut to `GroupByKey`, we really ought to also 
add it to `CoGroupByKey` and `Combine.PerKey`. The latter two transforms have 
significantly more complicated semantics than GBK, and they may take a non-zero 
number of arguments. So either we "double" the number of ways to construct 
these transforms and users also have to worry about parameter order, or we provide 
an inconsistent API surface -- neither of which is IMO good for our users -- or 
we stick with the current behavior, which focuses on modularity.

I'd re-emphasize Frances' point: anywhere the extra 1-liner seems to complicate 
your code, you can add a composite PTransform that does exactly what you want: 
wrap GBK with your ExtractFn(), and use it that way.
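
To make the "wrap GBK with your ExtractFn" idea concrete, here is a minimal sketch of its shape in plain Java. It is deliberately not Beam SDK code: `java.util` lists and `Map.Entry` stand in for `PCollection` and `KV`, and the names `GroupByExtracted`, `extract`, and `groupByKey` are hypothetical, chosen only for illustration. The point is that the extraction step and the grouping step stay separate and reusable, while callers see a single step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of a composite that wraps "extract, then group by key".
// Assumption: List and Map.Entry stand in for PCollection and KV; this is
// not the Beam API, and GroupByExtracted is a hypothetical name.
final class GroupByExtracted<T, K, V> {
    private final Function<T, Map.Entry<K, V>> extractFn;

    GroupByExtracted(Function<T, Map.Entry<K, V>> extractFn) {
        this.extractFn = extractFn;
    }

    // Step 1: the existing "1-liner" that maps each T to a key/value pair.
    private List<Map.Entry<K, V>> extract(List<T> input) {
        List<Map.Entry<K, V>> pairs = new ArrayList<>();
        for (T element : input) {
            pairs.add(extractFn.apply(element));
        }
        return pairs;
    }

    // Step 2: the unchanged grouping step, which only understands pairs.
    private static <A, B> Map<A, List<B>> groupByKey(List<Map.Entry<A, B>> pairs) {
        Map<A, List<B>> grouped = new LinkedHashMap<>();
        for (Map.Entry<A, B> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // The composite: one call for the user, two modular pieces underneath.
    Map<K, List<V>> apply(List<T> input) {
        return groupByKey(extract(input));
    }
}
```

In this model, grouping words by their first letter is a single call on the composite, and `groupByKey` itself never needs to learn about extraction functions, which is the modularity argument made above.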

Thanks!

> Apply GroupByKey transforms on PCollection of normal type other than KV
> -----------------------------------------------------------------------
>
>                 Key: BEAM-12
>                 URL: https://issues.apache.org/jira/browse/BEAM-12
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: bakeypan
>            Assignee: Frances Perry
>            Priority: Trivial
>
> Currently, the GroupByKey transform can only be applied to a 
> PCollection<KV<K,V>>, so I have to transform a PCollection<T> into a 
> PCollection<KV<K,V>> before I can apply GroupByKey.
> I think we can do better by allowing GroupByKey to be applied to a 
> PCollection of a normal type other than KV. The user could supply a custom 
> key-extraction function, or we could offer a default one, like this:
> PCollection<T> input = ...
> PCollection<KV<K,Iterable<V>>> result = input.apply(GroupByKey.<K, V>create(new ExtractFn()));



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
