[ 
https://issues.apache.org/jira/browse/BEAM-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146629#comment-15146629
 ] 

bakeypan commented on BEAM-12:
------------------------------

My point is that, for convenience, we could omit the 
"ParDo.of(new ExtractFn())" step when applying GroupByKey to a PCollection, 
and just pass the ExtractFn to GroupByKey instead.
For example, say we have a PCollection<String> input and want to group it by 
prefix. Right now we have to write:
PCollection<KV<String,Iterable<String>>> result = input.apply(ParDo.of(new 
ExtractFn())).apply(GroupByKey.<String, String>create());
as in your earlier code.
But if GroupByKey accepted a key-extraction function, we could just write:
PCollection<KV<String,Iterable<String>>> result = 
input.apply(GroupByKey.<String, String>create(new ExtractFn()));
There would be no need to transform the PCollection<"NotKVType"> into a 
PCollection<KV<K, V>> by applying one more ParDo.
What do you think?
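
To make the intended semantics concrete, here is a minimal sketch using plain 
Java collections rather than the Beam SDK. It is only an analogy for "group 
by a key that an extract function computes from each element". The names 
GroupByExtractedKey and groupBy are hypothetical, and the two-character 
prefix is an assumption for illustration; none of this is an existing or 
proposed Beam signature.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: an in-memory analogy, not part of the Beam SDK.
final class GroupByExtractedKey {

    // Groups the elements by the key that extractKey computes for each one,
    // so the caller never has to build intermediate key/value pairs.
    static <T, K> Map<K, List<T>> groupBy(
            Collection<T> input, Function<? super T, ? extends K> extractKey) {
        return input.stream().collect(
                // LinkedHashMap keeps keys in first-seen order.
                Collectors.groupingBy(extractKey, LinkedHashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("ab-1", "cd-1", "ab-2");
        // Assumed key: the first two characters of each string.
        Map<String, List<String>> result = groupBy(input, s -> s.substring(0, 2));
        System.out.println(result);
    }
}
```

The point of the sketch is only that the key extraction and the grouping 
happen in one call, which is what passing ExtractFn to GroupByKey would give 
the user.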


> Apply GroupByKey transforms on PCollection of normal type other than KV
> -----------------------------------------------------------------------
>
>                 Key: BEAM-12
>                 URL: https://issues.apache.org/jira/browse/BEAM-12
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: bakeypan
>            Assignee: Frances Perry
>            Priority: Trivial
>
> Currently the GroupByKey transform can only be applied to a 
> PCollection<KV<K,V>>, so I have to transform a PCollection<T> into a 
> PCollection<KV<K,V>> before I can apply GroupByKey.
> I think we can do better by allowing GroupByKey to be applied to a 
> PCollection of a normal type other than KV. The user could supply a custom 
> key-extraction function, or we could offer a default one, like this:
> PCollection<T> input = ...
> PCollection<KV<K,Iterable<V>>> result =
>     input.apply(GroupByKey.<K, V>create(new ExtractFn()));



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)