You have the keys before and after reduceByKey. You want to do
something based on the key within reduceByKey? It just calls
combineByKey, so you can use that method for lower-level control over
the merging.
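
To make the relationship concrete, here is a plain-Python model (not Spark code) of the three functions that combineByKey takes, and of how a reduceByKey-style call maps onto them. The helpers `combine_by_key` and `reduce_by_key` are hypothetical local names that operate on lists standing in for partitions. This is only a sketch of the semantics, not Spark's implementation.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Plain-Python model of combineByKey over a list of partitions."""
    # Map side: build one combiner per key inside each partition.
    local_results = []
    for part in partitions:
        combiners = {}
        for key, value in part:
            if key in combiners:
                combiners[key] = merge_value(combiners[key], value)
            else:
                combiners[key] = create_combiner(value)
        local_results.append(combiners)

    # Reduce side: merge the per-partition combiners for each key.
    merged = {}
    for combiners in local_results:
        for key, comb in combiners.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], comb)
            else:
                merged[key] = comb
    return merged


def reduce_by_key(partitions, func):
    # The same function merges a new value and merges two combiners.
    return combine_by_key(partitions, lambda v: v, func, func)


# Two lists stand in for two partitions (assumed sample data).
partitions = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("b", 5)]]
totals = reduce_by_key(partitions, lambda x, y: x + y)
```

Note that in this model, as in the combineByKey signature, the three functions see only values and combiners, never the key. A key-dependent decision would need the key carried inside the value, or a separate filtering step.
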
Whether it's possible depends, I suppose, on what you mean to filter on.
If it's just a
Hi Sean,
This is what I intend to do:
Are you saying that you know a key should be filtered based on its value
partway through the merge?
I should use combineByKey...
Thanks.
Deb
On Thu, Feb 19, 2015 at 6:31 AM, Sean Owen so...@cloudera.com wrote:
You have the keys before and after
I'm not sure what your use case is, but perhaps you could use mapPartitions
to reduce across the individual partitions and apply your filtering. Then
you can finish with a reduceByKey.
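
A minimal sketch of that idea in Python, assuming numeric values that are summed per key and a hypothetical cutoff `THRESHOLD` (neither is specified in the thread). The function `combine_and_filter` is an illustrative name: it does the per-partition combine by hand and drops keys whose partial sum falls below the cutoff.

```python
from collections import defaultdict

THRESHOLD = 2  # hypothetical cutoff; the real criterion depends on the use case


def combine_and_filter(records, threshold=THRESHOLD):
    """Sum values per key within one partition, then drop small partial sums."""
    partial = defaultdict(int)
    for key, value in records:
        partial[key] += value
    # Emit only the keys whose per-partition sum reaches the threshold.
    for key, total in partial.items():
        if total >= threshold:
            yield key, total


# Local check without a cluster: a list stands in for one partition.
sample_partition = [("a", 1), ("b", 1), ("a", 2), ("c", 5)]
kept = dict(combine_and_filter(sample_partition))
```

With PySpark, this could be plugged in as `rdd.mapPartitions(combine_and_filter).reduceByKey(lambda x, y: x + y)`. One caveat: the filter sees only partial sums, so a key that is small in every partition but large in total would be dropped. Whether that is acceptable depends on what the threshold is meant to express.
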
On Thu, Feb 19, 2015 at 9:21 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
Before I send out the keys for network shuffle, in reduceByKey after map +
combine are done, I would like to filter the keys based on some threshold...
Is there a way to get the (key, value) pairs after the map+combine stages so
that I can run a filter on the keys?
Thanks.
Deb
I thought the combiner comes from reduceByKey and not mapPartitions, right?
Let me dig deeper into the APIs.
On Thu, Feb 19, 2015 at 8:29 AM, Daniel Siegmann daniel.siegm...@velos.io
wrote:
I'm not sure what your use case is, but perhaps you could use
mapPartitions to reduce across the individual