Re: Filtering keys after map+combine

2015-02-19 Thread Sean Owen
You have the keys before and after reduceByKey. You want to do something based on the key within reduceByKey? It just calls combineByKey, so you can use that method for lower-level control over the merging. Whether it's possible depends, I suppose, on what you mean to filter on. If it's just a
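
To make the combineByKey suggestion concrete, here is a minimal sketch in plain Python (not Spark code) of the three functions such an aggregation takes and where the map-side merge happens. The helper name `combine_by_key`, the list-of-lists stand-in for partitions, and the sample data are all assumptions made for illustration; Spark's real implementation is distributed and differs in detail.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Toy single-process model of a combineByKey-style aggregation.

    `partitions` is a list of lists of (key, value) pairs standing in
    for the partitions of a pair RDD (an assumption for illustration).
    """
    # Map side: build one combiner per key inside each partition.
    map_side = []
    for part in partitions:
        combiners = {}
        for key, value in part:
            if key in combiners:
                combiners[key] = merge_value(combiners[key], value)
            else:
                combiners[key] = create_combiner(value)
        map_side.append(combiners)

    # Reduce side: merge the per-partition combiners for each key.
    merged = {}
    for combiners in map_side:
        for key, comb in combiners.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], comb)
            else:
                merged[key] = comb
    return map_side, merged


# Hypothetical sample data: two partitions of (key, count) pairs.
partitions = [[("a", 1), ("b", 5), ("a", 2)], [("a", 4), ("c", 1)]]
map_side, totals = combine_by_key(
    partitions,
    lambda v: v,             # createCombiner
    lambda c, v: c + v,      # mergeValue (within a partition)
    lambda c1, c2: c1 + c2,  # mergeCombiners (across partitions)
)
```

In this model, `map_side` holds what each partition would hand to the shuffle, which is the point at which the question asks to filter. The three functions only ever see values and combiners, not a chance to drop a key, so a filter on the combined pairs would have to be a separate step.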

Re: Filtering keys after map+combine

2015-02-19 Thread Debasish Das
Hi Sean, This is what I intend to do: are you saying that you know a key should be filtered based on its value partway through the merge? I should use combineByKey... Thanks. Deb On Thu, Feb 19, 2015 at 6:31 AM, Sean Owen so...@cloudera.com wrote: You have the keys before and after

Re: Filtering keys after map+combine

2015-02-19 Thread Daniel Siegmann
I'm not sure what your use case is, but perhaps you could use mapPartitions to reduce across the individual partitions and apply your filtering. Then you can finish with a reduceByKey. On Thu, Feb 19, 2015 at 9:21 AM, Debasish Das debasish.da...@gmail.com wrote: Hi, Before I send out the keys
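
The mapPartitions idea above can be sketched as follows, again as a plain-Python model rather than Spark code. It assumes that "threshold" means a minimum per-partition partial sum; the function names, the threshold value, and the sample data are hypothetical.

```python
def local_reduce_and_filter(partition, threshold):
    """Model of the function passed to mapPartitions: sum values per key
    within one partition, then keep only partial sums >= threshold."""
    partial = {}
    for key, value in partition:
        partial[key] = partial.get(key, 0) + value
    return [(k, v) for k, v in partial.items() if v >= threshold]


def reduce_by_key(pairs):
    """Model of the final reduceByKey: sum the surviving partial sums."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals


# Hypothetical sample data: two partitions of (key, count) pairs.
partitions = [
    [("a", 1), ("b", 5), ("a", 2)],
    [("a", 4), ("c", 1), ("b", 1)],
]
THRESHOLD = 3  # assumed cutoff on the per-partition partial sum

# Only pairs that pass the local filter would be sent to the shuffle.
survivors = [
    pair
    for part in partitions
    for pair in local_reduce_and_filter(part, THRESHOLD)
]
result = reduce_by_key(survivors)
```

One caveat this sketch exposes: filtering partial sums is not the same as filtering final totals. Here key `b` has a true total of 6, but its partial sum of 1 in the second partition is dropped before the final reduce, so `result` reports 5 for it. Whether that is acceptable depends on the use case.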

Filtering keys after map+combine

2015-02-19 Thread Debasish Das
Hi, Before I send out the keys for network shuffle, in reduceByKey after map + combine are done, I would like to filter the keys based on some threshold... Is there a way to get the key, value pairs after the map+combine stages so that I can run a filter on the keys? Thanks. Deb

Re: Filtering keys after map+combine

2015-02-19 Thread Debasish Das
I thought the combiner comes from reduceByKey and not mapPartitions, right... Let me dig deeper into the APIs On Thu, Feb 19, 2015 at 8:29 AM, Daniel Siegmann daniel.siegm...@velos.io wrote: I'm not sure what your use case is, but perhaps you could use mapPartitions to reduce across the individual