Correction: I've tested it in Hive, not in MapReduce.

On Mon, Aug 26, 2013 at 3:04 PM, Erik Thorson <[email protected]> wrote:
> Hello all,
>
> I have a list of log entries that I want to partition by a key. I map
> the list to an RDD of (String, String) pairs. I then find the number
> of unique keys and use that as the number of partitions. I call
> RDD.partitionBy(new HashPartitioner(# of partitions)). When I look at
> the results, many of the partitions are empty, while others contain
> keys that should not be in them. Any idea why this is? I have also
> tried it with the RangePartitioner, with the same result. If it
> helps, the partitions should be very uneven in size: some will have
> only 5 entries while others have millions. I have tried running the
> same program in Cloudera's MapReduce and Hive, and it seems to work
> on that platform. Is there something I'm missing?
>
>
> Thanks,
>
> Erik
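
One possible explanation for the behaviour described above, offered as a sketch rather than a confirmed diagnosis: a hash partitioner picks the partition as the key's hashCode modulo the number of partitions, so having as many partitions as distinct keys does not give each key its own partition. Different keys can land on the same index and leave other indices empty. The plain-Java example below (no Spark dependency) illustrates this. The class name PartitionSketch, the methods hashPartition, hashCounts and exactIndex, and the sample keys "a", "d", "g" are all made up for illustration and are not part of any Spark API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PartitionSketch {

    // Partition index as a non-negative modulo of the key's hashCode.
    // Assumption: this mirrors how a hash partitioner assigns keys.
    public static int hashPartition(String key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // Count how many of the given keys fall into each partition.
    public static int[] hashCounts(List<String> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            counts[hashPartition(key, numPartitions)]++;
        }
        return counts;
    }

    // Hypothetical alternative: assign each distinct key its own index,
    // in order of first appearance. Duplicate keys reuse their index.
    public static Map<String, Integer> exactIndex(List<String> keys) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String key : keys) {
            index.putIfAbsent(key, index.size());
        }
        return index;
    }

    public static void main(String[] args) {
        // Three distinct keys, three partitions.
        List<String> keys = Arrays.asList("a", "d", "g");

        // The hash codes are 97, 100 and 103, and each leaves remainder 1
        // when divided by 3, so every key lands in partition 1.
        System.out.println(Arrays.toString(hashCounts(keys, 3)));

        // An explicit key-to-index table gives one partition per key.
        System.out.println(exactIndex(keys));
    }
}
```

If one partition per key is really what is wanted, a lookup table like exactIndex could back a custom Partitioner subclass whose getPartition returns the stored index. I have not tried that against this job, so treat it as a suggestion only.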
