Correction: I've tested it in Hive, not in MapReduce.
On Mon, Aug 26, 2013 at 3:04 PM, Erik Thorson <[email protected]> wrote:
> Hello all,
>
> I have a list of log entries that I want to partition by a key. I map
> the list to an RDD of (String, String). I then find the number of unique
> keys and use that to determine the number of partitions. I use
> RDD.partitionBy(new HashPartitioner(# of partitions)). When I look at
> the results, many of the partitions are empty, while others contain keys
> that should not be in them. Any idea why this is? I have also
> tried it with the RangePartitioner, with the same result. The partition
> sizes are very uneven: some will have only 5 entries while others
> have millions (if this helps). I have tried running the same program
> in Cloudera's MapReduce and Hive, and it seems to work on that
> platform. Is there something I'm missing?
>
> Thanks,
>
> Erik
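
One possible explanation for the symptom described above, sketched outside Spark: a hash partitioner picks the partition as the key's hash code modulo the number of partitions, so setting the partition count to the number of distinct keys does not give each key its own partition. Two keys can share a partition while another partition stays empty. The snippet below is a plain-Python illustration only, not Spark code. The keys are hypothetical, and `java_string_hash`, `hash_partition`, and `group_by_partition` are helper names made up for this sketch; the first one mimics how the JVM hashes a String.

```python
from collections import defaultdict


def java_string_hash(s):
    """Mimic java.lang.String.hashCode() as a signed 32-bit value (BMP chars)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def hash_partition(key, num_partitions):
    # Python's % is already non-negative for a positive modulus, which
    # matches the non-negative modulo a hash partitioner needs.
    return java_string_hash(key) % num_partitions


def group_by_partition(keys, assign):
    """Return {partition index: [keys assigned to it]}."""
    groups = defaultdict(list)
    for k in keys:
        groups[assign(k)].append(k)
    return dict(groups)


# Hypothetical keys; the real log keys are not given in this thread.
keys = ["GET", "PUT", "POST", "HEAD"]
n = len(keys)  # partition count = number of distinct keys, as in the question

# Hash partitioning: "POST" and "HEAD" collide, and partition 1 is empty.
hashed = group_by_partition(keys, lambda k: hash_partition(k, n))

# Exact assignment: give every distinct key its own index via a lookup table.
index_of = {k: i for i, k in enumerate(sorted(keys))}
exact = group_by_partition(keys, lambda k: index_of[k])

print(hashed)
print(exact)
```

If one partition per key is really what is wanted, the same lookup-table idea could go into a custom Partitioner subclass (overriding numPartitions and getPartition) in place of HashPartitioner. That is a suggestion under the assumptions above, not something confirmed in this thread.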
