HashPartitioner, strange behavior

Erik Thorson Mon, 26 Aug 2013 12:05:27 -0700

Hello all,

I have a list of log entries that I want to partition by key. I map
the list to an RDD of (String, String) pairs, count the number of unique
keys, and use that count as the number of partitions. I then call
RDD.partitionBy(new HashPartitioner(# of partitions)). When I look at
the results, many of the partitions are empty, while others contain keys
that should not be in them. Any idea why this is? I have also
tried the RangePartitioner, with the same result. Some of the
partitions should be very small: some keys have only 5
entries while others have millions (if this helps). I have tried
running the same program with Cloudera's MapReduce and Hive, and it seems
to work on that platform. Is there something I'm missing?
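
For illustration, here is a minimal sketch in plain Python (not Spark) of
what a hash partitioner does with the setup described above. It assumes
the partition index is the key's Java hashCode taken modulo the number of
partitions, made non-negative, which is how Spark's HashPartitioner
assigns keys. The sample keys and the helper names (java_string_hash,
hash_partition, group_keys) are hypothetical and chosen only for the
demonstration.

```python
from collections import defaultdict


def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() as a signed 32-bit int (BMP characters only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key: str, num_partitions: int) -> int:
    """Partition index as hashCode mod numPartitions, kept non-negative.

    Python's % already yields a non-negative result for a positive modulus.
    """
    return java_string_hash(key) % num_partitions


def group_keys(keys, partition_of):
    """Return {partition index: set of keys assigned to it}; empty partitions are absent."""
    groups = defaultdict(set)
    for k in keys:
        groups[partition_of(k)].add(k)
    return dict(groups)


# Hypothetical keys: five distinct keys, so five partitions are requested.
keys = ["a", "b", "c", "d", "g"]
num_partitions = len(set(keys))

# Hashing: nothing guarantees that distinct keys get distinct indices.
hashed = group_keys(keys, lambda k: hash_partition(k, num_partitions))

# Explicit mapping: give every distinct key its own index.
index_of = {k: i for i, k in enumerate(sorted(set(keys)))}
explicit = group_keys(keys, lambda k: index_of[k])

print("hash partitioning:    ", hashed)
print("explicit partitioning:", explicit)
```

With hashing, two of the sample keys land in the same partition and one
partition stays empty, even though the number of partitions equals the
number of unique keys. If one key per partition is the goal, one possible
approach in Spark is a custom Partitioner subclass whose getPartition
looks the key up in a precomputed key-to-index map, as the explicit
mapping above does.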



Thanks,

Erik
