How to decide number of partitions in a Map Output

Manu Zhang Mon, 20 Jan 2014 18:11:15 -0800

Hi all,

I've been working on a customized Output which works like
OnFileSortedOutput but with optimizations that will speed up Map output.


The issue is about the *number of partitions*. My current implementation
sets it to the number of physicalOutputs, but the *partitionId exceeds that
number* when running some jobs.

After referring to MRPartitioner, I found that the number of partitions is
set from "tez.runtime.num.expected.partitions" (or 1 if null). So what is the
difference between that property and physicalOutputs?

Also, when running Hive queries over Tez (with my customized output), I have
observed that the Hive property "hive.exec.reducers.bytes.per.reducer" can
also change the number of partitions.

Any ideas?
Thanks

Manu Zhang
