Hi all,

I've been working on a customized Output which works like
OnFileSortedOutput but with optimizations that will speed up Map output.

The issue is about the *number of partitions*. My current implementation
sets it to the number of physicalOutputs, but the *partitionId exceeds that
number* when running some jobs.

After looking at MRPartitioner, I found that the number of partitions is
taken from "tez.runtime.num.expected.partitions" (or 1 if null). So what is
the difference between that property and physicalOutputs?
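
To make the mismatch concrete, here is a minimal sketch of what I suspect is happening. It does not use the Tez API: the class and method names are made up for illustration, a plain Map stands in for the configuration, and the values 8 and 4 are hypothetical. The assumption is that the partitioner derives partition IDs from the configured value, while my output sizes its buffers by physicalOutputs.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionMismatchSketch {
    // Hypothetical stand-in for reading the property; falls back to 1 if unset,
    // mirroring the "or 1 if null" behaviour described above.
    public static int numPartitionsFromConf(Map<String, String> conf) {
        String value = conf.get("tez.runtime.num.expected.partitions");
        return value == null ? 1 : Integer.parseInt(value);
    }

    // Generic hash partitioning: non-negative hash modulo the partition count.
    public static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // True if the partition ID is a valid index into the physical outputs.
    public static boolean fits(int partitionId, int numPhysicalOutputs) {
        return partitionId >= 0 && partitionId < numPhysicalOutputs;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("tez.runtime.num.expected.partitions", "8"); // hypothetical value
        int numPhysicalOutputs = 4;                            // hypothetical value

        int numPartitions = numPartitionsFromConf(conf);
        for (int key = 0; key < 8; key++) {
            int partitionId = partitionFor(key, numPartitions);
            if (!fits(partitionId, numPhysicalOutputs)) {
                System.out.println("key " + key + " -> partition " + partitionId
                        + " is out of range for " + numPhysicalOutputs + " outputs");
            }
        }
    }
}
```

If the two counts really can differ like this, any buffer array I allocate by physicalOutputs would be indexed out of bounds.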

Also, when running Hive queries over Tez (with my customized output), the
Hive property "hive.exec.reducers.bytes.per.reducer" also seems to change
the number of partitions, from what I have observed.

Any ideas?
Thanks

Manu Zhang
