Hi all, I've been working on a customized Output that works like OnFileSortedOutput but adds optimizations to speed up Map output.
The issue is about the *number of partitions*. My current implementation sets it to the number of physicalOutputs, but the *partitionId exceeds that number* when running some jobs. After looking at MRPartitioner, I found that it takes the number of partitions from "tez.runtime.num.expected.partitions" (or 1 if that property is not set). So what is the difference between that property and physicalOutputs?

Also, when running Hive queries over Tez (with my customized output), the Hive property "hive.exec.reducers.bytes.per.reducer" also appears to change the number of partitions, from what I have observed. Any ideas?

Thanks,
Manu Zhang