BTW, each key appears exactly once in the large constant dataset, and exactly once in each MR job's output.
I'm thinking the right approach is to partition the job output and the large constant dataset consistently, with the number of partitions equal to the number of reduce tasks and each partition going to its own file. Then make an InputFormat whose number of splits equals the number of reduce tasks; reading a split consists of reading the corresponding pair of files (one from each dataset), stepping through them in lockstep. This seems like something that should already be provided by something in org.apache.hadoop.mapreduce.*. Here's roughly what I have in mind, in case that makes the question clearer:
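(Very rough and untested -- the class name and the two config keys are made up, and it assumes both datasets are SequenceFile<Text, Text>, written with the same Partitioner and the same number of reduce tasks, so the part files line up by number.)

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class PairedPartitionInputFormat extends InputFormat<Text, Text> {

  // One split per partition: the i-th part file of the constant dataset
  // paired with the i-th part file of the job output.
  public static class PairSplit extends InputSplit implements Writable {
    Path constant, output;

    public PairSplit() {}  // no-arg constructor needed for deserialization
    PairSplit(Path constant, Path output) { this.constant = constant; this.output = output; }

    @Override public long getLength() { return 0; }  // good enough for a sketch
    @Override public String[] getLocations() { return new String[0]; }

    @Override public void write(DataOutput out) throws IOException {
      Text.writeString(out, constant.toString());
      Text.writeString(out, output.toString());
    }
    @Override public void readFields(DataInput in) throws IOException {
      constant = new Path(Text.readString(in));
      output = new Path(Text.readString(in));
    }
  }

  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    Configuration conf = context.getConfiguration();
    FileSystem fs = FileSystem.get(conf);
    // "paired.constant.dir" and "paired.output.dir" are made-up config keys.
    FileStatus[] constant = fs.globStatus(new Path(conf.get("paired.constant.dir"), "part-*"));
    FileStatus[] output = fs.globStatus(new Path(conf.get("paired.output.dir"), "part-*"));
    Arrays.sort(constant);
    Arrays.sort(output);  // so part-00000 pairs with part-00000, and so on
    List<InputSplit> splits = new ArrayList<InputSplit>();
    for (int i = 0; i < constant.length; i++) {
      splits.add(new PairSplit(constant[i].getPath(), output[i].getPath()));
    }
    return splits;
  }

  @Override
  public RecordReader<Text, Text> createRecordReader(InputSplit split, TaskAttemptContext context) {
    return new RecordReader<Text, Text>() {
      private SequenceFile.Reader left, right;
      private final Text key = new Text(), leftVal = new Text();
      private final Text rightKey = new Text(), rightVal = new Text();
      private final Text value = new Text();

      @Override public void initialize(InputSplit s, TaskAttemptContext ctx) throws IOException {
        Configuration conf = ctx.getConfiguration();
        FileSystem fs = FileSystem.get(conf);
        PairSplit ps = (PairSplit) s;
        left = new SequenceFile.Reader(fs, ps.constant, conf);
        right = new SequenceFile.Reader(fs, ps.output, conf);
      }

      // Step both files in lockstep: since each key appears exactly once in
      // each file and both are sorted and partitioned the same way, the i-th
      // records carry the same key (could sanity-check rightKey.equals(key)).
      @Override public boolean nextKeyValue() throws IOException {
        if (!left.next(key, leftVal) || !right.next(rightKey, rightVal)) return false;
        value.set(leftVal + "\t" + rightVal);  // a real version would use a proper pair Writable
        return true;
      }

      @Override public Text getCurrentKey() { return key; }
      @Override public Text getCurrentValue() { return value; }
      @Override public float getProgress() { return 0f; }
      @Override public void close() throws IOException { left.close(); right.close(); }
    };
  }
}

The mapper of the consuming job would then see each key exactly once, with the value from each dataset already paired up, so no reduce-side join is needed.

Thanks, Mike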