On Apr 8, 2007, at 1:48 AM, Tom White wrote:
> I think we can do a lot to improve the use of generics, particularly
> in MapReduce.
>
> <... use generics in interfaces ...>
I like it. I was thrown off at first because Java classes aren't
specialized based on their type parameters the way C++ templates are,
but a subclass can fix the parent class's type parameters in its
declaration.
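A minimal sketch of what fixing the parent's type parameters in a subclass looks like (the `Mapper`/`Collector` names here are illustrative, not the actual Hadoop API of the time):

```java
public class GenericsDemo {
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    interface Mapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, Collector<K2, V2> out);
    }

    // The generic class itself is never specialized; instead a subclass
    // fixes the type parameters in its declaration.
    static class WordCountMapper implements Mapper<Long, String, String, Integer> {
        public void map(Long offset, String line, Collector<String, Integer> out) {
            for (String word : line.split("\\s+")) {
                out.collect(word, 1);
            }
        }
    }

    public static void main(String[] args) {
        final int[] total = {0};
        new WordCountMapper().map(0L, "a b c", (k, v) -> total[0] += v);
        System.out.println(total[0]); // prints 3
    }
}
```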
> Reducer would be changed similarly, although I'm not sure how we could
> constrain the output types of the Mapper to be the input types of the
> Reducer. Perhaps via the JobConf?
That is easy, actually. In the JobClient, we'd just check to see if
the types all play well together. Basically, you need:
K1, V1 -> map -> K2, V2
K2, V2 -> combiner -> K2, V2 (if used)
K2, V2 -> reduce -> K3, V3
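The three stages above could even be threaded together in the job-submission API itself, so that mismatched stages fail at compile time rather than at submit time. A hypothetical sketch (the `Job` builder and its field names are my invention, not a proposed Hadoop interface):

```java
public class PipelineSketch {
    interface Mapper<K1, V1, K2, V2> {}
    interface Reducer<K2, V2, K3, V3> {}

    // Hypothetical job holder: the type parameters thread the pipeline
    //   K1, V1 -> map -> K2, V2
    //   K2, V2 -> combiner -> K2, V2
    //   K2, V2 -> reduce -> K3, V3
    static class Job<K1, V1, K2, V2, K3, V3> {
        Class<? extends Mapper<K1, V1, K2, V2>> mapperClass;
        // The combiner must preserve the intermediate K2, V2 types.
        Class<? extends Reducer<K2, V2, K2, V2>> combinerClass;
        Class<? extends Reducer<K2, V2, K3, V3>> reducerClass;
    }

    static class TokenMapper implements Mapper<Long, String, String, Integer> {}
    static class SumReducer implements Reducer<String, Integer, String, Integer> {}

    public static void main(String[] args) {
        Job<Long, String, String, Integer, String, Integer> job = new Job<>();
        job.mapperClass = TokenMapper.class;
        job.combinerClass = SumReducer.class; // OK: K2, V2 match on both sides
        job.reducerClass = SumReducer.class;
        // job.mapperClass = SumReducer.class; // would be a compile error
        System.out.println("types line up");
    }
}
```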
It will be a tricky bit of specification to decide exactly what the
right semantics are, since even with generics the application isn't
required to declare the type parameters (raw types are still legal).
That leaves 5 places where we could find a value for K2 (config,
mapper output, combiner input, combiner output, or reduce input).
Clearly, once Hadoop decides what the right value is for each type,
all of the classes must be checked for consistency against it.
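Where the application does declare concrete type arguments in its class declarations, the check can be done by reflection. A minimal sketch of the idea, assuming illustrative `Mapper`/`Reducer` interfaces (note the caveat: erasure means this only works when the type arguments are fixed in the implementing class's declaration, which is exactly why the other sources, such as the config, are still needed):

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class TypeCheckSketch {
    interface Mapper<K1, V1, K2, V2> {}
    interface Reducer<K2, V2, K3, V3> {}

    static class TokenMapper implements Mapper<Long, String, String, Integer> {}
    static class SumReducer implements Reducer<String, Integer, String, Long> {}

    // Pull out the actual type arguments a class bound to a generic
    // interface; fails if the class uses the interface as a raw type.
    static Type[] typeArgs(Class<?> cls, Class<?> iface) {
        for (Type t : cls.getGenericInterfaces()) {
            if (t instanceof ParameterizedType) {
                ParameterizedType pt = (ParameterizedType) t;
                if (pt.getRawType() == iface) {
                    return pt.getActualTypeArguments();
                }
            }
        }
        throw new IllegalArgumentException(cls + " does not parameterize " + iface);
    }

    public static void main(String[] args) {
        Type[] mapTypes = typeArgs(TokenMapper.class, Mapper.class);
        Type[] redTypes = typeArgs(SumReducer.class, Reducer.class);
        // The mapper's output K2, V2 must equal the reducer's input K2, V2.
        boolean ok = mapTypes[2].equals(redTypes[0]) && mapTypes[3].equals(redTypes[1]);
        System.out.println(ok ? "types consistent" : "type mismatch");
    }
}
```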
The other piece that this interacts with is the desire to use context
objects in the parameter list. However, they appear to be orthogonal
to each other.
-- Owen