I agree that the framework must be as general as possible, which means
one should use a simple data structure for keys and values, such as
strings or BytesWritable. Also, nothing prevents us from implementing
other types on top of the framework as an optional layer of
higher-level API.
Here is another example that I dealt with. I wanted to use different
value types (long, float, or string) for both map and reduce tasks,
depending on the actual key values, so the solution was to encode the
value type into the key itself.
I used keys of the form
l:<name> - the value is expected to be a long
f:<name> - the value is expected to be a float
s:<name> - the value is expected to be a string
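For illustration, here is a minimal sketch of how such keys could be
built and parsed; the class and method names below are made up for this
message, not the actual code from the issue:

public class TypedKey {
  public enum ValueType { LONG, FLOAT, STRING }

  /** Build a key that carries the expected value type, e.g. "l:pageCount". */
  public static String encode(ValueType type, String name) {
    switch (type) {
      case LONG:  return "l:" + name;
      case FLOAT: return "f:" + name;
      default:    return "s:" + name;
    }
  }

  /** Recover the expected value type from a key such as "f:score". */
  public static ValueType typeOf(String key) {
    switch (key.charAt(0)) {
      case 'l': return ValueType.LONG;
      case 'f': return ValueType.FLOAT;
      case 's': return ValueType.STRING;
      default:  throw new IllegalArgumentException("unknown type tag: " + key);
    }
  }

  /** Strip the "x:" prefix to get the plain key name. */
  public static String nameOf(String key) {
    return key.substring(2);
  }
}

A reduce task can then call typeOf(key) on each key to decide whether to
parse its values as longs, floats, or strings.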
The example is under HADOOP-95.
I thought somebody might find it useful.
--Konstantin
Doug Cutting wrote:
Eric Baldeschwieler wrote:
An observation... this whole thread is about limits caused by type
safety. Interestingly, the other implementation of map-reduce does
not support types at all. Everything is a string.
So I agree that our departure from the paper is the problem. ;-)
A corollary is that one could simply use BytesWritable for all one's
keys and values, altering only one's WritableComparator
implementation, and one would not encounter this problem. The use of
types in Hadoop is thus an optional feature. One could even layer a
different type system on top of BytesWritable that exhibits the
desired properties.
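As a rough illustration of that approach, a raw-bytes comparator along
the following lines could be used; the class name is made up here, and
the offsets assume BytesWritable's serialized form of a four-byte length
followed by the raw bytes:

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.WritableComparator;

public class MyByteOrderComparator extends WritableComparator {

  private static final int LENGTH_BYTES = 4;  // size prefix written by BytesWritable

  public MyByteOrderComparator() {
    super(BytesWritable.class);
  }

  /** Compare keys directly on their serialized bytes, skipping the length
   *  prefix. Any application-specific ordering would go here instead of
   *  plain lexicographic byte order. */
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    return compareBytes(b1, s1 + LENGTH_BYTES, l1 - LENGTH_BYTES,
                        b2, s2 + LENGTH_BYTES, l2 - LENGTH_BYTES);
  }
}

It could then be registered for BytesWritable keys, for example via
WritableComparator.define(BytesWritable.class, new MyByteOrderComparator()).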
I'm comfortable letting this lie for a while. But I predict we've
not heard the last of it.
Owen seems to be picking it up, which is fine by me.
Doug