I agree that the framework must be as general as possible, which means
one should use a simple data structure for keys and values, such as
strings or BytesWritable. Also, nothing prevents us from implementing
other types on top of the framework as an optional layer of
higher-level API.
Here is another example that I dealt with. I wanted to use different
value types (long, float, or string) for both map and reduce tasks,
depending on the actual key values, so the solution was to encode the
value type into the key itself.
I used keys of the form
l:<name> - the value is expected to be a long
f:<name> - the value is expected to be a float
s:<name> - the value is expected to be a string
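For illustration, here is a minimal sketch of how such keys could be
built and parsed; the class and method names below are made up for this
message, not the actual code from the issue:

public class TypedKey {
  public enum ValueType { LONG, FLOAT, STRING }

  /** Build a key that carries the expected value type, e.g. "l:pageCount". */
  public static String encode(ValueType type, String name) {
    switch (type) {
      case LONG:  return "l:" + name;
      case FLOAT: return "f:" + name;
      default:    return "s:" + name;
    }
  }

  /** Recover the expected value type from a key such as "f:score". */
  public static ValueType typeOf(String key) {
    switch (key.charAt(0)) {
      case 'l': return ValueType.LONG;
      case 'f': return ValueType.FLOAT;
      case 's': return ValueType.STRING;
      default:  throw new IllegalArgumentException("unknown type tag: " + key);
    }
  }

  /** Strip the "x:" prefix to get the plain key name. */
  public static String nameOf(String key) {
    return key.substring(2);
  }
}

A reduce task can then call typeOf(key) on each key to decide whether to
parse its values as longs, floats, or strings.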
The example is under HADOOP-95.
I thought somebody might find it useful.
--Konstantin
Doug Cutting wrote:
Eric Baldeschwieler wrote:
An observation... this whole thread is about limits caused by type
safety. Interestingly, the other implementation of map-reduce does
not support types at all. Everything is a string.
So I agree that our departure from the paper is the problem. ;-)
A corollary is that one could simply use BytesWritable for all one's
keys and values, altering only one's WritableComparator
implementation, and one would not encounter this problem. The use of
types in Hadoop is thus an optional feature. One could even layer a
different type system on top of BytesWritable that exhibits the
desired properties.
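As a rough illustration of that approach, a raw-bytes comparator along
the following lines could be used; the class name is made up here, and
the offsets assume BytesWritable's serialized form of a four-byte length
followed by the raw bytes:

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.WritableComparator;

public class MyByteOrderComparator extends WritableComparator {

  private static final int LENGTH_BYTES = 4;  // size prefix written by BytesWritable

  public MyByteOrderComparator() {
    super(BytesWritable.class);
  }

  /** Compare keys directly on their serialized bytes, skipping the length
   *  prefix. Any application-specific ordering would go here instead of
   *  plain lexicographic byte order. */
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    return compareBytes(b1, s1 + LENGTH_BYTES, l1 - LENGTH_BYTES,
                        b2, s2 + LENGTH_BYTES, l2 - LENGTH_BYTES);
  }
}

It could then be registered for BytesWritable keys, for example via
WritableComparator.define(BytesWritable.class, new MyByteOrderComparator()).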
I'm comfortable letting this lie for a while. But I predict we've
not heard the last of it.
Owen seems to be picking it up, which is fine by me.
Doug