Re: [jira] Commented: (HADOOP-115) Hadoop should allow the user to use SequentialFileOutputformat as the output format and to choose key/value classes that are different from those for map output.

Doug Cutting Mon, 03 Apr 2006 14:56:36 -0700

Konstantin Shvachko wrote:

Here is another example that I dealt with.
I wanted to use different value types (long, float or string) for both map and reduce tasks, depending on the actual key values. So the solution was to encode the value type into the key value.
I used keys of the form
l:<name> - indicating the value type is expected to be long
f:<name> - value type is expected to be float
s:<name> - value is a string
The example is under HADOOP-95.
Thought somebody might find it practical.
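
A minimal sketch of the key-prefix scheme described above, in plain Java with no Hadoop dependency. The class name TypedKeys and its methods are hypothetical, and the sketch assumes raw values arrive as strings; the actual example is the one under HADOOP-95 and may well differ.

```java
// Hypothetical helper, not part of Hadoop: picks the value type from the
// one-letter prefix carried in the key ("l:", "f:" or "s:").
class TypedKeys {

    // Decodes rawValue as Long, Float or String depending on the key prefix.
    static Object decode(String key, String rawValue) {
        if (key.length() < 2 || key.charAt(1) != ':') {
            throw new IllegalArgumentException("key has no type prefix: " + key);
        }
        switch (key.charAt(0)) {
            case 'l': return Long.valueOf(rawValue);   // l:<name> -> long
            case 'f': return Float.valueOf(rawValue);  // f:<name> -> float
            case 's': return rawValue;                 // s:<name> -> string
            default:
                throw new IllegalArgumentException("unknown type prefix in key: " + key);
        }
    }

    // Returns the <name> part of a prefixed key.
    static String name(String key) {
        return key.substring(2);
    }

    public static void main(String[] args) {
        System.out.println(name("l:count") + " = " + decode("l:count", "42"));
        System.out.println(name("f:ratio") + " = " + decode("f:ratio", "1.5"));
        System.out.println(name("s:title") + " = " + decode("s:title", "hello"));
    }
}
```

Because the type travels with the key, both the map and the reduce side can decode a value without any extra configuration, at the cost of every consumer having to know the prefix convention.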

On a related note, ObjectWritable can be used as input or output type, and can wrap any Writable class, thus permitting polymorphic inputs and outputs. Nutch uses this to, e.g., combine a URL's incoming anchor texts and its content when indexing. The input type is ObjectWritable, and the indexer's InputFormat wraps values from a variety of files. The indexing reducer can then use the 'instanceof' operator to determine how to process each input value. To be more object-oriented, one could have all of these classes implement some Indexable interface whose methods are invoked when reducing.
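
To illustrate the 'instanceof' dispatch, here is a rough sketch in plain Java. Everything in it is a stand-in invented for this example: Wrapper plays the role of a wrapper such as ObjectWritable (without any serialization), and AnchorText, PageContent and IndexingReducerSketch are hypothetical names, not Nutch or Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for a polymorphic wrapper: holds any value object.
class Wrapper {
    private final Object value;
    Wrapper(Object value) { this.value = value; }
    Object get() { return value; }
}

// Hypothetical value types for the two kinds of input being combined.
class AnchorText {
    final String text;
    AnchorText(String text) { this.text = text; }
}

class PageContent {
    final String body;
    PageContent(String body) { this.body = body; }
}

class IndexingReducerSketch {

    // Combines all wrapped values seen for one URL into a summary string,
    // choosing how to handle each value by its runtime type.
    static String reduce(String url, List<Wrapper> values) {
        List<String> anchors = new ArrayList<>();
        String content = null;
        for (Wrapper w : values) {
            Object v = w.get();
            if (v instanceof AnchorText) {
                anchors.add(((AnchorText) v).text);
            } else if (v instanceof PageContent) {
                content = ((PageContent) v).body;
            } else {
                throw new IllegalArgumentException("unexpected value type: " + v.getClass());
            }
        }
        return url + " anchors=" + anchors + " content=" + content;
    }

    public static void main(String[] args) {
        List<Wrapper> values = Arrays.asList(
                new Wrapper(new AnchorText("home")),
                new Wrapper(new PageContent("hello")));
        System.out.println(reduce("http://example.org/", values));
    }
}
```

With the Indexable variant, the if/else chain would go away: each value class would implement the interface, and the reducer would simply call its method on every value.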


Doug
