Konstantin Shvachko wrote:
Here is another example that I dealt with.
I wanted to use different value types (long, float or string) for both map and reduce tasks, depending on the actual key values. So the solution was to encode the value type into the key value.
I used keys of the form
l:<name> - indicating the value type is expected to be long
f:<name> - value type is expected to be float
s:<name> - value is a string
The example is under HADOOP-95.
Thought somebody might find it practical.
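
The key-prefix scheme described above might be sketched roughly as follows. This is plain Java with no Hadoop dependency and is not the code attached to HADOOP-95; the class name TypedKeys and the helpers encode, nameOf and decodeValue are hypothetical names chosen for illustration, and values are assumed to arrive as strings.

```java
// Hypothetical helper, not the HADOOP-95 code: the key carries a one-letter
// type tag ("l", "f" or "s") that tells map and reduce how to read the value.
final class TypedKeys {
    private TypedKeys() {}

    /** Builds a key such as "l:count" from a type tag and a name. */
    static String encode(char tag, String name) {
        return tag + ":" + name;
    }

    /** Returns the name part of a prefixed key, e.g. "count" for "l:count". */
    static String nameOf(String key) {
        checkPrefix(key);
        return key.substring(2);
    }

    /** Parses the raw value according to the type tag found in the key. */
    static Object decodeValue(String key, String rawValue) {
        checkPrefix(key);
        switch (key.charAt(0)) {
            case 'l': return Long.parseLong(rawValue);    // value expected to be long
            case 'f': return Float.parseFloat(rawValue);  // value expected to be float
            case 's': return rawValue;                    // value is a string
            default:
                throw new IllegalArgumentException("Unknown type tag in key: " + key);
        }
    }

    private static void checkPrefix(String key) {
        if (key.length() < 2 || key.charAt(1) != ':') {
            throw new IllegalArgumentException("Key has no type prefix: " + key);
        }
    }

    public static void main(String[] args) {
        String key = encode('l', "count");
        Object value = decodeValue(key, "42");
        System.out.println(nameOf(key) + " -> " + value.getClass().getSimpleName());
    }
}
```

One consequence of this design is that the same name under two different tags (say l:count and f:count) yields two distinct keys, so they are grouped separately in the reduce.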

On a related note, ObjectWritable can be used as input or output type, and can wrap any Writable class, thus permitting polymorphic inputs and outputs. Nutch uses this to, e.g., combine a URL's incoming anchor texts and its content when indexing. The input type is ObjectWritable, and the indexer's InputFormat wraps values from a variety of files. The indexing reducer can then use the 'instanceof' operator to determine how to process each input value. To be more object-oriented, one could have all of these classes implement some Indexable interface whose methods are invoked when reducing.
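
As a rough illustration of the two dispatch styles mentioned above, here is a minimal sketch in plain Java. It does not use Hadoop or Nutch classes: AnchorText, PageContent, IndexDoc and Indexable are hypothetical stand-ins for the wrapped value types, the document being built, and the suggested interface, and the plain list of objects stands in for the unwrapped values a reducer would see.

```java
import java.util.Arrays;
import java.util.List;

final class PolymorphicReduce {
    /** Hypothetical accumulator for the document built for one URL. */
    static final class IndexDoc {
        final StringBuilder anchors = new StringBuilder();
        String content = "";

        @Override
        public String toString() {
            return "anchors=[" + anchors + "] content=[" + content + "]";
        }
    }

    /** Hypothetical interface: each value knows how to add itself to the document. */
    interface Indexable {
        void addTo(IndexDoc doc);
    }

    static final class AnchorText implements Indexable {
        final String text;
        AnchorText(String text) { this.text = text; }

        @Override
        public void addTo(IndexDoc doc) {
            if (doc.anchors.length() > 0) doc.anchors.append(' ');
            doc.anchors.append(text);
        }
    }

    static final class PageContent implements Indexable {
        final String body;
        PageContent(String body) { this.body = body; }

        @Override
        public void addTo(IndexDoc doc) { doc.content = body; }
    }

    /** Variant 1: the reducer inspects each value's runtime type with instanceof. */
    static IndexDoc reduceWithInstanceof(List<?> values) {
        IndexDoc doc = new IndexDoc();
        for (Object v : values) {
            if (v instanceof AnchorText) {
                if (doc.anchors.length() > 0) doc.anchors.append(' ');
                doc.anchors.append(((AnchorText) v).text);
            } else if (v instanceof PageContent) {
                doc.content = ((PageContent) v).body;
            } else {
                throw new IllegalArgumentException("Unexpected value type: " + v.getClass());
            }
        }
        return doc;
    }

    /** Variant 2: the reducer only calls the interface; each value handles itself. */
    static IndexDoc reduceWithInterface(List<? extends Indexable> values) {
        IndexDoc doc = new IndexDoc();
        for (Indexable v : values) {
            v.addTo(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Indexable> values = Arrays.asList(
                new AnchorText("hadoop"), new PageContent("Welcome"), new AnchorText("home"));
        System.out.println(reduceWithInstanceof(values));
        System.out.println(reduceWithInterface(values));
    }
}
```

Both variants build the same document. The interface version keeps the reducer unchanged when a new value type is added, at the cost of requiring every value class to implement the interface.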

Doug
