Part of your problem is that you appear to be using a TextInputFormat (the default input format). The TIF produces keys that are LongWritable and values that are Text.
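To make those types concrete: with TextInputFormat the LongWritable key is the byte offset at which a line starts in the file, and the Text value is the line itself. The following is only a plain-Java sketch of that pairing, not Hadoop code. The class name OffsetRecords and its Record type are made up for illustration, and it assumes "\n" line endings and UTF-8 data.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; OffsetRecords is a hypothetical name, not a Hadoop class.
final class OffsetRecords {

    /** One record as a map would see it: the line's starting byte offset, then the line. */
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Splits newline-delimited data into (offset, line) records. */
    static List<Record> read(String data) {
        List<Record> records = new ArrayList<>();
        if (data.isEmpty()) {
            return records; // no lines, no records
        }
        long offset = 0;
        for (String line : data.split("\n")) {
            records.add(new Record(offset, line));
            // Advance by the encoded length of the line plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        for (Record r : read("a\tapple\nb\tbanana\n")) {
            System.out.println(r.offset + " -> " + r.line);
        }
    }
}
```

Note that the key here is a position, not anything taken from your data, which is why the map has to pull the real key out of the line.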
Other input formats produce different types. With recent versions of hadoop, classes that extend InputFormatBase can (and I think should) use generics to declare their output types. Similarly, classes extending MapReduceBase can specify their input and output classes, and classes extending OutputFormat can specify their output classes.

I have added more specific comments in-line.

On 12/17/07 5:40 PM, "Jim the Standing Bear" <[EMAIL PROTECTED]> wrote:

> 1. Pass in a string to my hadoop program, and it will write this
> single key-value pair to a file on the fly.

How is your string a key-value pair?

Assuming that you have something as simple as tab-delimited text, you may not need to do anything at all other than just copy this data into hadoop.

> 2. The first job will read from this file, do some processing, and
> write more key-value pairs to other files (the same format as the file
> in step 1). Subsequent jobs will read from those files generated by
> the first job. This will continue in an iterative manner until some
> terminal condition has reached.

Can you be more specific?

Let's assume that you are reading tab-delimited data. You should set the input format on the JobConf for your first job (called step1 here):

    step1.setInputFormat(TextInputFormat.class);

Then, since the output of your map will have a string key and value, you should tell the system this:

    step1.setOutputKeyClass(Text.class);
    step1.setOutputValueClass(Text.class);

Note that the signature on your map function should be:

    public static class JoinMap extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
      ...
      public void map(LongWritable k, Text input,
                      OutputCollector<Text, Text> output,
                      Reporter reporter) throws IOException {
        // Text has no split(); convert to a String first
        String[] parts = input.toString().split("\t");
        Text key, result;
        ...
        output.collect(key, result);
      }
    }

And your reduce should look something like this (Mumble stands for whatever value type your reduce emits):

    public static class JoinReduce extends MapReduceBase
        implements Reducer<Text, Text, Text, Mumble> {
      public void reduce(Text k, Iterator<Text> values,
                         OutputCollector<Text, Mumble> output,
                         Reporter reporter) throws IOException {
        Text key;
        Mumble result;
        ....
        output.collect(key, result);
      }
    }

> KeyValueTextInputFormat looks promising

This could work, depending on what data you have for input. Set the separator byte to be whatever separates your key from your value and off you go.
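
For what it is worth, here is roughly what splitting on a separator byte amounts to, written as a plain-Java sketch rather than Hadoop code. The class name KeyValueSplit is hypothetical. My understanding is that the split happens at the first separator only, and that a line with no separator becomes a key with an empty value, but check that against the version you are running.

```java
// Plain-Java illustration only; KeyValueSplit is a hypothetical name, not a Hadoop class.
final class KeyValueSplit {

    /** Returns {key, value}, split at the first occurrence of the separator. */
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // Assumption: no separator means the whole line is the key.
            return new String[] { line, "" };
        }
        // Everything after the first separator stays in the value,
        // including any further separators.
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("user42\tfirst\tsecond", '\t');
        System.out.println("key=" + kv[0] + " value=" + kv[1]);
    }
}
```

If your records look like this, the map receives the key and value already separated as Text and Text, so the manual split inside the map goes away.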