Jim,
Hopefully you've fixed this and gone ahead; just in case...
You were right to use SequenceFile with <Text, Text> as the
key/value types for your first job.
The problem is that you did not specify an *input format* for your
second job. The Hadoop Map-Reduce framework assumes TextInputFormat by
default, which produces <LongWritable, Text> records, and hence the
behaviour/exceptions you ran into...
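If it helps, a minimal sketch of the fix for the second job (using the
standard SequenceFileInputFormat from the mapred API, with your jobConf
variable) would be:
<code>
// read the <Text, Text> SequenceFile produced by the first job
jobConf.setInputFormat(SequenceFileInputFormat.class);
</code>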
hth,
Arun
PS: Do take a look at
http://lucene.apache.org/hadoop/docs/r0.15.1/mapred_tutorial.html,
specifically the section titled Job Input
(http://lucene.apache.org/hadoop/docs/r0.15.1/mapred_tutorial.html#Job+Input).
Do let us know how and where we should improve it... Thanks!
Jim the Standing Bear wrote:
Just an update... my problem seems to go beyond defining generic types.
Ted, I don't know if you have the answer to this question, which is
about SequenceFile.
If I am to create a SequenceFile by hand, I can do the following:
<code>
JobConf jobConf = new JobConf(MyClass.class);
JobClient jobClient = new JobClient(jobConf);
FileSystem fileSystem = jobClient.getFs();
// "path" is an org.apache.hadoop.fs.Path pointing at the file to create
SequenceFile.Writer writer = SequenceFile.createWriter(fileSystem,
    jobConf, path, Text.class, Text.class);
</code>
After that, I can write all Text-based keys and values by doing this:
<code>
Text keyText = new Text();
keyText.set("mykey");
Text valText = new Text();
valText.set("myval");
writer.append(keyText, valText);
writer.close(); // close the writer to flush everything to the file
</code>
As you can see, there is no LongWritable whatsoever.
However, in a map/reduce job, if I am to specify
<code>
jobConf.setOutputFormat(SequenceFileOutputFormat.class);
</code>
And later in the mapper, if I am to say
<code>
Text newkey = new Text();
newkey.set("AAA");
Text newval = new Text();
newval.set("bbb");
output.collect(newkey, newval);
</code>
It would throw an exception, complaining that the key is not LongWritable.
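(My guess - and it is only a guess - is that the job still declares its
output key class as the default LongWritable, so perhaps something like
<code>
jobConf.setOutputKeyClass(Text.class);
jobConf.setOutputValueClass(Text.class);
</code>
is needed on top of setting the output format?)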
So that's part of the reason I am having trouble connecting the
pipes - it seems to me that SequenceFile and SequenceFileOutputFormat
are talking about two different kinds of "sequence files"...