Jim,
Hopefully you've fixed this and gone ahead; just in case...
You were right to use SequenceFile with <Text, Text> as the
key/value types for your first job.
The problem is that you did not specify an *input format* for your
second job. The Hadoop Map-Reduce framework assumes TextInputFormat by
default, which produces <LongWritable, Text> records, and hence the
behaviour/exceptions you ran into...
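If it helps, a minimal sketch of the fix for the second job (using the
standard SequenceFileInputFormat from the mapred API, with your jobConf
variable) would be:
<code>
// read the <Text, Text> SequenceFile produced by the first job
jobConf.setInputFormat(SequenceFileInputFormat.class);
</code>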
hth,
Arun
PS: Do take a look at
http://lucene.apache.org/hadoop/docs/r0.15.1/mapred_tutorial.html,
specifically the section titled Job Input
(http://lucene.apache.org/hadoop/docs/r0.15.1/mapred_tutorial.html#Job+Input).
Do let us know how and where we should improve it... Thanks!
Jim the Standing Bear wrote:
Just an update... my problem seems to go beyond defining generic types.
Ted, I don't know if you have the answer to this question, which is
about SequenceFile.
If I am to create a SequenceFile by hand, I can do the following:
<code>
JobConf jobConf = new JobConf(MyClass.class);
JobClient jobClient = new JobClient(jobConf);
FileSystem fileSystem = jobClient.getFs();
// "path" is an org.apache.hadoop.fs.Path pointing at the file to create
SequenceFile.Writer writer = SequenceFile.createWriter(fileSystem,
    jobConf, path, Text.class, Text.class);
</code>
After that, I can write all Text-based keys and values by doing this:
<code>
Text keyText = new Text();
keyText.set("mykey");
Text valText = new Text();
valText.set("myval");
writer.append(keyText, valText);
writer.close(); // close the writer to flush everything to the file
</code>
As you can see, there is no LongWritable whatsoever.
However, in a map/reduce job, if I am to specify
<code>
jobConf.setOutputFormat(SequenceFileOutputFormat.class);
</code>
And later in the mapper, if I am to say
<code>
Text newkey = new Text();
newkey.set("AAA");
Text newval = new Text();
newval.set("bbb");
output.collect(newkey, newval);
</code>
It would throw an exception, complaining that the key is not LongWritable.
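(My guess - and it is only a guess - is that the job still declares its
output key class as the default LongWritable, so perhaps something like
<code>
jobConf.setOutputKeyClass(Text.class);
jobConf.setOutputValueClass(Text.class);
</code>
is needed on top of setting the output format?)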
So that's part of the reason I am having trouble connecting the
pipes - it seems to me that SequenceFile and SequenceFileOutputFormat
are talking about two different kinds of "sequence files"...