Hi Arun,

I did specify the input format. The first job's output format is
SequenceFileOutputFormat, and the second job's input format is
SequenceFileInputFormat. But it seems that the two formats don't
connect.
Is there a reason that setInputKeyClass and setInputValueClass are
being deprecated? I saw these two being used, even in Nutch.

Please see the code snippet below:

<code>
        JobConf writeJob = new JobConf(SequenceFileIndexer.class);
        writeJob.setJobName("testing");
        writeJob.setInputFormat(SequenceFileInputFormat.class);
        writeJob.setInputPath(path);

        Path outPath = new Path("write-out");
        writeJob.setOutputPath(outPath);
        writeJob.setOutputFormat(SequenceFileOutputFormat.class);
        writeJob.setMapperClass(SequenceFileIndexer.class);

        JobClient.runJob(writeJob);  // this job finished correctly

        JobConf secondJob = new JobConf(SequenceFileIndexer.class);
        secondJob.setJobName("second");
        secondJob.setInputFormat(SequenceFileInputFormat.class);
        secondJob.setInputPath(outPath);
        secondJob.setOutputKeyClass(Text.class);
        secondJob.setOutputValueClass(Text.class);
        Path finalPath = new Path("final");
        secondJob.setOutputPath(finalPath);
        secondJob.setMapperClass(SequenceFileIndexer.class);
        JobClient.runJob(secondJob);  // but this job blew up because it
                                      // complains the file format is not correct



    // identity map: pass each <Text, Text> record straight through
    public void map(Text key, Text val,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {

        String x = val.toString();  // not used below
        String k = key.toString();  // not used below

        output.collect(key, val);
    }





</code>
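
For reference, here is a minimal sketch of what I think the chain is
supposed to look like (just my guess: I am assuming the first job also
needs setOutputKeyClass/setOutputValueClass, since otherwise the
framework seems to default the output key class to LongWritable when
it creates the SequenceFile writer, which looks like the mismatch I
keep hitting):

<code>
        // sketch only, not tested; paths and class names are placeholders
        JobConf firstJob = new JobConf(SequenceFileIndexer.class);
        firstJob.setJobName("first");
        firstJob.setInputFormat(SequenceFileInputFormat.class);
        firstJob.setInputPath(path);
        firstJob.setOutputFormat(SequenceFileOutputFormat.class);
        firstJob.setOutputPath(new Path("write-out"));
        // declare the types the mapper actually emits, so the
        // SequenceFile is written as <Text, Text>
        firstJob.setOutputKeyClass(Text.class);
        firstJob.setOutputValueClass(Text.class);
        firstJob.setMapperClass(SequenceFileIndexer.class);
        JobClient.runJob(firstJob);

        JobConf secondJob = new JobConf(SequenceFileIndexer.class);
        secondJob.setJobName("second");
        // SequenceFileInputFormat should pick the key/value classes up
        // from the header of the file the first job wrote
        secondJob.setInputFormat(SequenceFileInputFormat.class);
        secondJob.setInputPath(new Path("write-out"));
        secondJob.setOutputKeyClass(Text.class);
        secondJob.setOutputValueClass(Text.class);
        secondJob.setOutputPath(new Path("final"));
        secondJob.setMapperClass(SequenceFileIndexer.class);
        JobClient.runJob(secondJob);
</code>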

On Dec 18, 2007 12:27 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> Jim,
>
>    Hopefully you've fixed this and gone ahead; just in case...
>
>    You were right in using SequenceFile with <Text, Text> as the
> key/value types for your first job.
>
>    The problem is that you did not specify an *input-format* for your
> second job. The Hadoop Map-Reduce framework assumes TextInputFormat as
> the default, which is <LongWritable, Text> and hence the
> behaviour/exceptions you ran into...
>
> hth,
> Arun
>
> PS: Do take a look at
> http://lucene.apache.org/hadoop/docs/r0.15.1/mapred_tutorial.html,
> specifically the section titled Job Input
> (http://lucene.apache.org/hadoop/docs/r0.15.1/mapred_tutorial.html#Job+Input).
>
> Do let us know how and where we should improve it... Thanks!
>
>
>
> Jim the Standing Bear wrote:
> > Just an update... my problem seems to be beyond defining generic types.
> >
> > Ted, I don't know if you have the answer to this question, which is
> > regarding SequenceFile.
> >
> > If I am to create a SequenceFile by hand, I can do the following:
> >
> > <code>
> > JobConf jobConf = new JobConf(MyClass.class);
> > JobClient jobClient = new JobClient(jobConf);
> >
> > FileSystem fileSystem = jobClient.getFs();
> > SequenceFile.Writer writer = SequenceFile.createWriter(fileSystem,
> > jobConf, path, Text.class, Text.class);
> >
> > </code>
> >
> > After that, I can write all Text-based keys and values by doing this:
> >
> > <code>
> > Text keyText = new Text();
> > keyText.set("mykey");
> >
> > Text valText = new Text();
> > valText.set("myval");
> >
> > writer.append(keyText, valText);
> > </code>
> >
> > As you can see, there is no LongWritable whatsoever.
> >
> > However, in a map/reduce job, if I am to specify
> > <code>
> > jobConf.setOutputFormat(SequenceFileOutputFormat.class);
> > </code>
> >
> > And later in the mapper, if I am to say
> > <code>
> > Text newkey = new Text();
> > newkey.set("AAA");
> >
> > Text newval = new Text();
> > newval.set("bbb");
> >
> > output.collect(newkey, newval);
> > </code>
> >
> > It would throw an exception, complaining that the key is not LongWritable.
> >
> > So that's part of the reason I am having trouble connecting the
> > pipes - it seems to me that SequenceFile and SequenceFileOutputFormat
> > are talking about two different kinds of "sequence files"...
>
>
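
For what it's worth, here is a small sketch of how I am planning to
double-check what the first job's SequenceFile actually contains (just
my own debugging idea; the part file name below is an assumption and
may be different):

<code>
        // sketch only: open the first job's output and print the key/value
        // classes recorded in the SequenceFile header
        // ("write-out/part-00000" is the usual part file name; adjust as needed)
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path part = new Path("write-out/part-00000");
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
        System.out.println("key class   = " + reader.getKeyClassName());
        System.out.println("value class = " + reader.getValueClassName());
        reader.close();
</code>

If that prints LongWritable for the key class, then I suppose the first
job's output key class is defaulting and I need to set it explicitly.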



-- 
--------------------------------------
Standing Bear Has Spoken
--------------------------------------
