Re: Hadoop Job without Mapper and Reducer class.

Enis Soztutar Mon, 21 Apr 2008 01:34:37 -0700

Hi,

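JobConf has default values for the mapper and reducer classes: IdentityMapper and IdentityReducer. As their names imply, these functors do not alter the data but pass it through intact. The dump job does not need to alter the data, only to transform it from a (binary) SequenceFile (the InputFormat) to text (the OutputFormat). TextOutputFormat then writes each record as one line containing the key, a tab, and the value's toString().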
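To see why the defaults are enough, here is a plain-Java simulation of what such a job does. It has no Hadoop dependency: the class and method names are hypothetical, and the CrawlDatum values are stood in for by strings. It chains an identity map, a sort/group by key (what the framework does between map and reduce), an identity reduce, and TextOutputFormat-style "key<TAB>value" lines. It is a sketch, not Hadoop's actual implementation. In particular, Hadoop does not guarantee the order of values within one key, whereas this sketch keeps input order.

```java
import java.util.*;

// Hypothetical simulation of a map/reduce job that uses only the identity defaults.
final class IdentityDumpSketch {

    // Identity map: emit every input (key, value) pair unchanged.
    static List<Map.Entry<String, String>> identityMap(List<Map.Entry<String, String>> input) {
        return new ArrayList<>(input);
    }

    // Sort/group by key, as the framework does before calling the reducer.
    static SortedMap<String, List<String>> shuffle(List<Map.Entry<String, String>> mapped) {
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : mapped) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // Identity reduce: emit every value of every key group unchanged.
    static List<Map.Entry<String, String>> identityReduce(SortedMap<String, List<String>> grouped) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> g : grouped.entrySet()) {
            for (String v : g.getValue()) {
                out.add(Map.entry(g.getKey(), v));
            }
        }
        return out;
    }

    // Text output: one line per record, key and value separated by a tab.
    static List<String> textOutput(List<Map.Entry<String, String>> records) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> r : records) {
            lines.add(r.getKey() + "\t" + r.getValue());
        }
        return lines;
    }

    // The whole "dump" job: nothing but the identity steps plus text output.
    static List<String> dump(List<Map.Entry<String, String>> input) {
        return textOutput(identityReduce(shuffle(identityMap(input))));
    }

    public static void main(String[] args) {
        // Hypothetical CrawlDb records: URL key, stringified CrawlDatum value.
        List<Map.Entry<String, String>> crawlDb = List.of(
                Map.entry("http://b.example/", "status=fetched"),
                Map.entry("http://a.example/", "status=unfetched"));
        dump(crawlDb).forEach(System.out::println);
    }
}
```

Running main prints the a.example record before the b.example record: every input record comes out exactly once, only sorted by key and rendered as text, which is all the dump needs.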


phonechen wrote:

Hello all:
When I read the Nutch source code, I found that processDumpJob(String
crawlDb, String output, Configuration config) in CrawlDbReader.java
only sets an InputFormat and an OutputFormat, without any Mapper or Reducer
class. Yet it can dump the existing crawldb to a text format.
Can anyone tell me how this works?
Thanks!

Here is the source code:
-----------------------------------------------
public void processDumpJob(String crawlDb, String output, Configuration config)
    throws IOException {

  if (LOG.isInfoEnabled()) {
    LOG.info("CrawlDb dump: starting");
    LOG.info("CrawlDb db: " + crawlDb);
  }

  Path outFolder = new Path(output);

  JobConf job = new NutchJob(config);
  job.setJobName("dump " + crawlDb);

  // Read the binary <Text, CrawlDatum> records of the current CrawlDb.
  job.addInputPath(new Path(crawlDb, CrawlDb.CURRENT_NAME));
  job.setInputFormat(SequenceFileInputFormat.class);

  // Write the same records back out as plain text lines.
  job.setOutputPath(outFolder);
  job.setOutputFormat(TextOutputFormat.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(CrawlDatum.class);

  // No setMapperClass()/setReducerClass() call: the JobConf defaults apply.
  JobClient.runJob(job);
  if (LOG.isInfoEnabled()) { LOG.info("CrawlDb dump: done"); }
}
----------------------------------------------------
