Hello all,
While reading the Nutch source code, I found that processDumpJob(String
crawlDb, String output, Configuration config) in CrawlDbReader.java only
sets the input and output formats and does not set a Mapper or Reducer
class. Yet it can still dump the existing crawldb to a text format.
Can anyone tell me how this works?
Thanks!

Here is the source code:
-----------------------------------------------
public void processDumpJob(String crawlDb, String output, Configuration config) throws IOException {

  if (LOG.isInfoEnabled()) {
    LOG.info("CrawlDb dump: starting");
    LOG.info("CrawlDb db: " + crawlDb);
  }

  Path outFolder = new Path(output);

  JobConf job = new NutchJob(config);
  job.setJobName("dump " + crawlDb);

  job.addInputPath(new Path(crawlDb, CrawlDb.CURRENT_NAME));
  job.setInputFormat(SequenceFileInputFormat.class);

  job.setOutputPath(outFolder);
  job.setOutputFormat(TextOutputFormat.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(CrawlDatum.class);

  JobClient.runJob(job);
  if (LOG.isInfoEnabled()) { LOG.info("CrawlDb dump: done"); }
}
----------------------------------------------------
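
One possible explanation, which I have not traced through the Nutch code itself: in the old org.apache.hadoop.mapred API, a JobConf with no Mapper or Reducer set falls back to IdentityMapper and IdentityReducer, and TextOutputFormat writes each record as the key's toString(), a tab, and the value's toString(). The sketch below only mimics that behaviour in plain Java without Hadoop; IdentityDumpSketch, Datum and dump are hypothetical names, and Datum is a stand-in for CrawlDatum, not its real layout.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IdentityDumpSketch {

    // Hypothetical stand-in for CrawlDatum: for text output only toString() matters.
    static final class Datum {
        final String status;
        final int fetchInterval;

        Datum(String status, int fetchInterval) {
            this.status = status;
            this.fetchInterval = fetchInterval;
        }

        @Override
        public String toString() {
            return "status=" + status + " interval=" + fetchInterval;
        }
    }

    // Mimics a job with identity map and identity reduce:
    // records pass through unchanged, are sorted by key between the two
    // phases, and are written as key + TAB + value.toString().
    static List<String> dump(Map<String, Datum> input) {
        // "Map" phase: identity, so every (key, value) pair is emitted as-is.
        // "Shuffle": group and sort by key (a TreeMap stands in for the sort).
        TreeMap<String, List<Datum>> grouped = new TreeMap<>();
        for (Map.Entry<String, Datum> e : input.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }

        // "Reduce" phase: identity, so every value is emitted again with its key.
        // Output formatting: one text line per record.
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Datum>> e : grouped.entrySet()) {
            for (Datum d : e.getValue()) {
                lines.add(e.getKey() + "\t" + d);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Datum> db = new LinkedHashMap<>();
        db.put("http://b.example/", new Datum("db_unfetched", 30));
        db.put("http://a.example/", new Datum("db_fetched", 30));
        for (String line : dump(db)) {
            System.out.println(line);
        }
    }
}
```

If that is what happens, the readable dump would come entirely from CrawlDatum.toString(), and the job would need no custom Mapper or Reducer.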




-- 
--~--~---------~--~----~------------~-------~--

Best Regards,

Yours
Phonechen

-~----------~----~----~----~------~----~------
