[ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169308#comment-13169308 ]
Markus Jelsma commented on NUTCH-1219:
--------------------------------------
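Keep in mind that this does not work, because {{new Job(conf, jobName)}} takes its own copy of the Configuration, so settings applied to {{conf}} afterwards never reach the job: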
{code}
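Configuration conf = getConf();
// The Job copies conf here...
Job job = new Job(conf, jobName);
job.setJarByClass(DomainStatistics.class);
// ...so these two settings only change the original conf and are never seen by the job.
conf.setInt("domain.statistics.mode", mode);
conf.setBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", false);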
{code}
but this does:
{code}
Configuration conf = getConf();
// Apply the settings first...
conf.setInt("domain.statistics.mode", mode);
conf.setBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", false);
// ...so the copy taken by the Job already contains them.
Job job = new Job(conf, jobName);
job.setJarByClass(DomainStatistics.class);
{code}
This is easily overlooked: with default settings the job still runs fine, it just silently ignores the values you set!
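A minimal, self-contained sketch of why the order matters. It does not use Hadoop; {{Conf}}, {{Job}}, {{ConfCopyDemo}} and the three {{configure...}} methods are hypothetical mocks that only assume the one behaviour described above (the Job constructor copies the Configuration it is given). The third variant mirrors the usual alternative with the real API: after construction, modify the job's own copy via {{job.getConfiguration()}} instead of the original {{conf}}.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for Hadoop's Configuration and Job,
// written only to illustrate the copy-on-construction behaviour.
public class ConfCopyDemo {

    // Minimal key/value configuration (a mock, NOT org.apache.hadoop.conf.Configuration).
    static class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        // Copy constructor: the new object gets its own snapshot of the properties.
        Conf(Conf other) {
            props.putAll(other.props);
        }

        void setInt(String key, int value) {
            props.put(key, Integer.toString(value));
        }

        String get(String key) {
            return props.get(key);
        }
    }

    // Minimal job that, like the Hadoop Job, copies the configuration it is given.
    static class Job {
        private final Conf conf;

        Job(Conf conf) {
            this.conf = new Conf(conf);
        }

        Conf getConfiguration() {
            return conf;
        }
    }

    // Broken order: the setting goes into the caller's conf after the Job took its copy.
    static Job configureAfterConstruction(int mode) {
        Conf conf = new Conf();
        Job job = new Job(conf);
        conf.setInt("domain.statistics.mode", mode); // lost: the job holds a copy
        return job;
    }

    // Working order: the setting is applied before the Job snapshots the conf.
    static Job configureBeforeConstruction(int mode) {
        Conf conf = new Conf();
        conf.setInt("domain.statistics.mode", mode);
        return new Job(conf);
    }

    // Alternative: modify the job's own copy after construction.
    static Job configureViaJob(int mode) {
        Job job = new Job(new Conf());
        job.getConfiguration().setInt("domain.statistics.mode", mode);
        return job;
    }

    public static void main(String[] args) {
        String key = "domain.statistics.mode";
        System.out.println("after:   " + configureAfterConstruction(2).getConfiguration().get(key));
        System.out.println("before:  " + configureBeforeConstruction(2).getConfiguration().get(key));
        System.out.println("via job: " + configureViaJob(2).getConfiguration().get(key));
    }
}
```

Running {{main}} prints {{null}} for the first variant and {{2}} for the other two, which is exactly the silent failure described above.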
> Upgrade all jobs to new MapReduce API
> -------------------------------------
>
> Key: NUTCH-1219
> URL: https://issues.apache.org/jira/browse/NUTCH-1219
> Project: Nutch
> Issue Type: Task
> Reporter: Markus Jelsma
> Priority: Critical
> Fix For: 1.5
>
>
> We should upgrade to the new Hadoop API for Nutch trunk, as has already been
> done for the Nutchgora branch. If I'm not mistaken, we can already upgrade to
> the latest 0.20.5 version, which still carries the legacy API. That way we can
> port the jobs to the new API without immediately upgrading to 0.21 or higher
> and without needing a separate branch to work on.
> To the committers who created/ported jobs in NutchGora, please write down
> your advice and experience.
> http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api