[ 
https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169308#comment-13169308
 ] 

Markus Jelsma commented on NUTCH-1219:
--------------------------------------

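Keep in mind that this does not work, because the Job constructor copies the Configuration it is given, so values set on conf afterwards never reach the job: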

{code}
    Configuration conf = getConf();
    Job job = new Job(conf, jobName);
    job.setJarByClass(DomainStatistics.class);
    conf.setInt("domain.statistics.mode", mode);
    conf.setBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", false);
{code}

but this does:

{code}
    Configuration conf = getConf();
    conf.setInt("domain.statistics.mode", mode);
    conf.setBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", false);
    Job job = new Job(conf, jobName);
    job.setJarByClass(DomainStatistics.class);
{code}

This is easy to overlook, because the job still runs and silently falls back to the default settings!
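
For anyone who wants to see the ordering issue in isolation, here is a minimal sketch that runs without Hadoop on the classpath. SketchConf and SketchJob are hypothetical stand-ins, not Hadoop classes; the only behaviour they model is that the job takes its own copy of the configuration when it is constructed. The default of -1 is also just an assumption for the demo.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object (NOT Hadoop's Configuration).
class SketchConf {
    private final Map<String, String> props = new HashMap<>();

    SketchConf() {}

    // Copy constructor: takes a snapshot of the other object's properties.
    SketchConf(SketchConf other) {
        props.putAll(other.props);
    }

    void setInt(String name, int value) {
        props.put(name, Integer.toString(value));
    }

    int getInt(String name, int defaultValue) {
        String v = props.get(name);
        return v == null ? defaultValue : Integer.parseInt(v);
    }
}

// Hypothetical stand-in for a job that copies its configuration on construction.
class SketchJob {
    private final SketchConf conf;

    SketchJob(SketchConf conf) {
        this.conf = new SketchConf(conf); // defensive copy, later changes are not seen
    }

    SketchConf getConfiguration() {
        return conf;
    }
}

class CopyOnConstructDemo {
    static final String KEY = "domain.statistics.mode";

    // Setting the value after the job exists: the job's copy never sees it.
    static int setAfterConstruction(int mode) {
        SketchConf conf = new SketchConf();
        SketchJob job = new SketchJob(conf);
        conf.setInt(KEY, mode);
        return job.getConfiguration().getInt(KEY, -1);
    }

    // Setting the value first: it is included in the job's copy.
    static int setBeforeConstruction(int mode) {
        SketchConf conf = new SketchConf();
        conf.setInt(KEY, mode);
        SketchJob job = new SketchJob(conf);
        return job.getConfiguration().getInt(KEY, -1);
    }

    // Setting the value on the job's own copy also works after construction.
    static int setViaJobConfiguration(int mode) {
        SketchJob job = new SketchJob(new SketchConf());
        job.getConfiguration().setInt(KEY, mode);
        return job.getConfiguration().getInt(KEY, -1);
    }

    public static void main(String[] args) {
        System.out.println("set after construction:  " + setAfterConstruction(2));
        System.out.println("set before construction: " + setBeforeConstruction(2));
        System.out.println("set via job's own conf:  " + setViaJobConfiguration(2));
    }
}
```

In the real new-API code the third variant would correspond to calling job.getConfiguration().setInt(...) once the Job exists, which may be handy when a value is only known after the job has been created.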
                
> Upgrade all jobs to new MapReduce API
> -------------------------------------
>
>                 Key: NUTCH-1219
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1219
>             Project: Nutch
>          Issue Type: Task
>            Reporter: Markus Jelsma
>            Priority: Critical
>             Fix For: 1.5
>
>
> We should upgrade Nutch trunk to the new Hadoop API, as has already been 
> done for the Nutchgora branch. If I'm not mistaken, we can already upgrade to 
> the latest 0.20.5 version, which still carries the legacy API, so we can port 
> the jobs to the new API without immediately upgrading to 0.21 or higher and 
> without needing a separate branch to work on.
> To the committers who created or ported jobs in NutchGora: please write down 
> your advice and experience.
> http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api

