[ 
https://issues.apache.org/jira/browse/NUTCH-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1452:
----------------------------------------

    Fix Version/s: 2.2
                   1.7
    
> hadoop.job.history.user.location in nutch-default making job history useless
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-1452
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1452
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: Ferdy Galema
>             Fix For: 1.7, 2.2
>
>
> nutch-default still contains the property 'hadoop.job.history.user.location', 
> which redirects the creation of history files from the job output locations 
> to a custom location. I noticed that the current value does not work well 
> with Cloudera (I have tested cdh3u4), because ${hadoop.log.dir} is not 
> defined there. As a result, the jobtracker shows empty info for the job 
> (with an 'incomplete' job status). This only happens once the job moves to 
> 'retired'; while it is still in 'completed', everything looks fine.
> This property can be set to 'none', because the job history is ALSO stored in 
> the central jobtracker location anyway. The 
> 'hadoop.job.history.user.location' property only specifies an extra location. 
> But if it is set to an invalid value, it seems to prevent the central history 
> location from storing the history as well. For more details, please see:
> http://hadoop.apache.org/common/docs/r1.0.3/cluster_setup.html
> Besides setting it to 'none', another option is to set it to 'history', which 
> does work with CDH. (This writes all logs to 'history' in the user directory 
> on the configured filesystem, usually DFS.) The final option is to simply 
> remove this property and not meddle with Hadoop properties at all. But that 
> requires all jobs to correctly ignore the history files that are then 
> written to the job output locations. I am not up to date on how well this 
> currently works with Nutch jobs. This question is most relevant for trunk, 
> since trunk relies heavily on the filesystem for jobs.
> What do you think?
> A) Set property to 'none'
> B) Set property to 'history'
> C) Remove property, see what happens, possibly fix jobs
> D) ?
> For now, I opt for A. But I think we need some more input on other 
> distributions (for example official Hadoop 1.x) and also on Nutch trunk.
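
As a rough illustration of option A, the entry in nutch-default.xml might
look like the fragment below. This is a sketch, not the committed patch: it
assumes the usual Hadoop property/name/value layout, and the description
wording is invented here for illustration only.

```xml
<!-- Sketch of option A (an assumption, not the committed change):
     disable the extra per-user job history location. -->
<property>
  <name>hadoop.job.history.user.location</name>
  <value>none</value>
  <description>Illustrative text only: 'none' turns off the extra user
  history location; the central jobtracker history is still written.
  </description>
</property>
```

Option B would use the value 'history' in place of 'none'; option C would
drop the property element altogether.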

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
