Ferdy Galema created NUTCH-1452:
-----------------------------------

             Summary: hadoop.job.history.user.location in nutch-default making job history useless
                 Key: NUTCH-1452
                 URL: https://issues.apache.org/jira/browse/NUTCH-1452
             Project: Nutch
          Issue Type: Bug
            Reporter: Ferdy Galema


nutch-default still contains the property 'hadoop.job.history.user.location',
which redirects the creation of history files from the job output location to a
custom location. I noticed that the current value does not work well with
Cloudera (tested with cdh3u4), because ${hadoop.log.dir} is not defined there.
As a result, the jobtracker shows empty info for the job (with an 'incomplete'
job status). This only happens once the job moves to 'retired'. While it is
still in 'completed', everything looks fine.

This property can be set to 'none', because the job history is ALSO stored in
the central jobtracker location anyway. The 'hadoop.job.history.user.location'
property only specifies an extra location. However, if it is set to an invalid
value, it seems to prevent the central history location from storing the
history as well. For more details, please see:
http://hadoop.apache.org/common/docs/r1.0.3/cluster_setup.html

Besides setting it to 'none', another option is to set it to 'history', which
does work with cdh. (This writes all history logs to 'history' in the user
directory in the configured filesystem, usually dfs.) The final option is to
simply remove this property and not meddle with Hadoop properties at all. But
that requires all jobs to correctly ignore these files. I am not up to date on
how well this currently works with Nutch jobs. This question is most relevant
for trunk, since trunk relies heavily on the filesystem for jobs.
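
For illustration, the first two options would look roughly like this as
property entries (a sketch only: the comments are mine and not taken from
nutch-default, and only one of the two blocks should be active at a time):

```xml
<!-- Option A: disable the extra per-user history location. The central
     jobtracker history is still written. -->
<property>
  <name>hadoop.job.history.user.location</name>
  <value>none</value>
</property>

<!-- Option B (alternative to A): a relative path, so history files end up
     in 'history' in the user directory in the configured filesystem.
<property>
  <name>hadoop.job.history.user.location</name>
  <value>history</value>
</property>
-->
```

Neither value references ${hadoop.log.dir}, so neither depends on that
variable being defined by the distribution.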

What do you think?
A) Set property to 'none'
B) Set property to 'history'
C) Remove property, see what happens, possibly fix jobs
D) ?

For now, I opt for A. But I think we need some more input from people using
other distributions (for example official Hadoop 1.x) and also Nutch trunk.
