FYI, I've created a Jira issue for follow-up discussion: https://issues.apache.org/jira/browse/NUTCH-1452
On Tue, Aug 7, 2012 at 11:21 AM, Ferdy Galema <[email protected]> wrote:
> Hi,
>
> There is still a property in nutch-default,
> 'hadoop.job.history.user.location', that redirects the creation of history
> files from job output locations to a custom location. I noticed that the
> current value does not work well with CDH, because ${hadoop.log.dir} is not
> defined. This actually causes the entire job history in the jobtracker to
> show empty info (with 'incomplete' job status).
>
> Changing the value to /user/myname/history does work, for example. However,
> I have done some more testing and it seems that this property can be set to
> 'none', because the job history is ALSO stored in the central jobtracker
> location anyway. The 'hadoop.job.history.user.location' property specifies
> an extra location. But if it is set to an invalid value, it causes the
> central history location to NOT store it. For more details, please see:
> http://hadoop.apache.org/common/docs/r1.0.3/cluster_setup.html
>
> Setting this value to 'none' keeps the central history but prevents the
> job from writing history in the job output location. If a user wants to have
> an extra copy of the history files, nothing prevents him/her from specifying
> another value in nutch-site, for example. Another option is to set it to
> 'history', which does work with CDH. (This writes all logs to 'history' in
> the user directory in the configured filesystem, usually dfs.) The final
> option is to simply remove this value and not meddle with hadoop properties
> at all. But that actually requires all jobs to correctly ignore these
> files. I am not up to date on how well this currently works with Nutch jobs.
> This question is most relevant for trunk, since trunk heavily relies on the
> filesystem for jobs.
>
> What do you think? It would be great if anyone could do some testing with
> trunk and possibly another Hadoop distro (i.e. the official 1.0.3). Then
> we have some more input to decide what the best option is:
> A) Set property to 'none'
> B) Set property to 'history'
> C) Remove property, see what happens, possibly fix jobs
> D) ?
>
> Ferdy.
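
For reference, a minimal sketch of what option A might look like as an override in nutch-site.xml. This assumes the usual Hadoop property syntax and that the fragment is placed inside the file's existing <configuration> element; it is an illustration only, not a committed change:

```xml
<!-- Hypothetical nutch-site.xml override (illustration, not a Nutch default) -->
<property>
  <name>hadoop.job.history.user.location</name>
  <!-- Option A: 'none' skips the extra per-job copy of the history files;
       the central jobtracker history is still kept.
       Option B would put 'history' here instead. -->
  <value>none</value>
</property>
```

Option C would correspond to having no such entry at all, in nutch-default or nutch-site, so that Hadoop's own default applies.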

