Thanks for the reply, Arun. We've recompiled the hadoop native binaries and they seem to be loading fine. We are rerunning the job to see if it works now.
Regards,

-vishal.

-----Original Message-----
From: Arun C Murthy [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 23, 2007 11:40 PM
To: [email protected]; [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Reduce task hangs when using nutch 0.9 with hadoop 0.12.3

Vishal,

On Wed, May 23, 2007 at 02:15:50PM +0530, Vishal Shah wrote:
>Hi Arun,
>
>  Thanks for the reply. We figured out the root cause of the problem. We are
>not using the hadoop native libs right now, and the Sun Java Deflater hangs
>sometimes during the reduce phase. Our glibc version is 2.3.5, whereas the
>hadoop native libs need 2.4; that's why they are not being used by hadoop.
>

Personally I've never seen Sun's Deflater hang... but ymmv.

Anyway, there is nothing in the native hadoop libs that needs glibc-2.4 -
this looks like just an artifact of the fact that the machine on which the
release was cut (i.e. on which the native libs in 0.12.3 were built) had
glibc-2.4. I have glibc-2.3.6-r3 on my machine and things work fine...

>  I was wondering if there is a version of the native libs that would work
>with glibc 2.3.

I don't have access right away to a box with glibc-2.3.5, but it's really
easy to build them yourself - details here:
http://wiki.apache.org/lucene-hadoop/NativeHadoop

hth,
Arun

>If not, we'll have to upgrade the glibc version on all
>machines to 2.4.
>
>Regards,
>
>-vishal.
>
>-----Original Message-----
>From: Arun C Murthy [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, May 22, 2007 4:24 PM
>To: [email protected]
>Cc: [EMAIL PROTECTED]
>Subject: Re: Reduce task hangs when using nutch 0.9 with hadoop 0.12.3
>
>Vishal Shah wrote:
>> Hi,
>>
>>   We upgraded our code to the nutch 0.9 stable version along with hadoop
>0.12.3,
>> which is the latest version of hadoop 0.12.
>>
>>   After the upgrade, I am sometimes seeing task failures during the
>> reduce phase for parse and fetch (without the parsing option).
>>
>>   Usually, it's just one reduce task that creates this problem. The
>> jobtracker kills this task saying "Task failed to report status for 602
>> seconds. Killing task"
>>
>>   I tried running the task using IsolationRunner, and it works fine. I
>> suspect there is a long computation happening during the reduce phase
>> for one of the keys, due to which the tasktracker isn't able to report
>> status to the jobtracker in time.
>>
>
>If you suspect a long computation, one way around it is to use the
>'reporter' parameter of your mapper/reducer to provide status updates and
>ensure that the TaskTracker doesn't kill the task, i.e. doesn't assume the
>task has been lost.
>
>hth,
>Arun
>
>>   I was wondering if anyone else has seen a similar problem and if
>there is
>> a fix for it.
>>
>>   Thanks,
>>
>>   -vishal.
>>
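
P.S. For the archives, here's a rough sketch of the reporter-based
workaround Arun describes above, written against the old
org.apache.hadoop.mapred API. The class name and the every-1000-values
cadence are made up for illustration, and the signatures are from memory
of the pre-generics interface, so check them against the 0.12.3 javadocs:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical reducer sketch: report status periodically during a long
// computation so the TaskTracker doesn't assume the task is lost and
// kill it when the report timeout expires.
public class LongComputationReducer extends MapReduceBase implements Reducer {

  public void reduce(WritableComparable key, Iterator values,
                     OutputCollector output, Reporter reporter)
      throws IOException {
    long processed = 0;
    while (values.hasNext()) {
      Writable value = (Writable) values.next();
      // ... potentially long per-value computation goes here ...
      output.collect(key, value);

      // Telling the Reporter we're alive resets the task's timeout clock.
      if (++processed % 1000 == 0) {
        reporter.setStatus("key " + key + ": processed " + processed
            + " values");
      }
    }
  }
}

If the expensive work happens once per key rather than per value, a single
setStatus call at the top of reduce() may be all that's needed.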
