Thanks for the reply Arun. We've recompiled the hadoop native binaries and
they seem to be loading fine. We are rerunning the job to see if it works
now.
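
In case it helps anyone else debugging the same thing, here is roughly how we double-check that the native library actually gets picked up - a minimal sketch from memory, assuming org.apache.hadoop.util.NativeCodeLoader and its isNativeCodeLoaded() method are present in 0.12.3:

  import org.apache.hadoop.util.NativeCodeLoader;

  // Minimal check: run with the same classpath and -Djava.library.path
  // as the tasktrackers to see whether libhadoop actually loads.
  public class NativeLibCheck {
      public static void main(String[] args) {
          if (NativeCodeLoader.isNativeCodeLoaded()) {
              System.out.println("native-hadoop library loaded");
          } else {
              System.out.println("native-hadoop not loaded - falling back to the pure-Java Deflater");
          }
      }
  }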

Regards,

-vishal.

-----Original Message-----
From: Arun C Murthy [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 23, 2007 11:40 PM
To: [email protected]; [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Reduce task hangs when using nutch 0.9 with hadoop 0.12.3

Vishal,

On Wed, May 23, 2007 at 02:15:50PM +0530, Vishal Shah wrote:
>Hi Arun,
>
>  Thanks for the reply. We figured out the root cause of the problem. We are
>not using the hadoop native libs right now, and the Sun Java Deflater hangs
>sometimes during the reduce phase. Our glibc version is 2.3.5, whereas the
>hadoop native libs need 2.4, which is why they are not being used by hadoop.
>

Personally I've never seen Sun's Deflater hang... but ymmv.

Anyway, there is nothing in the native hadoop libs that needs glibc-2.4 - it
looks like this is just an artifact of the fact that the machine on which the
release was cut (i.e. where the native libs in 0.12.3 were built) had
glibc-2.4. I have glibc-2.3.6-r3 on my machine and things work fine...

>  I was wondering if there is a version of the native libs that would work
>with glibc 2.3. 

I don't have access right away to a box with glibc-2.3.5, but it's really
easy to build them yourself - details here:
http://wiki.apache.org/lucene-hadoop/NativeHadoop

hth,
Arun

>If not, we'll have to upgrade the glibc version on all
>machines to 2.4.
>
>Regards,
>
>-vishal.
>
>-----Original Message-----
>From: Arun C Murthy [mailto:[EMAIL PROTECTED] 
>Sent: Tuesday, May 22, 2007 4:24 PM
>To: [email protected]
>Cc: [EMAIL PROTECTED]
>Subject: Re: Reduce task hangs when using nutch 0.9 with hadoop 0.12.3
>
>Vishal Shah wrote:
>> Hi,
>>  
>>   We upgraded our code to nutch 0.9 stable version along with hadoop
>> 0.12.3, which is the latest version of hadoop 0.12.
>>  
>>   After the upgrade, I am seeing task failures during the reduce phase for
>> parse and fetch (without the parsing option) sometimes.
>>  
>>   Usually, it's just one reduce task that creates this problem. The
>> jobtracker kills this task saying "Task failed to report status for 602
>> seconds. Killing task"
>>  
>>   I tried running the task using IsolationRunner, and it works fine. I am
>> suspecting that there is probably a long computation happening during the
>> reduce phase for one of the keys due to which the tasktracker isn't able
>> to report status to the jobtracker in time.
>>  
>
>If you suspect a long computation, one way is to use the 'reporter'
>parameter passed to your mapper/reducer to provide status updates and ensure
>that the TaskTracker doesn't kill the task, i.e. doesn't assume the task
>has been lost.
>
>hth,
>Arun
>
>>   I was wondering if anyone else has seen a similar problem and if there is
>> a fix for it.
>>  
>> Thanks,
>>  
>> -vishal.
>> 
>
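
For the archives, here is roughly what Arun's 'reporter' suggestion above looks like in code - a minimal sketch against the 0.12-era org.apache.hadoop.mapred API (the class name and batch size are made up, and method names are from memory):

  import java.io.IOException;
  import java.util.Iterator;

  import org.apache.hadoop.io.Writable;
  import org.apache.hadoop.io.WritableComparable;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reducer;
  import org.apache.hadoop.mapred.Reporter;

  // Hypothetical reducer showing the idea: call the Reporter periodically
  // inside a long-running reduce so the tasktracker sees activity and does
  // not assume the task has been lost.
  public class LongRunningReducer extends MapReduceBase implements Reducer {
      public void reduce(WritableComparable key, Iterator values,
                         OutputCollector output, Reporter reporter)
              throws IOException {
          long processed = 0;
          while (values.hasNext()) {
              Writable value = (Writable) values.next();
              // ... expensive per-value computation on 'value' goes here ...
              processed++;
              if (processed % 1000 == 0) {
                  // heartbeat: also shows up as the task status in the web UI
                  reporter.setStatus(key + ": processed " + processed + " values");
              }
          }
          // output.collect(key, ...);  // emit whatever the job actually produces
      }
  }

If I remember right, the 600-odd second limit behind the "failed to report status" message is mapred.task.timeout, so raising that in hadoop-site.xml is another stopgap, but keeping the reporter updated seems like the cleaner fix.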
