[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814471#comment-13814471
 ] 

firegun commented on MAPREDUCE-5606:
------------------------------------

Thanks for your reply. We plan to upgrade Hadoop to 2.0.X next month, but
we need to do some testing first, because the version running in production
is a modified build based on 1.0.3.

I think the 1.0.3 version will stay in use for at least another month.

Before this, the servers had worked very well for more than a year.

For now, all I can do is monitor the JobTracker log and, when I find that a
datanode has crashed, add that datanode to the host exclude file.
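
A minimal sketch of that workaround, under stated assumptions: the log line
pattern below is only a guess based on the issue title ("Failed recovery
attempt") and must be adjusted to the real JobTracker log, and the function
names, the threshold, and the exclude file path are hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: the log line shape is a guess based on the issue title;
# adjust the pattern to whatever the JobTracker log really prints.
PATTERN = re.compile(
    r"Failed recovery attempt #\d+ from primary datanode (\S+?):\d+"
)


def suspect_datanodes(log_lines, threshold=3):
    """Return hosts seen in at least `threshold` failed-recovery lines."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(host for host, n in counts.items() if n >= threshold)


def add_to_exclude(exclude_path, hosts):
    """Append hosts not yet listed in the exclude file; return those added."""
    try:
        with open(exclude_path) as f:
            content = f.read()
    except FileNotFoundError:
        content = ""
    existing = {line.strip() for line in content.splitlines() if line.strip()}
    new_hosts = [h for h in hosts if h not in existing]
    if new_hosts:
        with open(exclude_path, "a") as f:
            # Keep one host per line even if the file lacks a final newline.
            if content and not content.endswith("\n"):
                f.write("\n")
            for host in new_hosts:
                f.write(host + "\n")
    return new_hosts
```

The exclude path would be whichever file the cluster's exclude setting points
at. After the file changes, hadoop mradmin -refreshNodes still has to be run
by hand or from a wrapper script; the sketch deliberately does not call it.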

Can you give me some ideas, or explain why this happened?

Should I lower the socket timeout or the failure time?

Thanks.

> JobTracker blocked for DFSClient: Failed recovery attempt
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-5606
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5606
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 1.0.3
>         Environment: centos 5.8  jdk 1.7 
>            Reporter: firegun
>            Assignee: firegun
>            Priority: Critical
>
> When a datanode crashes, the server still responds to ping, but RPC calls
> fail and SSH login is not possible. The JobTracker may then request a block
> on this datanode.
> When that happens, the JobTracker stops working: the web UI does not
> respond, hadoop job -list does not respond, and the JobTracker log shows no
> other information.
> We then need to restart the datanode.
> After that the JobTracker works again, but the TaskTracker count drops to
> zero, and we need to run: hadoop mradmin -refreshNodes
> The JobTracker then begins to add TaskTrackers again, but very slowly.
> This problem has occurred 5 times in 2 weeks.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
