[ 
https://issues.apache.org/jira/browse/HADOOP-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699995#comment-13699995
 ] 

Hua xu commented on HADOOP-9684:
--------------------------------

However, I think that it should not be seen as the indicator of success for the 
setup or initialization of the Connection instance ,that property socket is not 
null!

                
> The initialization may be missed for org.apache.ipc.Client$Connection
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-9684
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9684
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 1.0.3, 0.21.0
>            Reporter: Hua xu
>
> Today, we see that a TaskTracer has keeped throwing the same exception in our 
> production environment.it is that:
> 2013-07-01 18:41:40,023 INFO org.apache.hadoop.mapred.TaskTracker: 
> addFreeSlot : current free slots : 7
> 2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: 
> LaunchTaskAction (registerTask): attempt_201208241212_27521_m_000002_3 task's 
> state:UNASSIGNED
> 2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: Trying to 
> launch : attempt_201208241212_27521_m_000002_3 which needs 1 slots
> 2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: In 
> TaskLauncher, current free slots : 7 and trying to launch 
> attempt_201208241212_27521_m_000002_3 which needs 1 slots
> 2013-07-01 18:41:43,026 INFO 
> org.apache.hadoop.mapreduce.server.tasktracker.Localizer: User-directories 
> for the user sds are already initialized on this TT. Not doing anything.
> 2013-07-01 18:41:43,029 WARN org.apache.hadoop.mapred.TaskTracker: Error 
> initializing attempt_201208241212_27521_m_000002_3:
> java.lang.NullPointerException
> 2013-07-01 18:41:43,029 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to 
> set finish time for task attempt_201208241212_27521_m_000002_3 when no start 
> time is set, stackTrace is : java.lang.Exception
>       at 
> org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:195)
>       at 
> org.apache.hadoop.mapred.MapTaskStatus.setFinishTime(MapTaskStatus.java:51)
>       at 
> org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2937)
>       at 
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2255)
>       at 
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
>   Then, we view the log files of the TaskTracker,and find that the 
> TaskTracker throwed Several OutOfMemoryError: Java heap space about ten days 
> ago. after that, the TaskTracker has still throws the exception:
> 2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: 
> LaunchTaskAction (registerTask): attempt_201208241212_26088_m_000043_1 task's 
> state:UNASSIGNED
> 2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: Trying to 
> launch : attempt_201208241212_26088_m_000043_1 which needs 1 slots
> 2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: In 
> TaskLauncher, current free slots : 7 and trying to launch 
> attempt_201208241212_26088_m_000043_1 which needs 1 slots
> 2013-06-22 12:39:42,296 INFO 
> org.apache.hadoop.mapreduce.server.tasktracker.Localizer: Initializing user 
> sds on this TT.
> 2013-06-22 12:39:42,300 WARN org.apache.hadoop.mapred.TaskTracker: Error 
> initializing attempt_201208241212_26088_m_000043_1:
> java.lang.NullPointerException
>       at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:630)
>       at org.apache.hadoop.ipc.Client.call(Client.java:886)
>       at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
>       at $Proxy5.getFileInfo(Unknown Source)
>       at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>       at $Proxy5.getFileInfo(Unknown Source)
>       at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:850)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:620)
>       at 
> org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:3984)
>       at 
> org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:1036)
>       at 
> org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:977)
>       at 
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2247)
>       at 
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
> 2013-06-22 12:39:42,300 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to 
> set finish time for task attempt_201208241212_26088_m_000043_1 when no start 
> time is set, stackTrace is : java.lang.Exception
>       at 
> org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:195)
>       at 
> org.apache.hadoop.mapred.MapTaskStatus.setFinishTime(MapTaskStatus.java:51)
>       at 
> org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2937)
>       at 
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2255)
>       at 
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
>   
>   Since then, the TaskTracker has not completed any task.we can find the 
> property of Connection -out is null from the above exeption,and it is caused 
> by the failure of method setupIOstream() in the class 
> org.apache.ipc.Client$Connection.anyway, the instance of Connection is not 
> null and cached by the org.apache.ipc.Client.We guess that it throwed a 
> OutOfMemoryError when a thread called the setupIOstream() because of RPC, so 
> that some properties of the Connection are null, and throw 
> NullPointerException when the Connection are accessed by other threads,which 
> is fron the cache.It must be correct to make sure that the instance of 
> Connection could only be access after initialized successfully. 
>   On the other hand, we also simulate this scenario.Firstly, one thread 
> create the Connection instance and throw OutOfMemoryError when it call 
> connection.setupIOstreams().after that, the other thread start call RPC 
> through by the instance and it keeps throwing the same exceptions

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to