[ 
https://issues.apache.org/jira/browse/HADOOP-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua xu updated HADOOP-9684:
---------------------------

              Labels:   (was: patch)
    Target Version/s: 0.21.0, 1.0.3  (was: 1.0.3, 0.21.0)
             Summary: The initialization may be missed for 
org.apache.ipc.Client$Connection  (was: the initialization or missed may be for 
org.apache.ipc.Client$Connection)
    
> The initialization may be missed for org.apache.ipc.Client$Connection
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-9684
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9684
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 1.0.3, 0.21.0
>            Reporter: Hua xu
>
> Today, we see that a TaskTracer has keeped throwing the same exception in our 
> production environment.it is that:
> 2013-07-01 18:41:40,023 INFO org.apache.hadoop.mapred.TaskTracker: 
> addFreeSlot : current free slots : 7
> 2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: 
> LaunchTaskAction (registerTask): attempt_201208241212_27521_m_000002_3 task's 
> state:UNASSIGNED
> 2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: Trying to 
> launch : attempt_201208241212_27521_m_000002_3 which needs 1 slots
> 2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: In 
> TaskLauncher, current free slots : 7 and trying to launch 
> attempt_201208241212_27521_m_000002_3 which needs 1 slots
> 2013-07-01 18:41:43,026 INFO 
> org.apache.hadoop.mapreduce.server.tasktracker.Localizer: User-directories 
> for the user sds are already initialized on this TT. Not doing anything.
> 2013-07-01 18:41:43,029 WARN org.apache.hadoop.mapred.TaskTracker: Error 
> initializing attempt_201208241212_27521_m_000002_3:
> java.lang.NullPointerException
> 2013-07-01 18:41:43,029 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to 
> set finish time for task attempt_201208241212_27521_m_000002_3 when no start 
> time is set, stackTrace is : java.lang.Exception
>       at 
> org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:195)
>       at 
> org.apache.hadoop.mapred.MapTaskStatus.setFinishTime(MapTaskStatus.java:51)
>       at 
> org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2937)
>       at 
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2255)
>       at 
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
>   Then, we view the log files of the TaskTracker,and find that the 
> TaskTracker throwed Several OutOfMemoryError: Java heap space about ten days 
> ago. after that, the TaskTracker has still throws the exception:
> 2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: 
> LaunchTaskAction (registerTask): attempt_201208241212_26088_m_000043_1 task's 
> state:UNASSIGNED
> 2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: Trying to 
> launch : attempt_201208241212_26088_m_000043_1 which needs 1 slots
> 2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: In 
> TaskLauncher, current free slots : 7 and trying to launch 
> attempt_201208241212_26088_m_000043_1 which needs 1 slots
> 2013-06-22 12:39:42,296 INFO 
> org.apache.hadoop.mapreduce.server.tasktracker.Localizer: Initializing user 
> sds on this TT.
> 2013-06-22 12:39:42,300 WARN org.apache.hadoop.mapred.TaskTracker: Error 
> initializing attempt_201208241212_26088_m_000043_1:
> java.lang.NullPointerException
>       at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:630)
>       at org.apache.hadoop.ipc.Client.call(Client.java:886)
>       at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
>       at $Proxy5.getFileInfo(Unknown Source)
>       at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>       at $Proxy5.getFileInfo(Unknown Source)
>       at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:850)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:620)
>       at 
> org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:3984)
>       at 
> org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:1036)
>       at 
> org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:977)
>       at 
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2247)
>       at 
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
> 2013-06-22 12:39:42,300 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to 
> set finish time for task attempt_201208241212_26088_m_000043_1 when no start 
> time is set, stackTrace is : java.lang.Exception
>       at 
> org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:195)
>       at 
> org.apache.hadoop.mapred.MapTaskStatus.setFinishTime(MapTaskStatus.java:51)
>       at 
> org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2937)
>       at 
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2255)
>       at 
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
>   
>   Since then, the TaskTracker has not completed any task.we can find the 
> property of Connection -out is null from the above exeption,and it is caused 
> by the failure of method setupIOstream() in the class 
> org.apache.ipc.Client$Connection.anyway, the instance of Connection is not 
> null and cached by the org.apache.ipc.Client.We guess that it throwed a 
> OutOfMemoryError when a thread called the setupIOstream() because of RPC, so 
> that some properties of the Connection are null, and throw 
> NullPointerException when the Connection are accessed by other threads,which 
> is fron the cache.It must be correct to make sure that the instance of 
> Connection could only be cached after initialized successfully. We fixed the 
> Method getConnection() to guarantee the semantic:
> private Connection getConnection(InetSocketAddress addr,
>                                    Class<?> protocol,
>                                    UserGroupInformation ticket,
>                                    Call call)
>                                    throws IOException, InterruptedException {
>     if (!running.get()) {
>       // the client is stopped
>       throw new IOException("The client is stopped");
>     }
>     Connection connection;
>     /* we could avoid this allocation for each RPC by having a  
>      * connectionsId object and with set() method. We need to manage the
>      * refs for keys in HashMap properly. For now its ok.
>      */
>     ConnectionId remoteId = new ConnectionId(addr, protocol, ticket);
>     do {
>       synchronized (connections) {
>         connection = connections.get(remoteId);
>         if (connection == null) {
>           connection = new Connection(remoteId);
>           
>  +         connection.setupIOstreams();
>           
>           connections.put(remoteId, connection);
>         }
>       }
>     } while (!connection.addCall(call));
>     
>     //we don't invoke the method below inside "synchronized (connections)"
>     //block above. The reason for that is if the server happens to be slow,
>     //it will take longer to establish a connection and that will slow the
>     //entire system down.
> -    connection.setupIOstreams();
>     
>     return connection;
>   }
>   On the other hand, we also simulate this scenario.Firstly, one thread 
> create the Connection instance and throw OutOfMemoryError when it call 
> connection.setupIOstreams().after that, the other thread start call RPC 
> through by the instance and it keeps throwing the same exceptions

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to