[
https://issues.apache.org/jira/browse/HADOOP-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700034#comment-13700034
]
Hua xu commented on HADOOP-9684:
--------------------------------
The source code for the initialization of the Connection instance:
{code}
private synchronized void setupIOstreams() throws InterruptedException {
-  if (socket != null || shouldCloseConnection.get()) {
-    return;
-  }
+  if (this.out != null || shouldCloseConnection.get()) {
+    return;
+  }
  short ioFailures = 0;
  short timeoutFailures = 0;
  try {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Connecting to " + server);
    }
    while (true) {
      try {
        this.socket = socketFactory.createSocket();
        this.socket.setTcpNoDelay(tcpNoDelay);
        // connection time out is 20s
        NetUtils.connect(this.socket, remoteId.getAddress(), 20000);
        this.socket.setSoTimeout(pingInterval);
        break;
      } catch (SocketTimeoutException toe) {
        /* The max number of retries is 45,
         * which amounts to 20s*45 = 15 minutes retries.
         */
        handleConnectionFailure(timeoutFailures++, 45, toe);
      } catch (IOException ie) {
        handleConnectionFailure(ioFailures++, maxRetries, ie);
      }
    }
    InputStream inStream = NetUtils.getInputStream(socket);
    OutputStream outStream = NetUtils.getOutputStream(socket);
    writeRpcHeader(outStream);
    if (useSasl) {
      final InputStream in2 = inStream;
      final OutputStream out2 = outStream;
      UserGroupInformation ticket = remoteId.getTicket();
      if (authMethod == AuthMethod.KERBEROS) {
        if (ticket.getRealUser() != null) {
          ticket = ticket.getRealUser();
        }
      }
      if (ticket.doAs(new PrivilegedExceptionAction<Boolean>() {
            @Override
            public Boolean run() throws IOException {
              return setupSaslConnection(in2, out2);
            }
          })) {
        // Sasl connect is successful. Let's set up Sasl i/o streams.
        inStream = saslRpcClient.getInputStream(inStream);
        outStream = saslRpcClient.getOutputStream(outStream);
      } else {
        // fall back to simple auth because server told us so.
        authMethod = AuthMethod.SIMPLE;
        header = new ConnectionHeader(header.getProtocol(),
            header.getUgi(), authMethod);
        useSasl = false;
      }
    }
    if (doPing) {
      this.in = new DataInputStream(new BufferedInputStream(
          new PingInputStream(inStream)));
    } else {
      this.in = new DataInputStream(new BufferedInputStream(inStream));
    }
    //byte[] data = new byte[1024*1024*5];
    this.out = new DataOutputStream(new BufferedOutputStream(outStream));
    writeHeader();
    // update last activity time
    touch();
    // start the receiver thread after the socket connection has been set up
    start();
  } catch (IOException e) {
    markClosed(e);
    close();
  }
}
{code}
On the other hand, the method only catches IOException; it does not catch or
handle any Errors, such as OutOfMemoryError, so a failure partway through
leaves the Connection partially initialized.
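The failure mode can be reproduced with a small, self-contained sketch (the class and field names here are hypothetical stand-ins, not the real org.apache.hadoop.ipc.Client code). If an Error is thrown after the socket field is assigned but before out is assigned, a readiness guard on socket wrongly reports the connection as initialized, while a guard on out (the last field assigned, as in the proposed change above) does not:

```java
// Minimal sketch (hypothetical names) of the partial-initialization hazard:
// guarding on the LAST field assigned during setup is safer than guarding
// on the first, and setup should also handle Errors, not just IOException.
public class PartialInitDemo {
    static class Conn {
        Object socket;   // assigned early in setup
        Object out;      // assigned last, only on full success

        void setup(boolean failMidway) {
            socket = new Object();            // early assignment succeeds
            if (failMidway) {
                // simulate an OutOfMemoryError thrown between the two
                // assignments; an IOException-only catch would miss it
                throw new OutOfMemoryError("simulated");
            }
            out = new Object();               // reached only on success
        }

        boolean looksReadyBySocket() { return socket != null; }
        boolean looksReadyByOut()    { return out != null; }
    }

    public static void main(String[] args) {
        Conn c = new Conn();
        try {
            c.setup(true);                    // fails between the assignments
        } catch (Error e) {
            // swallowed for the demo; real code should mark the
            // connection closed so cached instances are not reused
        }
        // The socket-based guard is fooled; the out-based guard is not.
        System.out.println("socket guard says ready: " + c.looksReadyBySocket());
        System.out.println("out guard says ready:    " + c.looksReadyByOut());
    }
}
```

This is why the diff above swaps the early-return check from `socket != null` to `this.out != null`: `out` is only assigned once every prior step has completed.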
> The initialization may be missed for org.apache.ipc.Client$Connection
> ---------------------------------------------------------------------
>
> Key: HADOOP-9684
> URL: https://issues.apache.org/jira/browse/HADOOP-9684
> Project: Hadoop Common
> Issue Type: Bug
> Components: ipc
> Affects Versions: 1.0.3, 0.21.0
> Reporter: Hua xu
>
> Today, we saw that a TaskTracker kept throwing the same exception in our
> production environment:
> 2013-07-01 18:41:40,023 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 7
> 2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_201208241212_27521_m_000002_3 task's
> state:UNASSIGNED
> 2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
> launch : attempt_201208241212_27521_m_000002_3 which needs 1 slots
> 2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: In
> TaskLauncher, current free slots : 7 and trying to launch
> attempt_201208241212_27521_m_000002_3 which needs 1 slots
> 2013-07-01 18:41:43,026 INFO
> org.apache.hadoop.mapreduce.server.tasktracker.Localizer: User-directories
> for the user sds are already initialized on this TT. Not doing anything.
> 2013-07-01 18:41:43,029 WARN org.apache.hadoop.mapred.TaskTracker: Error
> initializing attempt_201208241212_27521_m_000002_3:
> java.lang.NullPointerException
> 2013-07-01 18:41:43,029 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to
> set finish time for task attempt_201208241212_27521_m_000002_3 when no start
> time is set, stackTrace is : java.lang.Exception
> at
> org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:195)
> at
> org.apache.hadoop.mapred.MapTaskStatus.setFinishTime(MapTaskStatus.java:51)
> at
> org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2937)
> at
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2255)
> at
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
> Then we examined the TaskTracker's log files and found that it had thrown
> several "OutOfMemoryError: Java heap space" errors about ten days earlier.
> Since then, the TaskTracker has kept throwing this exception:
> 2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_201208241212_26088_m_000043_1 task's
> state:UNASSIGNED
> 2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
> launch : attempt_201208241212_26088_m_000043_1 which needs 1 slots
> 2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: In
> TaskLauncher, current free slots : 7 and trying to launch
> attempt_201208241212_26088_m_000043_1 which needs 1 slots
> 2013-06-22 12:39:42,296 INFO
> org.apache.hadoop.mapreduce.server.tasktracker.Localizer: Initializing user
> sds on this TT.
> 2013-06-22 12:39:42,300 WARN org.apache.hadoop.mapred.TaskTracker: Error
> initializing attempt_201208241212_26088_m_000043_1:
> java.lang.NullPointerException
> at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:630)
> at org.apache.hadoop.ipc.Client.call(Client.java:886)
> at
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
> at $Proxy5.getFileInfo(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy5.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:850)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:620)
> at
> org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:3984)
> at
> org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:1036)
> at
> org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:977)
> at
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2247)
> at
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
> 2013-06-22 12:39:42,300 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to
> set finish time for task attempt_201208241212_26088_m_000043_1 when no start
> time is set, stackTrace is : java.lang.Exception
> at
> org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:195)
> at
> org.apache.hadoop.mapred.MapTaskStatus.setFinishTime(MapTaskStatus.java:51)
> at
> org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2937)
> at
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2255)
> at
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
>
> Since then, the TaskTracker has not completed any task. From the above
> exception we can see that the Connection's out field is null, which is
> caused by a failure of the method setupIOstreams() in the class
> org.apache.ipc.Client$Connection. Even so, the Connection instance itself is
> not null and is cached by org.apache.ipc.Client. We guess that an
> OutOfMemoryError was thrown while a thread was calling setupIOstreams() for
> an RPC, so some fields of the Connection were left null, and a
> NullPointerException is thrown whenever other threads access that Connection
> from the cache. The correct fix is to ensure that a Connection instance can
> only be accessed after it has been initialized successfully.
> On the other hand, we also simulated this scenario. First, one thread
> creates the Connection instance and an OutOfMemoryError is thrown when it
> calls connection.setupIOstreams(). After that, another thread starts making
> RPC calls through the instance, and it keeps throwing the same exceptions.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira