functioner commented on pull request #2727:
URL: https://github.com/apache/hadoop/pull/2727#issuecomment-788070811
> From the description of [HADOOP-17552](https://issues.apache.org/jira/browse/HADOOP-17552):
>
> > We are doing some systematic fault injection testing in Hadoop-3.2.2 and when we try to run a client (e.g., `bin/hdfs dfs -ls /`) to our HDFS cluster (1 NameNode, 2 DataNodes), the client gets stuck forever.
>
> @functioner I think the issue was surfaced by half-closed TCP connection (connection loss without RST packet) caused by HW issue like power fault. What kind of fault injection caused this?

@iwasakims In Server.java, the socket channel is accepted at line 1400, and the fault (an IOException) is then injected at line 1402 (lines 1403 or 1404 also work).

```
public void run() {
  while (running) {
    SelectionKey key = null;
    try {
      getSelector().select();
      Iterator<SelectionKey> iter = getSelector().selectedKeys().iterator();
      while (iter.hasNext()) {
        key = iter.next();
        iter.remove();
        try {
          if (key.isValid()) {
            if (key.isAcceptable())
              doAccept(key);
          }
        } catch (IOException e) {                    // line 1350
        }
        key = null;
      }
    } catch (OutOfMemoryError e) {
      // ...
    } catch (Exception e) {
      // ...
    }
  }
}

void doAccept(SelectionKey key) throws InterruptedException, IOException, OutOfMemoryError {
  ServerSocketChannel server = (ServerSocketChannel) key.channel();
  SocketChannel channel;
  while ((channel = server.accept()) != null) {      // line 1400

    channel.configureBlocking(false);                // line 1402
    channel.socket().setTcpNoDelay(tcpNoDelay);      // line 1403
    channel.socket().setKeepAlive(true);             // line 1404

    Reader reader = getReader();
    Connection c = connectionManager.register(channel,
        this.listenPort, this.isOnAuxiliaryPort);
    // If the connectionManager can't take it, close the connection.
    if (c == null) {
      if (channel.isOpen()) {
        IOUtils.cleanup(null, channel);
      }
      connectionManager.droppedConnections.getAndIncrement();
      continue;
    }
    key.attach(c);  // so closeCurrentConnection can get the object
    reader.addConnection(c);
  }
}
```

The basic idea of this fault injection is to let the server accept the connection at line 1400 but prevent it from being added to a reader, so the server never becomes aware of any data from this client. The injected IOException is swallowed at line 1350, and the accepted channel is left open because nothing on that path closes it.
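The scenario above can be reproduced outside Hadoop with a small, self-contained sketch. This is not Hadoop code: `AcceptFaultDemo`, `injectFault`, `acceptThenFail`, and `clientTimesOut` are hypothetical names chosen for illustration, and the accept loop is reduced to a single blocking `accept()`. It only shows the mechanism: the TCP connection is established, the server-side exception is swallowed, the channel is never read or closed, and the client therefore waits for a reply that never comes. A finite read timeout is set on the client purely so the demo terminates; with `setSoTimeout(0)` the read would block indefinitely.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class AcceptFaultDemo {

    /** Hypothetical stand-in for the IOException injected right after accept(). */
    static void injectFault() throws IOException {
        throw new IOException("injected fault after accept()");
    }

    /**
     * Accepts one connection, then fails before the channel would be handed
     * to a reader. The exception is swallowed, so the channel stays open but
     * is never read. The leaked channel is returned only so the demo can
     * close it at the end.
     */
    static SocketChannel acceptThenFail(ServerSocketChannel server) throws IOException {
        SocketChannel channel = server.accept(); // blocking accept
        try {
            injectFault();
            // A real server would register the channel with a reader here.
        } catch (IOException swallowed) {
            // Mirrors the empty catch block: no logging, no channel.close().
        }
        return channel;
    }

    /** Returns true if the client's read hit the timeout instead of getting data or EOF. */
    static boolean clientTimesOut(int readTimeoutMillis) throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

            final SocketChannel[] leaked = new SocketChannel[1];
            Thread acceptor = new Thread(() -> {
                try {
                    leaked[0] = acceptThenFail(server);
                } catch (IOException ignored) {
                    // accept() itself failed; nothing to leak in that case.
                }
            });
            acceptor.start();

            try (Socket client = new Socket("127.0.0.1", port)) {
                client.setSoTimeout(readTimeoutMillis); // 0 would mean wait forever
                client.getOutputStream().write("ping".getBytes(StandardCharsets.US_ASCII));
                client.getOutputStream().flush();
                try {
                    client.getInputStream().read(); // no one will ever answer
                    return false;
                } catch (SocketTimeoutException e) {
                    return true;
                }
            } finally {
                acceptor.join(1000);
                if (leaked[0] != null) {
                    leaked[0].close(); // clean up the deliberately leaked channel
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("client timed out: " + clientTimesOut(500));
    }
}
```

In this sketch, closing `channel` inside the `catch` block would let the client observe end-of-stream or a reset instead of waiting; whether that is the right remedy for Server.java is a separate question that the sketch does not settle.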
