functioner commented on pull request #2727:
URL: https://github.com/apache/hadoop/pull/2727#issuecomment-788070811


   > From the description of [HADOOP-17552](https://issues.apache.org/jira/browse/HADOOP-17552):
   > 
   > > We are doing some systematic fault injection testing in Hadoop-3.2.2 and when we try to run a client (e.g., `bin/hdfs dfs -ls /`) to our HDFS cluster (1 NameNode, 2 DataNodes), the client gets stuck forever.
   > 
   > @functioner I think the issue was surfaced by half-closed TCP connection (connection loss without RST packet) caused by HW issue like power fault. What kind of fault injection caused this?
   
   @iwasakims In `Server.java`, the socket channel is accepted at line 1400, and the fault (an `IOException`) is then injected at line 1402. Injecting it at line 1403 or 1404 works as well.
   ```java
       public void run() {
         while (running) {
           SelectionKey key = null;
           try {
             getSelector().select();
             Iterator<SelectionKey> iter = getSelector().selectedKeys().iterator();
             while (iter.hasNext()) {
               key = iter.next();
               iter.remove();
               try {
                 if (key.isValid()) {
                   if (key.isAcceptable())
                     doAccept(key);
                 }
               } catch (IOException e) {                         // line 1350
                 // the injected IOException is silently swallowed here
               }
               key = null;
             }
           } catch (OutOfMemoryError e) {
             // ...
           } catch (Exception e) {
             // ...
           }
         }
       }
       void doAccept(SelectionKey key) throws InterruptedException, IOException,
           OutOfMemoryError {
         ServerSocketChannel server = (ServerSocketChannel) key.channel();
         SocketChannel channel;
         while ((channel = server.accept()) != null) {           // line 1400
   
           channel.configureBlocking(false);                     // line 1402: IOException injected here
           channel.socket().setTcpNoDelay(tcpNoDelay);           // line 1403
           channel.socket().setKeepAlive(true);                  // line 1404
           
           Reader reader = getReader();
           Connection c = connectionManager.register(channel,
               this.listenPort, this.isOnAuxiliaryPort);
           // If the connectionManager can't take it, close the connection.
           if (c == null) {
             if (channel.isOpen()) {
               IOUtils.cleanup(null, channel);
             }
             connectionManager.droppedConnections.getAndIncrement();
             continue;
           }
           key.attach(c);  // so closeCurrentConnection can get the object
           reader.addConnection(c);
         }
       }
   ```
   
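   The idea behind this fault injection is to let the server accept the connection at line 1400 but keep it from being added to a reader, so the server never sees any data from this client. The injected `IOException` is silently swallowed at line 1350. Because the accepted channel is neither handed to a reader nor closed, the client's connection stays open and its request never gets a response.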
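A minimal, self-contained sketch of this failure mode, assuming a plain Java NIO listener rather than Hadoop's actual `Server.Listener`. The class `AcceptFaultDemo` and the helpers `injectFault`, `doAccept(key, closeOnFault)` and `clientHangs` are hypothetical names used only for illustration. The listener accepts a connection, throws an injected `IOException` before the channel reaches any reader, and swallows it the way the loop above does. The client sets a read timeout only so the demo terminates; a timeout is reported as a hang. The `closeOnFault` flag is a hypothetical contrast case, not something Hadoop does.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

final class AcceptFaultDemo {

  // Hypothetical stand-in for the fault injected right after accept() ("line 1402").
  static void injectFault() throws IOException {
    throw new IOException("injected fault after accept()");
  }

  // Simplified doAccept: the channel is accepted, then the fault fires before it is
  // configured or handed to a reader. Unless closeOnFault is set, it is never closed.
  static void doAccept(SelectionKey key, boolean closeOnFault) throws IOException {
    ServerSocketChannel server = (ServerSocketChannel) key.channel();
    SocketChannel channel;
    while ((channel = server.accept()) != null) {
      try {
        injectFault();
      } catch (IOException e) {
        if (closeOnFault) {
          channel.close(); // contrast case: the client sees EOF or a reset
        }
        throw e;
      }
    }
  }

  // Returns true if the client's read times out, i.e. without a read timeout the
  // client would block forever.
  static boolean clientHangs(boolean closeOnFault, int readTimeoutMs) throws Exception {
    try (Selector selector = Selector.open();
         ServerSocketChannel server = ServerSocketChannel.open()) {
      server.bind(new InetSocketAddress("127.0.0.1", 0));
      server.configureBlocking(false);
      server.register(selector, SelectionKey.OP_ACCEPT);
      int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

      // Listener loop modeled on the excerpt: IOExceptions from doAccept are swallowed.
      Thread listener = new Thread(() -> {
        try {
          while (!Thread.currentThread().isInterrupted()) {
            selector.select(100);
            Iterator<SelectionKey> iter = selector.selectedKeys().iterator();
            while (iter.hasNext()) {
              SelectionKey key = iter.next();
              iter.remove();
              try {
                if (key.isValid() && key.isAcceptable()) {
                  doAccept(key, closeOnFault);
                }
              } catch (IOException e) {
                // swallowed, as at "line 1350"
              }
            }
          }
        } catch (IOException e) {
          // selector failure: stop the listener
        }
      });
      listener.setDaemon(true);
      listener.start();

      try (Socket client = new Socket()) {
        client.connect(new InetSocketAddress("127.0.0.1", port), 1000);
        client.getOutputStream().write("request".getBytes(StandardCharsets.US_ASCII));
        client.setSoTimeout(readTimeoutMs);
        try {
          client.getInputStream().read(); // blocks until data, EOF, reset, or timeout
          return false;                   // data or EOF (-1): not hung
        } catch (SocketTimeoutException e) {
          return true;                    // no reply and no close: a hang
        } catch (IOException e) {
          return false;                   // e.g. connection reset: not hung
        }
      } finally {
        listener.interrupt(); // wakes select() and ends the listener loop
        listener.join();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("channel leaked -> client hangs: " + clientHangs(false, 1000));
    System.out.println("channel closed -> client hangs: " + clientHangs(true, 1000));
  }
}
```

On a typical loopback setup, the leaked-channel case reports a hang while the closed-channel case does not, because the client then sees EOF or a connection reset instead of silence. The contrast suggests that the hang comes from the exception path leaving the accepted channel open but unattended, not from the exception itself.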

