[ 
https://issues.apache.org/jira/browse/HDFS-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272598#comment-13272598
 ] 

Uma Maheswara Rao G commented on HDFS-3398:
-------------------------------------------

Seems to be a good catch, Brahma.

@Todd, this looks like a problem to me. If the other peer goes down while the
client is writing to the socket, the failure may be treated as a client error
and the client will exit.
How about catching exceptions from the socket operations and setting
errorIndex to 1 (treating the first node as bad)?

I did not see the check below in the 205 code:
{code}
if (errorIndex == -1) { // not a datanode error
  streamerClosed = true;
}
{code}

205 code on throwable:
{code}
  } catch (Throwable e) {
    LOG.warn("DataStreamer Exception: " +
             StringUtils.stringifyException(e));
    if (e instanceof IOException) {
      setLastException((IOException)e);
    }
    hasError = true;
  }
}
{code}


In trunk:
{code}
  } catch (Throwable e) {
    DFSClient.LOG.warn("DataStreamer Exception", e);
    if (e instanceof IOException) {
      setLastException((IOException)e);
    }
    hasError = true;
    if (errorIndex == -1) { // not a datanode error
      streamerClosed = true;
    }
  }
{code}
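
To make the difference concrete, here is a minimal, self-contained sketch of how the trunk check reacts to a failed socket write, with and without the first node being blamed. This is not the real DFSOutputStream code: the class StreamerSketch, its method names, and the markFirstNodeBad flag are made up for illustration, and I am assuming a 0-based errorIndex in which 0 denotes the first node in the pipeline.

```java
import java.io.IOException;

/** Hypothetical, simplified model of the streamer state; not the real DFSOutputStream. */
class StreamerSketch {
    int errorIndex = -1;            // -1 means no datanode has been blamed
    boolean hasError = false;
    boolean streamerClosed = false;

    /** Stand-in for blockStream.write()/flush() when the first datanode is dead. */
    static void writeToDeadPeer() throws IOException {
        throw new IOException("An established connection was aborted");
    }

    /** One send attempt; markFirstNodeBad switches the suggested handling on. */
    void sendPacket(boolean markFirstNodeBad) {
        try {
            try {
                writeToDeadPeer();
            } catch (IOException e) {
                if (markFirstNodeBad) {
                    // Assumption: 0-based index, so 0 is the first node in the pipeline.
                    errorIndex = 0;
                }
                throw e; // let the outer handler record the error as before
            }
        } catch (Throwable e) {
            // Mirrors the trunk catch block quoted above.
            hasError = true;
            if (errorIndex == -1) { // not a datanode error
                streamerClosed = true;
            }
        }
    }

    public static void main(String[] args) {
        StreamerSketch current = new StreamerSketch();
        current.sendPacket(false);
        System.out.println("no node blamed:    streamerClosed=" + current.streamerClosed);

        StreamerSketch proposed = new StreamerSketch();
        proposed.sendPacket(true);
        System.out.println("first node blamed: streamerClosed=" + proposed.streamerClosed);
    }
}
```

In this model, a write failure with no datanode blamed closes the streamer, whereas attributing the failure to the first node leaves errorIndex different from -1, so the streamer stays open and the error could presumably be handled as a datanode failure instead.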

                
> Client will not retry when primaryDN is down once it's just got pipeline
> ------------------------------------------------------------------------
>
>                 Key: HDFS-3398
>                 URL: https://issues.apache.org/jira/browse/HDFS-3398
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 2.0.0
>            Reporter: Brahma Reddy Battula
>            Priority: Minor
>
> Scenario:
> =========
> Start the NN and three DNs.
> Get the datanodes to which the block has to be replicated, from
> {code}
> nodes = nextBlockOutputStream(src);
> {code}
> Before starting to write to the DN, kill the primary DN.
> {code}
> // write out data to remote datanode
> blockStream.write(buf.array(), buf.position(), buf.remaining());
> blockStream.flush();
> {code}
> Now the write will fail with the exception:
> {noformat}
> 2012-05-10 14:21:47,993 WARN  hdfs.DFSClient (DFSOutputStream.java:run(552)) - DataStreamer Exception
> java.io.IOException: An established connection was aborted by the software in your host machine
>       at sun.nio.ch.SocketDispatcher.write0(Native Method)
>       at sun.nio.ch.SocketDispatcher.write(Unknown Source)
>       at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>       at sun.nio.ch.IOUtil.write(Unknown Source)
>       at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>       at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:60)
>       at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>       at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:151)
>       at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:112)
>       at java.io.BufferedOutputStream.write(Unknown Source)
>       at java.io.DataOutputStream.write(Unknown Source)
>       at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:513)
> {noformat}
> .

