[ https://issues.apache.org/jira/browse/HDFS-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986710#comment-15986710 ]

Brahma Reddy Battula commented on HDFS-11711:
---------------------------------------------

 *FYI*.
The following is the log from the DN:

{noformat}
2017-04-25 07:02:44,610 | ERROR | DataXceiver for client DFSClient_NONMAPREDUCE_222700060_28 at /192.168.100.48:18124 [Sending block BP-262396492-192.168.100.42-1490663057778:blk_1078953155_5605334] | datanode5:25009:DataXceiver error processing READ_BLOCK operation  src: /192.168.100.48:18124 dst: /192.168.100.48:25009 | DataXceiver.java:304
java.io.FileNotFoundException: /srv/BigData/hadoop/data9/dn/current/BP-262396492-192.168.100.42-1490663057778/current/finalized/subdir79/subdir132/blk_1078953155_5605334.meta (Too many open files)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.hadoop.io.nativeio.NativeIO.getShareDeleteFileInputStream(NativeIO.java:757)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:229)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:290)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:617)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:265)
        at java.lang.Thread.run(Thread.java:745)
2017-04-25 07:02:44,610 | INFO  | Async disk worker #87 for volume /srv/BigData/hadoop/data9/dn/current | Deleted BP-262396492-192.168.100.42-1490663057778 blk_1078953155_5605334 file
{noformat}

I feel we should handle {{"Too many open files"}} here, instead of unconditionally invalidating the block on any {{FileNotFoundException}}:

{code}
     } catch (FileNotFoundException e) {
          // The replica is on its volume map but not on disk
          datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid());
          datanode.data.invalidate(block.getBlockPoolId(),
              new Block[]{block.getLocalBlock()});
          throw e;
     }
{code}
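
One possible way to do that (just a sketch, not a tested patch; it reuses the same {{block}}, {{replica}} and {{datanode}} variables as the snippet above) is to skip the notify/invalidate path when the exception message indicates fd exhaustion, so the replica is only reported as deleted when it is genuinely missing on disk:

{code}
     } catch (FileNotFoundException e) {
          // "Too many open files" means the DN ran out of file descriptors,
          // not that the replica is missing, so don't ask the NN to delete it.
          if (e.getMessage() == null
              || !e.getMessage().contains("Too many open files")) {
            // The replica is in the volume map but not on disk
            datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid());
            datanode.data.invalidate(block.getBlockPoolId(),
                new Block[]{block.getLocalBlock()});
          }
          throw e;
     }
{code}

Matching on the exception message is admittedly fragile (the wording comes from the JDK/OS), so suggestions for a cleaner way to distinguish the two cases are welcome.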

> DN should not delete the block On "Too many open files" Exception
> -----------------------------------------------------------------
>
>                 Key: HDFS-11711
>                 URL: https://issues.apache.org/jira/browse/HDFS-11711
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>
>  *Seen the following scenario in one of our customer environments:* 
> * While the job client was writing {{"job.xml"}}, there were pipeline failures 
> and the file ended up on only one DN.
> * When the mapper read {{"job.xml"}}, the DN hit {{"Too many open files"}} (the 
> system fd limit was exceeded) and the block got deleted. Hence the mapper 
> failed to read the file and the job failed.



