[ https://issues.apache.org/jira/browse/HDFS-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986710#comment-15986710 ]
Brahma Reddy Battula commented on HDFS-11711:
---------------------------------------------
*FYI*.
The following is the log from the DN:
{noformat}
2017-04-25 07:02:44,610 | ERROR | DataXceiver for client DFSClient_NONMAPREDUCE_222700060_28 at /192.168.100.48:18124 [Sending block BP-262396492-192.168.100.42-1490663057778:blk_1078953155_5605334] | datanode5:25009:DataXceiver error processing READ_BLOCK operation src: /192.168.100.48:18124 dst: /192.168.100.48:25009 | DataXceiver.java:304
java.io.FileNotFoundException: /srv/BigData/hadoop/data9/dn/current/BP-262396492-192.168.100.42-1490663057778/current/finalized/subdir79/subdir132/blk_1078953155_5605334.meta (Too many open files)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.apache.hadoop.io.nativeio.NativeIO.getShareDeleteFileInputStream(NativeIO.java:757)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:229)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:290)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:617)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:265)
    at java.lang.Thread.run(Thread.java:745)
2017-04-25 07:02:44,610 | INFO | Async disk worker #87 for volume /srv/BigData/hadoop/data9/dn/current | Deleted BP-262396492-192.168.100.42-1490663057778 blk_1078953155_5605334 file
{noformat}
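For reference, the JDK reports EMFILE for a file that does exist as a plain {{FileNotFoundException}} whose message ends in {{(Too many open files)}}, which is the same exception type {{BlockSender}} sees for a truly missing meta file. A small illustrative sketch of that behavior (run with a low {{ulimit -n}}; the class name {{FdExhaustionDemo}} is only for the demo):
{code}
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FdExhaustionDemo {
  public static void main(String[] args) throws IOException {
    // The file really exists on disk for the whole test.
    File target = File.createTempFile("blk_demo", ".meta");
    List<FileInputStream> held = new ArrayList<>();
    try {
      while (true) {
        // Keep streams open on purpose until the process runs out of fds.
        held.add(new FileInputStream(target));
      }
    } catch (FileNotFoundException e) {
      // Typically prints ".../blk_demo....meta (Too many open files)"
      // even though the file is still present on disk.
      System.out.println("FNFE although the file exists: " + e.getMessage());
    } finally {
      for (FileInputStream in : held) {
        in.close();
      }
      target.delete();
    }
  }
}
{code}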
I feel we should handle {{Too many open files}} here:
{code}
    } catch (FileNotFoundException e) {
      // The replica is on its volume map but not on disk
      datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid());
      datanode.data.invalidate(block.getBlockPoolId(),
          new Block[]{block.getLocalBlock()});
      throw e;
    }
{code}
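One possible direction (just a sketch, not a patch; the helper {{isCausedByTooManyOpenFiles}} and the message check are my own assumptions) is to skip the invalidate/notify path when the FNFE message indicates fd exhaustion, since the replica is most likely still on disk:
{code}
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;

public class FnfeClassifierSketch {

  /**
   * Hypothetical helper: true if the FNFE message points to fd exhaustion
   * (EMFILE) rather than a replica that is really missing from disk.
   */
  static boolean isCausedByTooManyOpenFiles(FileNotFoundException e) {
    String msg = e.getMessage();
    return msg != null && msg.contains("Too many open files");
  }

  public static void main(String[] args) throws Exception {
    File meta = new File(args.length > 0 ? args[0] : "/tmp/example.meta");
    try (FileInputStream in = new FileInputStream(meta)) {
      System.out.println("opened " + meta);
    } catch (FileNotFoundException e) {
      if (isCausedByTooManyOpenFiles(e)) {
        // Transient resource problem: the replica is most likely still on
        // disk, so do NOT invalidate it or tell the NN it was deleted.
        System.out.println("fd exhaustion, keeping replica: " + e.getMessage());
      } else {
        // The replica is really missing from disk; this is where the existing
        // notifyNamenodeDeletedBlock()/invalidate() logic still applies.
        System.out.println("replica missing on disk: " + e.getMessage());
      }
      throw e;
    }
  }
}
{code}
Matching on the exception message is fragile, since the text comes from the OS/JDK error string, so the real fix may need a more robust check.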
> DN should not delete the block On "Too many open files" Exception
> -----------------------------------------------------------------
>
> Key: HDFS-11711
> URL: https://issues.apache.org/jira/browse/HDFS-11711
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
>
> *Seen the following scenario in one of our customer environments:*
> * While the job client was writing {{"job.xml"}}, there were pipeline
> failures and the file ended up written to only one DN.
> * When the mapper read {{"job.xml"}}, the DN hit {{"Too many open files"}}
> (the system exceeded the open-file limit) and the block got deleted. Hence
> the mapper failed to read the file and the job failed.