[
https://issues.apache.org/jira/browse/HDFS-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149554#comment-16149554
]
Nandakumar commented on HDFS-12367:
-----------------------------------
If corona was executed on one of the datanodes, HDFS-12382 would have caused
"Too many open files". I was getting the same error; after applying the patch
for HDFS-12382, the issue was resolved in my local setup.
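For context, here is a minimal sketch of the leak pattern the quoted stack
trace below points at: a new Netty {{NioEventLoopGroup}} per client that is
never shut down. The class and method names in the sketch are illustrative
only, not the actual {{XceiverClient}} code; only the Netty calls are real.
{code}
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;

// Illustrative sketch only -- not the real XceiverClient. Creating a
// NioEventLoopGroup per client leaks descriptors: every NioEventLoop opens a
// java.nio selector (an epoll fd on Linux; a kqueue plus a wakeup pipe on
// macOS) that is released only when the group is shut down.
class LeakyClientSketch {
  void connect() {
    EventLoopGroup group = new NioEventLoopGroup(); // opens selector fds
    try {
      // ... bootstrap a channel on 'group' and talk to the datanode ...
    } finally {
      // Without this call (or a single shared group reused across all
      // clients), every connect() permanently consumes several descriptors:
      // group.shutdownGracefully();
    }
  }
}
{code}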
I also noticed the following in the output of the {{lsof}} command for the
corona process:
{code}
java 9876 nvadivelu 357u IPv4 0xc4e9fc0d68262f8d 0t0 TCP 10.200.4.230:52234->10.200.4.230:50011 (ESTABLISHED)
java 9876 nvadivelu 358 PIPE 0xc4e9fc0d5e18843d 16384 ->0xc4e9fc0d5e18afbd
java 9876 nvadivelu 359 PIPE 0xc4e9fc0d5e18afbd 16384 ->0xc4e9fc0d5e18843d
java 9876 nvadivelu 360u KQUEUE count=0, state=0xa
java 9876 nvadivelu 361 PIPE 0xc4e9fc0d5e18837d 16384 ->0xc4e9fc0d5e18a3bd
java 9876 nvadivelu 362 PIPE 0xc4e9fc0d5e1882bd 16384 ->0xc4e9fc0d5e18837d
java 9876 nvadivelu 363u KQUEUE count=0, state=0x8
java 9876 nvadivelu 364 PIPE 0xc4e9fc0d5e1881fd 16384 ->0xc4e9fc0d5e18813d
java 9876 nvadivelu 365 PIPE 0xc4e9fc0d5e18813d 16384 ->0xc4e9fc0d5e1881fd
java 9876 nvadivelu 366u KQUEUE count=0, state=0x8
java 9876 nvadivelu 367 PIPE 0xc4e9fc0d5e18807d 16384 ->0xc4e9fc0d70ba59fd
java 9876 nvadivelu 368 PIPE 0xc4e9fc0d70ba59fd 16384 ->0xc4e9fc0d5e18807d
java 9876 nvadivelu 369u KQUEUE count=0, state=0x8
java 9876 nvadivelu 370 PIPE 0xc4e9fc0d70ba5ffd 16384 ->0xc4e9fc0d70ba56fd
java 9876 nvadivelu 371 PIPE 0xc4e9fc0d70ba56fd 16384 ->0xc4e9fc0d70ba5ffd
java 9876 nvadivelu 372u KQUEUE count=0, state=0x8
java 9876 nvadivelu 373 PIPE 0xc4e9fc0d70ba5f3d 16384 ->0xc4e9fc0d70ba563d
java 9876 nvadivelu 374 PIPE 0xc4e9fc0d70ba563d 16384 ->0xc4e9fc0d70ba5f3d
java 9876 nvadivelu 375u KQUEUE count=0, state=0x8
java 9876 nvadivelu 376 PIPE 0xc4e9fc0d70ba4c7d 16384 ->0xc4e9fc0d70ba69bd
java 9876 nvadivelu 377 PIPE 0xc4e9fc0d70ba69bd 16384 ->0xc4e9fc0d70ba4c7d
java 9876 nvadivelu 378u KQUEUE count=0, state=0x8
java 9876 nvadivelu 379 PIPE 0xc4e9fc0d70ba6e3d 16384 ->0xc4e9fc0d70ba497d
java 9876 nvadivelu 380 PIPE 0xc4e9fc0d70ba497d 16384 ->0xc4e9fc0d70ba6e3d
java 9876 nvadivelu 381u KQUEUE count=0, state=0x8
java 9876 nvadivelu 382 PIPE 0xc4e9fc0d70ba57bd 16384 ->0xc4e9fc0d6fdd9b3d
java 9876 nvadivelu 383 PIPE 0xc4e9fc0d6fdd9b3d 16384 ->0xc4e9fc0d70ba57bd
java 9876 nvadivelu 384u KQUEUE count=0, state=0x8
java 9876 nvadivelu 385 PIPE 0xc4e9fc0d6fdd9d7d 16384 ->0xc4e9fc0d6fdd9efd
java 9876 nvadivelu 386 PIPE 0xc4e9fc0d6fdd9efd 16384 ->0xc4e9fc0d6fdd9d7d
java 9876 nvadivelu 387u KQUEUE count=0, state=0x8
java 9876 nvadivelu 388 PIPE 0xc4e9fc0d6fdd9e3d 16384 ->0xc4e9fc0d658559fd
java 9876 nvadivelu 389 PIPE 0xc4e9fc0d658559fd 16384 ->0xc4e9fc0d6fdd9e3d
java 9876 nvadivelu 390u KQUEUE count=0, state=0x8
java 9876 nvadivelu 391 PIPE 0xc4e9fc0d65855abd 16384 ->0xc4e9fc0d6585593d
java 9876 nvadivelu 392 PIPE 0xc4e9fc0d6585593d 16384 ->0xc4e9fc0d65855abd
java 9876 nvadivelu 393u KQUEUE count=0, state=0x8
java 9876 nvadivelu 394 PIPE 0xc4e9fc0d6585587d 16384 ->0xc4e9fc0d65855b7d
java 9876 nvadivelu 395 PIPE 0xc4e9fc0d65855b7d 16384 ->0xc4e9fc0d6585587d
java 9876 nvadivelu 396u KQUEUE count=0, state=0x8
java 9876 nvadivelu 397 PIPE 0xc4e9fc0d658557bd 16384 ->0xc4e9fc0d65855c3d
java 9876 nvadivelu 398 PIPE 0xc4e9fc0d65855c3d 16384 ->0xc4e9fc0d658557bd
java 9876 nvadivelu 399u KQUEUE count=0, state=0x8
java 9876 nvadivelu 400 PIPE 0xc4e9fc0d65855cfd 16384 ->0xc4e9fc0d658556fd
java 9876 nvadivelu 401 PIPE 0xc4e9fc0d658556fd 16384 ->0xc4e9fc0d65855cfd
java 9876 nvadivelu 402u KQUEUE count=0, state=0x8
java 9876 nvadivelu 403 PIPE 0xc4e9fc0d6585563d 16384 ->0xc4e9fc0d65855dbd
java 9876 nvadivelu 404 PIPE 0xc4e9fc0d65855dbd 16384 ->0xc4e9fc0d6585563d
java 9876 nvadivelu 405u KQUEUE count=0, state=0x8
....... truncated
{code}
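If I am reading the dump correctly, each repeating group of two PIPE entries
plus one KQUEUE entry is the footprint of a single java.nio selector on macOS
(the kqueue and its wakeup pipe), i.e. one event loop that was created but
never shut down, which matches the leak pattern sketched above.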
> Ozone: Too many open files error while running corona
> -----------------------------------------------------
>
> Key: HDFS-12367
> URL: https://issues.apache.org/jira/browse/HDFS-12367
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone, tools
> Reporter: Weiwei Yang
> Assignee: Mukul Kumar Singh
>
> The "Too many open files" error keeps happening to me while using corona. I
> have simply set up a single-node cluster and run corona to generate 1000
> keys, but I keep getting the following error:
> {noformat}
> ./bin/hdfs corona -numOfThreads 1 -numOfVolumes 1 -numOfBuckets 1 -numOfKeys 1000
> 17/08/28 00:47:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 17/08/28 00:47:42 INFO tools.Corona: Number of Threads: 1
> 17/08/28 00:47:42 INFO tools.Corona: Mode: offline
> 17/08/28 00:47:42 INFO tools.Corona: Number of Volumes: 1.
> 17/08/28 00:47:42 INFO tools.Corona: Number of Buckets per Volume: 1.
> 17/08/28 00:47:42 INFO tools.Corona: Number of Keys per Bucket: 1000.
> 17/08/28 00:47:42 INFO rpc.OzoneRpcClient: Creating Volume: vol-0-05000, with wwei as owner and quota set to 1152921504606846976 bytes.
> 17/08/28 00:47:42 INFO tools.Corona: Starting progress bar Thread.
> ...
> ERROR tools.Corona: Exception while adding key: key-251-19293 in bucket: bucket-0-34960 of volume: vol-0-05000.
> java.io.IOException: Exception getting XceiverClient.
>     at org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:156)
>     at org.apache.hadoop.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:122)
>     at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.getFromKsmKeyInfo(ChunkGroupOutputStream.java:289)
>     at org.apache.hadoop.ozone.client.rpc.OzoneRpcClient.createKey(OzoneRpcClient.java:487)
>     at org.apache.hadoop.ozone.tools.Corona$OfflineProcessor.run(Corona.java:352)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: failed to create a child event loop
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>     at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>     at org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:144)
>     ... 9 more
> Caused by: java.lang.IllegalStateException: failed to create a child event loop
>     at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:68)
>     at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49)
>     at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:61)
>     at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:52)
>     at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:44)
>     at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:36)
>     at org.apache.hadoop.scm.XceiverClient.connect(XceiverClient.java:76)
>     at org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:151)
>     at org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:145)
>     at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
>     at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
>     at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
>     at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>     ... 12 more
> Caused by: io.netty.channel.ChannelException: failed to open a new selector
>     at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:128)
>     at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:120)
>     at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)
>     at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64)
>     ... 25 more
> Caused by: java.io.IOException: Too many open files
>     at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
>     at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:130)
>     at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:69)
>     at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
>     at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126)
>     ... 28 more
> {noformat}