[ 
https://issues.apache.org/jira/browse/FLINK-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579888#comment-16579888
 ] 

Sayat Satybaldiyev commented on FLINK-9831:
-------------------------------------------

I've setup PredefinedOptions.SPINNING_DISK_OPTIMIZED for flink RocksDBBacked 
and I didn't get this exception then. Thought, quite interesting that default 
RocksDB options don't use OS limit settings.

> Too many open files for RocksDB
> -------------------------------
>
>                 Key: FLINK-9831
>                 URL: https://issues.apache.org/jira/browse/FLINK-9831
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Sayat Satybaldiyev
>            Priority: Major
>         Attachments: flink_open_files.txt
>
>
> While running only one Flink job, which is backed by RocksDB with 
> checkpoining to HDFS we encounter an exception that TM cannot access the SST 
> file because the process has too many open files. However, we have already 
> increased the file soft/hard limit on the machine.
> Number open files for TM on the machine:
>  
> {code:java}
> lsof -p 23301|wc -l
> 8241{code}
>  
> Instance limits
>  
> {code:java}
> ulimit -a
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 256726
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 1048576
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) unlimited
> max user processes (-u) 128000
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>  
> {code}
>  
> [^flink_open_files.txt]
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>       at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:191)
>       at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:227)
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:730)
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:295)
>       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for 
> KeyedCoProcessOperator_98a16ed3228ec4a08acd8d78420516a1_(1/1) from any of the 
> 1 provided restore options.
>       at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
>       at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:276)
>       at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:132)
>       ... 5 more
> Caused by: java.io.FileNotFoundException: 
> /tmp/flink-io-3da06c9e-f619-44c9-b95f-54ee9b1a084a/job_b3ecbdc0eb2dc2dfbf5532ec1fcef9da_op_KeyedCoProcessOperator_98a16ed3228ec4a08acd8d78420516a1__1_1__uuid_c4b82a7e-8a04-4704-9e0b-393c3243cef2/3701639a-bacd-4861-99d8-5f3d112e88d6/000016.sst
>  (Too many open files)
>       at java.io.FileOutputStream.open0(Native Method)
>       at java.io.FileOutputStream.open(FileOutputStream.java:270)
>       at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>       at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>       at 
> org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:47)
>       at 
> org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:275)
>       at 
> org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:121)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.copyStateDataHandleData(RocksDBKeyedStateBackend.java:1008)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.transferAllDataFromStateHandles(RocksDBKeyedStateBackend.java:988)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.transferAllStateDataToDirectory(RocksDBKeyedStateBackend.java:973)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restoreInstance(RocksDBKeyedStateBackend.java:758)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restore(RocksDBKeyedStateBackend.java:732)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.restore(RocksDBKeyedStateBackend.java:443)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.restore(RocksDBKeyedStateBackend.java:149)
>       at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
>       at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
>       ... 7 more



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to