[ 
https://issues.apache.org/jira/browse/FLINK-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-17089.
----------------------------------
    Resolution: Cannot Reproduce

> Checkpoint fail because RocksDBException: Error While opening a file for 
> sequentially reading
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-17089
>                 URL: https://issues.apache.org/jira/browse/FLINK-17089
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>            Reporter: Lu Niu
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> We use the incremental RocksDB state backend. After running for about 20 
> hours, the Flink job's checkpoints start failing with the following exception:
> {code:java}
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/foo/bar/usercache/xxx/appcache/application_1584397637704_9072/flink-io-4e2294f0-7e9b-4102-b079-1089f23c47aa/job_d781983f4967703b0480c7943e8100af_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__27_60__uuid_dee7e33b-9bce-42f3-909a-f6fa4ab52d8c/db/MANIFEST-000006:
>  No such file or directory
>     at org.rocksdb.Checkpoint.createCheckpoint(Native Method)
>     at org.rocksdb.Checkpoint.createCheckpoint(Checkpoint.java:51)
>     at org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.takeDBNativeCheckpoint(RocksIncrementalSnapshotStrategy.java:249)
>     at org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.doSnapshot(RocksIncrementalSnapshotStrategy.java:160)
>     at org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:126)
>     at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:439)
>     at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:411)
>     ... 17 more
> {code}
> This failure happens consistently until the job restarts.
> Some findings:
> The JobManager log shows that the error comes from a different subtask each time:
> {code:java}
> // grep jobManager log on appcache/application_1584397637704_9622
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/nvme3n1/nm-local-dir/usercache/dkapoor/appcache/application_1584397637704_9622/flink-io-c42b6665-0170-4dc9-9933-8abd78812fd5/job_03a4b302f44a8d9f5b31693a80bde30c_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__5_60__uuid_fa8124e4-1678-4555-a90a-8eec4d974a22/db/MANIFEST-000006:
>  No such file or directory
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/nvme3n1/nm-local-dir/usercache/dkapoor/appcache/application_1584397637704_9622/flink-io-a8dfe34d-909e-4aea-8d20-c89199b20856/job_03a4b302f44a8d9f5b31693a80bde30c_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__4_60__uuid_12fc9764-418e-4802-800e-3623e385743f/db/MANIFEST-000006:
>  No such file or directory
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/nvme1n1/nm-local-dir/usercache/dkapoor/appcache/application_1584397637704_9622/flink-io-e98c35d7-586a-4edb-9eba-99c6fd823540/job_03a4b302f44a8d9f5b31693a80bde30c_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__9_60__uuid_f52a3f02-aa12-4285-b594-b94e1b0f8ba7/db/MANIFEST-000006:
>  No such file or directory
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/nvme3n1/nm-local-dir/usercache/dkapoor/appcache/application_1584397637704_9622/flink-io-a2887f93-1c75-48b1-8b67-72acdc69ce1b/job_03a4b302f44a8d9f5b31693a80bde30c_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__2_60__uuid_6a8267eb-aa04-48a3-b82f-7b5b9f21c8e0/db/MANIFEST-000006:
>  No such file or directory
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/nvme2n1/nm-local-dir/usercache/dkapoor/appcache/application_1584397637704_9622/flink-io-27e797c3-de39-4140-84e8-b94e640154cc/job_03a4b302f44a8d9f5b31693a80bde30c_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__1_60__uuid_fde8b198-32d8-4e0c-a412-f316a4fe1e3e/db/MANIFEST-000006:
>  No such file or directory
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/nvme1n1/nm-local-dir/usercache/dkapoor/appcache/application_1584397637704_9622/flink-io-e98c35d7-586a-4edb-9eba-99c6fd823540/job_03a4b302f44a8d9f5b31693a80bde30c_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__9_60__uuid_f52a3f02-aa12-4285-b594-b94e1b0f8ba7/db/MANIFEST-000006:
>  No such file or directory
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/nvme2n1/nm-local-dir/usercache/dkapoor/appcache/application_1584397637704_9622/flink-io-7be6a975-c0cd-4083-a1c3-b47e4c8fbb1b/job_03a4b302f44a8d9f5b31693a80bde30c_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__13_60__uuid_d779fe65-181f-40d2-b32e-e17a023c128d/db/MANIFEST-000006:
>  No such file or directory
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/nvme1n1/nm-local-dir/usercache/dkapoor/appcache/application_1584397637704_9622/flink-io-44fefa0f-c58a-4ce5-ac44-b8b9a436eae5/job_03a4b302f44a8d9f5b31693a80bde30c_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__40_60__uuid_bfcd85f6-270b-4e56-8c09-250d9171b8a3/db/MANIFEST-000006:
>  No such file or directory
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/nvme1n1/nm-local-dir/usercache/dkapoor/appcache/application_1584397637704_9622/flink-io-1dff583b-5fb3-4521-8cdf-261a2e3a0f4d/job_03a4b302f44a8d9f5b31693a80bde30c_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__6_60__uuid_27a20e68-22d6-4e35-a23f-f267c523b829/db/MANIFEST-000006:
>  No such file or directory
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/nvme2n1/nm-local-dir/usercache/dkapoor/appcache/application_1584397637704_9622/flink-io-27e797c3-de39-4140-84e8-b94e640154cc/job_03a4b302f44a8d9f5b31693a80bde30c_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__1_60__uuid_fde8b198-32d8-4e0c-a412-f316a4fe1e3e/db/MANIFEST-000006:
>  No such file or directory
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/nvme3n1/nm-local-dir/usercache/dkapoor/appcache/application_1584397637704_9622/flink-io-a8dfe34d-909e-4aea-8d20-c89199b20856/job_03a4b302f44a8d9f5b31693a80bde30c_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__4_60__uuid_12fc9764-418e-4802-800e-3623e385743f/db/MANIFEST-000006:
>  No such file or directory
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/nvme3n1/nm-local-dir/usercache/dkapoor/appcache/application_1584397637704_9622/flink-io-a2887f93-1c75-48b1-8b67-72acdc69ce1b/job_03a4b302f44a8d9f5b31693a80bde30c_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__2_60__uuid_6a8267eb-aa04-48a3-b82f-7b5b9f21c8e0/db/MANIFEST-000006:
>  No such file or directory
> Caused by: org.rocksdb.RocksDBException: While opening a file for 
> sequentially reading: 
> /data/nvme2n1/nm-local-dir/usercache/dkapoor/appcache/application_1584397637704_9622/flink-io-27e797c3-de39-4140-84e8-b94e640154cc/job_03a4b302f44a8d9f5b31693a80bde30c_op_KeyedProcessOperator_b9daf26d7397cd4b00184cc833054139__1_60__uuid_fde8b198-32d8-4e0c-a412-f316a4fe1e3e/db/MANIFEST-000006:
>  No such file or directory
> {code}
> Question:
> The state size is actually small. The largest state is ~3 KB, which is 
> smaller than the state.backend.fs.memory-threshold we set. In this case, why 
> does the state still need to be stored in RocksDB?
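
To check the finding that the failure moves between subtasks, one can pull the subtask index out of each failing path in the JobManager log. The following is a minimal Java sketch. It assumes that the `__<n>_<m>__uuid_` fragment of the RocksDB directory name encodes the subtask index and the parallelism, as fragments such as `__5_60__` and `__4_60__` in the log excerpt suggest. The class and method names are hypothetical and not part of Flink, and the sample lines in `main` are shortened stand-ins for the full paths.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for log triage; not a Flink class.
class SubtaskIndexExtractor {

    // Assumption: the directory name contains "__<subtaskIndex>_<parallelism>__uuid_".
    private static final Pattern SUBTASK = Pattern.compile("__(\\d+)_(\\d+)__uuid_");

    // Returns the distinct subtask indices found in the log text, in ascending order.
    static Set<Integer> distinctSubtasks(String log) {
        Set<Integer> indices = new TreeSet<>();
        Matcher m = SUBTASK.matcher(log);
        while (m.find()) {
            indices.add(Integer.parseInt(m.group(1)));
        }
        return indices;
    }

    public static void main(String[] args) {
        // Shortened sample lines that only mimic the shape of the real paths.
        String log = String.join("\n",
                "op_KeyedProcessOperator_x__5_60__uuid_a/db/MANIFEST-000006",
                "op_KeyedProcessOperator_x__4_60__uuid_b/db/MANIFEST-000006",
                "op_KeyedProcessOperator_x__9_60__uuid_c/db/MANIFEST-000006",
                "op_KeyedProcessOperator_x__5_60__uuid_a/db/MANIFEST-000006");
        System.out.println(distinctSubtasks(log)); // prints [4, 5, 9]
    }
}
```

Feeding the full grep output to `distinctSubtasks` would show how many different subtasks hit the missing `MANIFEST-000006` file and whether any of them fail repeatedly.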



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
