Robert Metzger created FLINK-5463:
-------------------------------------
Summary: RocksDB.disposeInternal does not react to interrupts,
blocks task cancellation
Key: FLINK-5463
URL: https://issues.apache.org/jira/browse/FLINK-5463
Project: Flink
Issue Type: Bug
Components: State Backends, Checkpointing
Affects Versions: 1.2.0
Reporter: Robert Metzger
I'm using Flink 699f4b0.
My Flink job is slow while cancelling because RockDB seems to be busy with
disposing its state:
{code}
2017-01-11 18:48:23,315 INFO org.apache.flink.runtime.taskmanager.Task
- Triggering cancellation of task code
TriggerWindow(TumblingEventTimeWindows(4),
ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071
}, EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1)
(2accc6ca2727c4f7ec963318fbd237e9).
2017-01-11 18:48:53,318 WARN org.apache.flink.runtime.taskmanager.Task
- Task 'TriggerWindow(TumblingEventTimeWindows(4),
ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071},
EventTimeTrigger(), Windowed
Stream.apply(AllWindowedStream.java:440)) (1/1)' did not react to cancelling
signal, but is stuck in method:
org.rocksdb.RocksDB.disposeInternal(Native Method)
org.rocksdb.RocksObject.disposeInternal(RocksObject.java:37)
org.rocksdb.AbstractImmutableNativeReference.close(AbstractImmutableNativeReference.java:56)
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.dispose(RocksDBKeyedStateBackend.java:250)
org.apache.flink.streaming.api.operators.AbstractStreamOperator.dispose(AbstractStreamOperator.java:331)
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:169)
org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.dispose(WindowOperator.java:273)
org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:439)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:340)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:654)
java.lang.Thread.run(Thread.java:745)
2017-01-11 18:48:53,319 WARN org.apache.flink.runtime.taskmanager.Task
- Task 'TriggerWindow(TumblingEventTimeWindows(4),
ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071},
EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1)'
did not react to cancelling signal, but is stuck in method:
org.rocksdb.RocksDB.disposeInternal(Native Method)
org.rocksdb.RocksObject.disposeInternal(RocksObject.java:37)
org.rocksdb.AbstractImmutableNativeReference.close(AbstractImmutableNativeReference.java:56)
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.dispose(RocksDBKeyedStateBackend.java:250)
org.apache.flink.streaming.api.operators.AbstractStreamOperator.dispose(AbstractStreamOperator.java:331)
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:169)
org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.dispose(WindowOperator.java:273)
org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:439)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:340)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:654)
java.lang.Thread.run(Thread.java:745)
2017-01-11 18:49:23,319 WARN org.apache.flink.runtime.taskmanager.Task
- Task 'TriggerWindow(TumblingEventTimeWindows(4),
ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071},
EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1)'
did not react to cancelling signal, but is stuck in method:
org.rocksdb.RocksDB.disposeInternal(Native Method)
org.rocksdb.RocksObject.disposeInternal(RocksObject.java:37)
org.rocksdb.AbstractImmutableNativeReference.close(AbstractImmutableNativeReference.java:56)
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.dispose(RocksDBKeyedStateBackend.java:250)
org.apache.flink.streaming.api.operators.AbstractStreamOperator.dispose(AbstractStreamOperator.java:331)
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:169)
org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.dispose(WindowOperator.java:273)
org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:439)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:340)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:654)
java.lang.Thread.run(Thread.java:745)
2017-01-11 18:49:50,080 INFO org.apache.flink.runtime.taskmanager.Task
- Freeing task resources for
TriggerWindow(TumblingEventTimeWindows(4),
ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@2edcd071},
EventTimeTrigger(), WindowedStream.apply(AllWindowedStream.java:440)) (1/1)
(2accc6ca2727c4f7ec963318fbd237e9)
{code}
I'm filing this issue because I didn't see such a behavior in Flink 1.1. I
guess Flink's code should be well behaved when it comes to cancelling tasks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)