Gyula Fora created FLINK-4193:
---------------------------------
Summary: Task manager JVM crashes while deploying cancelling jobs
Key: FLINK-4193
URL: https://issues.apache.org/jira/browse/FLINK-4193
Project: Flink
Issue Type: Bug
Components: Streaming, TaskManager
Reporter: Gyula Fora
Priority: Critical
We have observed several TM crashes while deploying larger stateful streaming
jobs that use the RocksDB state backend.
As the JVMs crash the logs don't show anything but I have uploaded all the info
I have got from the standard output.
This indicates some GC and possibly some RocksDB issues underneath but we could
not really figure out much more.
GC segfault
https://gist.github.com/gyfora/9e56d4a0d4fc285a8d838e1b281ae125
Other crashes (maybe rocks related)
https://gist.github.com/gyfora/525c67c747873f0ff2ff2ed1682efefa
https://gist.github.com/gyfora/b93611fde87b1f2516eeaf6bfbe8d818
The third link shows 2 issues that happened in parallel...
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)