[
https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372824#comment-16372824
]
tarun razdan commented on FLINK-7756:
-------------------------------------
[~aljoscha]
For branch:
https://github.com/aljoscha/flink/commits/fix-flink-cep-serialization
Build using the command:
{code}
mvn clean install -DskipTests -Dmaven.javadoc.skip=true -Dcheckstyle.skip=true
-Dscala.version=2.11.7 -Pvendor-repos -Dhadoop.version=2.7.3.2.6.2.0-205
{code}
Submit the Job using the command:
{code}
/opt/flink/bin/flink run -yn 1 -ys 1 -ynm 'Flink1.5-SNAPSHOT' -ytm 10000 -yst
-p 1 -d -m yarn-cluster
/home/flink/<path_name>/target/scala-2.11/appname-assembly-0.2.jar
{code}
Got the error:
{code}
2018-02-22 13:41:01,726 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint
triggering task Source: Kafka source (1/1) is not being executed at the moment.
Aborting checkpoint.
2018-02-22 13:41:01,901 DEBUG
org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler - Received
request /jobs/overview.
2018-02-22 13:41:02,726 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint
triggering task Source: Kafka source (1/1) is not being executed at the moment.
Aborting checkpoint.
2018-02-22 13:41:03,217 DEBUG
org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Received
request /jobs/ec1c9d7a3c413a9523656efa58735009.
2018-02-22 13:41:03,551 DEBUG
org.apache.flink.runtime.rest.handler.job.JobConfigHandler - Received
request /jobs/ec1c9d7a3c413a9523656efa58735009/config.
2018-02-22 13:41:03,726 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint
triggering task Source: Kafka source (1/1) is not being executed at the moment.
Aborting checkpoint.
2018-02-22 13:41:03,754 DEBUG org.apache.flink.yarn.YarnResourceManager
- Trigger heartbeat request.
2018-02-22 13:41:03,754 DEBUG org.apache.flink.yarn.YarnResourceManager
- Trigger heartbeat request.
2018-02-22 13:41:03,755 DEBUG org.apache.flink.runtime.jobmaster.JobMaster
- Received heartbeat request from 9b7af57d9d7f60b130fcb1410028cbc2.
2018-02-22 13:41:03,755 DEBUG org.apache.flink.yarn.YarnResourceManager
- Received heartbeat from 3c5b0b430d0970400e034520dab1e4a7.
2018-02-22 13:41:03,773 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/b03dc620ec47653e3e9b89840a81262b/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,777 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/d360e5fe1221eb05c6ad2692a61f584c/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,789 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/c9909fb2995f41e689ae0e88542e7359/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,795 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/226be2828751319467b95130ce97d658/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,803 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/8e0a56963945030900eeabecc19da6a5/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,901 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/8ae9fdc9d3d78f4ff11d2aaf7ad184d6/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,905 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/7ac0bc9a0d80b24acdc357d40edba272/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,919 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/91ce640b5736d675979e08d6aedbae68/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,926 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/8ddb02847bc5de346de559f035cd6a0f/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,937 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/43aeb457a153712425af98325421d363/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,016 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/ded95c643b42f31cf882a8986207fd30/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,048 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/eec5890dac9c38f66954443809beb5b0/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,052 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/2a964ee72788c82cb7d15e352d9a94f6/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,079 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/1d9c83f6e1879fdbe461aafac16eb8a5/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,085 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/4063620891a151092c5bcedb218870a6/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,094 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/2a751c66e0e32aee2cd8120a1a72a4d6/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,142 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/37ecc85b429bd08d0fd539532055e117/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,173 DEBUG
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler -
Received request
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/20e20298680571979f690d36d1a6db36/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,184 DEBUG org.apache.flink.runtime.jobmaster.JobMaster
- Trigger heartbeat request.
2018-02-22 13:41:04,726 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint
triggering task Source: Kafka source (1/1) is not being executed at the moment.
Aborting checkpoint.
2018-02-22 13:41:04,735 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph - Job compname
productname Events (ec1c9d7a3c413a9523656efa58735009) switched from state
RUNNING to FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
Could not allocate all requires slots within timeout of 300000 ms. Slots
required: 18, slots allocated: 0
at
org.apache.flink.runtime.executiongraph.ExecutionGraph.lambda$scheduleEager$0(ExecutionGraph.java:940)
at
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-02-22 13:41:04,737 DEBUG
org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Could not find
slot c6b13a74527235e678669788072130cc in slot sharing group
d29efbc3b08c80219225d0c14087a401. Ignoring release slot request.
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
at
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at
org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:519)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
... 8 more
2018-02-22 13:41:04,736 DEBUG
org.apache.flink.runtime.jobmaster.slotpool.SlotPool - There is no
allocated slot with allocation id bd441228d8d4ae8e91703a8b641bfc4d. Ignoring
the release slot request.
java.util.concurrent.TimeoutException
at
org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:519)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-02-22 13:41:04,738 DEBUG
org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Could not find
slot 6d93cc4406100d6ef57436dbbe910d51 in slot sharing group
d29efbc3b08c80219225d0c14087a401. Ignoring release slot request.
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
at
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at
org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:519)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
... 8 more
2018-02-22 13:41:04,739 DEBUG
org.apache.flink.runtime.jobmaster.slotpool.SlotPool - There is no
allocated slot with allocation id bd441228d8d4ae8e91703a8b641bfc4d. Ignoring
the release slot request.
java.util.concurrent.TimeoutException
at
org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:519)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-02-22 13:41:04,747 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Kafka
source (1/1) (627e749d248cdeabf3a4cc1632cf05fb) switched from SCHEDULED to
CANCELED.
2018-02-22 13:41:04,748 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Kafka
Control -> kafka status flatmap (1/1) (7ca7afa227eaca83fe2c3203ef0e63a1)
switched from SCHEDULED to CANCELED.
2018-02-22 13:41:04,748 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph - successful
order CEP (1/1) (10fe69d5929519da28db7b9cc87300e7) switched from SCHEDULED to
CANCELED.
2018-02-22 13:41:04,748 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph - First
Transaction State (1/1) (d26b0a871d0f2682ed608d98343f4bff) switched from
SCHEDULED to CANCELED.
{code}
> RocksDB state backend Checkpointing (Async and Incremental) is not working
> with CEP.
> -------------------------------------------------------------------------------------
>
> Key: FLINK-7756
> URL: https://issues.apache.org/jira/browse/FLINK-7756
> Project: Flink
> Issue Type: Sub-task
> Components: CEP, State Backends, Checkpointing, Streaming
> Affects Versions: 1.4.0, 1.3.2
> Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend
> Reporter: Shashank Agarwal
> Assignee: Aljoscha Krettek
> Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
> Attachments: jobmanager.log, jobmanager_without_cassandra.log,
> taskmanager.log, taskmanager_without_cassandra.log
>
>
> When i try to use RocksDBStateBackend on my staging cluster (which is using
> HDFS as file system) it crashes. But When i use FsStateBackend on staging
> (which is using HDFS as file system) it is working fine.
> On local with local file system it's working fine in both cases.
> Please check attached logs. I have around 20-25 tasks in my app.
> {code:java}
> 2017-09-29 14:21:31,639 INFO
> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state
> to restore for the BucketingSink (taskIdx=0).
> 2017-09-29 14:21:31,640 INFO
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend -
> Initializing RocksDB keyed state backend from snapshot.
> 2017-09-29 14:21:32,020 INFO
> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state
> to restore for the BucketingSink (taskIdx=1).
> 2017-09-29 14:21:32,022 INFO
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend -
> Initializing RocksDB keyed state backend from snapshot.
> 2017-09-29 14:21:32,078 INFO com.datastax.driver.core.NettyUtil
> - Found Netty's native epoll transport in the classpath, using
> it
> 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Co-Flat Map (1/2)
> (b879f192c4e8aae6671cdafb3a24c00a).
> 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Map (2/2)
> (1ea5aef6ccc7031edc6b37da2912d90b).
> 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Co-Flat Map (2/2)
> (4bac8e764c67520d418a4c755be23d4d).
> 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task
> - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched
> from RUNNING to FAILED.
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2
> for operator Co-Flat Map (1/2).}
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for
> operator Co-Flat Map (1/2).
> ... 6 more
> Caused by: java.util.concurrent.ExecutionException:
> java.lang.IllegalStateException
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
> ... 5 more
> Suppressed: java.lang.Exception: Could not properly cancel managed
> keyed state future.
> at
> org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961)
> ... 5 more
> Caused by: java.util.concurrent.ExecutionException:
> java.lang.IllegalStateException
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
> at
> org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:85)
> at
> org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:88)
> ... 7 more
> Caused by: java.lang.IllegalStateException
> at
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeSnapshot(RocksDBKeyedStateBackend.java:878)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:353)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:350)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
> ... 5 more
> [CIRCULAR REFERENCE:java.lang.IllegalStateException]
> 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Map (1/2)
> (a06925261e74b4efdf50a30089e2b778).
> 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Map (1/2)
> (1747902c96e63fefd977ac4d4a01d2fa).
> 2017-09-29 14:21:34,180 INFO org.apache.flink.runtime.taskmanager.Task
> - Map (1/2) (a06925261e74b4efdf50a30089e2b778) switched from
> RUNNING to FAILED.
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2
> for operator Map (1/2).}
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for
> operator Map (1/2).
> ... 6 more
> Caused by: java.util.concurrent.ExecutionException:
> java.lang.IllegalStateException
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
> ... 5 more
> Suppressed: java.lang.Exception: Could not properly cancel managed
> keyed state future.
> at
> org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961)
> ... 5 more
> Caused by: java.util.concurrent.ExecutionException:
> java.lang.IllegalStateException
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
> at
> org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:85)
> at
> org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:88)
> ... 7 more
> Caused by: java.lang.IllegalStateException
> at
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeSnapshot(RocksDBKeyedStateBackend.java:878)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:353)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:350)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
> ... 5 more
> [CIRCULAR REFERENCE:java.lang.IllegalStateException]
> {code}
> That same printed for around 12-13 tasks. Than following logs printed :
> {code:java}
> 2017-09-29 14:21:35,039 INFO org.apache.flink.runtime.taskmanager.Task
> - Ensuring all FileSystem streams are closed for task Source:
> Custom Source (2/2) (77c896e2a2063e98f399244cae21c260) [CANCELED]
> 2017-09-29 14:21:35,041 WARN org.apache.hadoop.ipc.Client
> - interrupted waiting to send rpc request to server
> java.lang.InterruptedException
> at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
> at java.util.concurrent.FutureTask.get(FutureTask.java:191)
> at
> org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1059)
> at org.apache.hadoop.ipc.Client.call(Client.java:1454)
> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy12.delete(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy13.delete(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2044)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:707)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:703)
> at
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
> at
> org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.delete(HadoopFileSystem.java:435)
> at
> org.apache.flink.core.fs.SafetyNetWrapperFileSystem.delete(SafetyNetWrapperFileSystem.java:106)
> at
> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:324)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeMetaData(RocksDBKeyedStateBackend.java:826)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeSnapshot(RocksDBKeyedStateBackend.java:875)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:353)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:350)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2017-09-29 14:21:35,042 WARN
> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory - Could
> not delete the checkpoint stream file
> hdfs://static.175.87.9.5.clients.your-server.de:8020/flink/flink-checkpoints/rocksDB/events/e10dbe09aa2ecccb22737ddce8b4dc9f/chk-2/a28796de-978a-4f1a-8ff5-5f5c654b0ffc.
> java.io.IOException: java.lang.InterruptedException
> at org.apache.hadoop.ipc.Client.call(Client.java:1460)
> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy12.delete(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy13.delete(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2044)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:707)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:703)
> at
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
> at
> org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.delete(HadoopFileSystem.java:435)
> at
> org.apache.flink.core.fs.SafetyNetWrapperFileSystem.delete(SafetyNetWrapperFileSystem.java:106)
> at
> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:324)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeMetaData(RocksDBKeyedStateBackend.java:826)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeSnapshot(RocksDBKeyedStateBackend.java:875)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:353)
> at
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:350)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.InterruptedException
> at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
> at java.util.concurrent.FutureTask.get(FutureTask.java:191)
> at
> org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1059)
> at org.apache.hadoop.ipc.Client.call(Client.java:1454)
> ... 31 more
> 2017-09-29 14:21:35,054 INFO org.apache.flink.runtime.taskmanager.Task
> - Attempting to cancel task KeyedCEPPatternOperator -> Flat Map
> -> (Flat Map, Flat Map) (1/2) (8c6eff62d47c4a624a7554065bac36ee).
> 2017-09-29 14:21:35,055 INFO org.apache.flink.runtime.taskmanager.Task
> - KeyedCEPPatternOperator -> Flat Map -> (Flat Map, Flat Map)
> (1/2) (8c6eff62d47c4a624a7554065bac36ee) switched from RUNNING to CANCELING.
> {code}
> Than same printed for 12-13 tasks.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)