[ 
https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372824#comment-16372824
 ] 

tarun razdan commented on FLINK-7756:
-------------------------------------

[~aljoscha]

For branch: 
https://github.com/aljoscha/flink/commits/fix-flink-cep-serialization

Build using the command:
{code}
mvn clean install -DskipTests -Dmaven.javadoc.skip=true -Dcheckstyle.skip=true 
-Dscala.version=2.11.7 -Pvendor-repos -Dhadoop.version=2.7.3.2.6.2.0-205
{code}

Submit the Job using the command:
{code}
/opt/flink/bin/flink run -yn 1 -ys 1 -ynm 'Flink1.5-SNAPSHOT' -ytm 10000 -yst 
-p 1 -d -m yarn-cluster 
/home/flink/<path_name>/target/scala-2.11/appname-assembly-0.2.jar
{code}

Got the error:

{code}
2018-02-22 13:41:01,726 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Checkpoint 
triggering task Source: Kafka source (1/1) is not being executed at the moment. 
Aborting checkpoint.
2018-02-22 13:41:01,901 DEBUG 
org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler  - Received 
request /jobs/overview.
2018-02-22 13:41:02,726 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Checkpoint 
triggering task Source: Kafka source (1/1) is not being executed at the moment. 
Aborting checkpoint.
2018-02-22 13:41:03,217 DEBUG 
org.apache.flink.runtime.rest.handler.job.JobDetailsHandler   - Received 
request /jobs/ec1c9d7a3c413a9523656efa58735009.
2018-02-22 13:41:03,551 DEBUG 
org.apache.flink.runtime.rest.handler.job.JobConfigHandler    - Received 
request /jobs/ec1c9d7a3c413a9523656efa58735009/config.
2018-02-22 13:41:03,726 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Checkpoint 
triggering task Source: Kafka source (1/1) is not being executed at the moment. 
Aborting checkpoint.
2018-02-22 13:41:03,754 DEBUG org.apache.flink.yarn.YarnResourceManager         
            - Trigger heartbeat request.
2018-02-22 13:41:03,754 DEBUG org.apache.flink.yarn.YarnResourceManager         
            - Trigger heartbeat request.
2018-02-22 13:41:03,755 DEBUG org.apache.flink.runtime.jobmaster.JobMaster      
            - Received heartbeat request from 9b7af57d9d7f60b130fcb1410028cbc2.
2018-02-22 13:41:03,755 DEBUG org.apache.flink.yarn.YarnResourceManager         
            - Received heartbeat from 3c5b0b430d0970400e034520dab1e4a7.
2018-02-22 13:41:03,773 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/b03dc620ec47653e3e9b89840a81262b/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,777 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/d360e5fe1221eb05c6ad2692a61f584c/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,789 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/c9909fb2995f41e689ae0e88542e7359/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,795 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/226be2828751319467b95130ce97d658/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,803 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/8e0a56963945030900eeabecc19da6a5/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,901 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/8ae9fdc9d3d78f4ff11d2aaf7ad184d6/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,905 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/7ac0bc9a0d80b24acdc357d40edba272/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,919 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/91ce640b5736d675979e08d6aedbae68/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,926 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/8ddb02847bc5de346de559f035cd6a0f/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:03,937 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/43aeb457a153712425af98325421d363/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,016 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/ded95c643b42f31cf882a8986207fd30/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,048 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/eec5890dac9c38f66954443809beb5b0/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,052 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/2a964ee72788c82cb7d15e352d9a94f6/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,079 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/1d9c83f6e1879fdbe461aafac16eb8a5/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,085 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/4063620891a151092c5bcedb218870a6/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,094 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/2a751c66e0e32aee2cd8120a1a72a4d6/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,142 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/37ecc85b429bd08d0fd539532055e117/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,173 DEBUG 
org.apache.flink.runtime.rest.handler.job.metrics.JobVertexMetricsHandler  - 
Received request 
/jobs/ec1c9d7a3c413a9523656efa58735009/vertices/20e20298680571979f690d36d1a6db36/metrics?get=0.currentLowWatermark.
2018-02-22 13:41:04,184 DEBUG org.apache.flink.runtime.jobmaster.JobMaster      
            - Trigger heartbeat request.
2018-02-22 13:41:04,726 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Checkpoint 
triggering task Source: Kafka source (1/1) is not being executed at the moment. 
Aborting checkpoint.
2018-02-22 13:41:04,735 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job compname 
productname Events (ec1c9d7a3c413a9523656efa58735009) switched from state 
RUNNING to FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: 
Could not allocate all requires slots within timeout of 300000 ms. Slots 
required: 18, slots allocated: 0
        at 
org.apache.flink.runtime.executiongraph.ExecutionGraph.lambda$scheduleEager$0(ExecutionGraph.java:940)
        at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
        at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
        at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2018-02-22 13:41:04,737 DEBUG 
org.apache.flink.runtime.jobmaster.slotpool.SlotPool          - Could not find 
slot c6b13a74527235e678669788072130cc in slot sharing group 
d29efbc3b08c80219225d0c14087a401. Ignoring release slot request.
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
        at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
        at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at 
org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:519)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
        ... 8 more
2018-02-22 13:41:04,736 DEBUG 
org.apache.flink.runtime.jobmaster.slotpool.SlotPool          - There is no 
allocated slot with allocation id bd441228d8d4ae8e91703a8b641bfc4d. Ignoring 
the release slot request.
java.util.concurrent.TimeoutException
        at 
org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:519)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2018-02-22 13:41:04,738 DEBUG 
org.apache.flink.runtime.jobmaster.slotpool.SlotPool          - Could not find 
slot 6d93cc4406100d6ef57436dbbe910d51 in slot sharing group 
d29efbc3b08c80219225d0c14087a401. Ignoring release slot request.
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
        at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
        at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at 
org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:519)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
        ... 8 more
2018-02-22 13:41:04,739 DEBUG 
org.apache.flink.runtime.jobmaster.slotpool.SlotPool          - There is no 
allocated slot with allocation id bd441228d8d4ae8e91703a8b641bfc4d. Ignoring 
the release slot request.
java.util.concurrent.TimeoutException
        at 
org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:519)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2018-02-22 13:41:04,747 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Kafka 
source (1/1) (627e749d248cdeabf3a4cc1632cf05fb) switched from SCHEDULED to 
CANCELED.
2018-02-22 13:41:04,748 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Kafka 
Control -> kafka status flatmap (1/1) (7ca7afa227eaca83fe2c3203ef0e63a1) 
switched from SCHEDULED to CANCELED.
2018-02-22 13:41:04,748 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph        - successful 
order CEP (1/1) (10fe69d5929519da28db7b9cc87300e7) switched from SCHEDULED to 
CANCELED.
2018-02-22 13:41:04,748 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph        - First 
Transaction State (1/1) (d26b0a871d0f2682ed608d98343f4bff) switched from 
SCHEDULED to CANCELED.
{code}


> RocksDB state backend Checkpointing (Async and Incremental)  is not working 
> with CEP.
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-7756
>                 URL: https://issues.apache.org/jira/browse/FLINK-7756
>             Project: Flink
>          Issue Type: Sub-task
>          Components: CEP, State Backends, Checkpointing, Streaming
>    Affects Versions: 1.4.0, 1.3.2
>         Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend
>            Reporter: Shashank Agarwal
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 1.5.0, 1.4.1
>
>         Attachments: jobmanager.log, jobmanager_without_cassandra.log, 
> taskmanager.log, taskmanager_without_cassandra.log
>
>
> When i try to use RocksDBStateBackend on my staging cluster (which is using 
> HDFS as file system) it crashes. But When i use FsStateBackend on staging 
> (which is using HDFS as file system) it is working fine.
> On local with local file system it's working fine in both cases.
> Please check attached logs. I have around 20-25 tasks in my app.
> {code:java}
> 2017-09-29 14:21:31,639 INFO  
> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink  - No state 
> to restore for the BucketingSink (taskIdx=0).
> 2017-09-29 14:21:31,640 INFO  
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - 
> Initializing RocksDB keyed state backend from snapshot.
> 2017-09-29 14:21:32,020 INFO  
> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink  - No state 
> to restore for the BucketingSink (taskIdx=1).
> 2017-09-29 14:21:32,022 INFO  
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - 
> Initializing RocksDB keyed state backend from snapshot.
> 2017-09-29 14:21:32,078 INFO  com.datastax.driver.core.NettyUtil              
>               - Found Netty's native epoll transport in the classpath, using 
> it
> 2017-09-29 14:21:34,177 INFO  org.apache.flink.runtime.taskmanager.Task       
>               - Attempting to fail task externally Co-Flat Map (1/2) 
> (b879f192c4e8aae6671cdafb3a24c00a).
> 2017-09-29 14:21:34,177 INFO  org.apache.flink.runtime.taskmanager.Task       
>               - Attempting to fail task externally Map (2/2) 
> (1ea5aef6ccc7031edc6b37da2912d90b).
> 2017-09-29 14:21:34,178 INFO  org.apache.flink.runtime.taskmanager.Task       
>               - Attempting to fail task externally Co-Flat Map (2/2) 
> (4bac8e764c67520d418a4c755be23d4d).
> 2017-09-29 14:21:34,178 INFO  org.apache.flink.runtime.taskmanager.Task       
>               - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched 
> from RUNNING to FAILED.
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
> for operator Co-Flat Map (1/2).}
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for 
> operator Co-Flat Map (1/2).
>       ... 6 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalStateException
>       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>       at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
>       ... 5 more
>       Suppressed: java.lang.Exception: Could not properly cancel managed 
> keyed state future.
>               at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90)
>               at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023)
>               at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961)
>               ... 5 more
>       Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalStateException
>               at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>               at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>               at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
>               at 
> org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:85)
>               at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:88)
>               ... 7 more
>       Caused by: java.lang.IllegalStateException
>               at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
>               at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeSnapshot(RocksDBKeyedStateBackend.java:878)
>               at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:353)
>               at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:350)
>               at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>               at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
>               at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
>               ... 5 more
>       [CIRCULAR REFERENCE:java.lang.IllegalStateException]
> 2017-09-29 14:21:34,177 INFO  org.apache.flink.runtime.taskmanager.Task       
>               - Attempting to fail task externally Map (1/2) 
> (a06925261e74b4efdf50a30089e2b778).
> 2017-09-29 14:21:34,177 INFO  org.apache.flink.runtime.taskmanager.Task       
>               - Attempting to fail task externally Map (1/2) 
> (1747902c96e63fefd977ac4d4a01d2fa).
> 2017-09-29 14:21:34,180 INFO  org.apache.flink.runtime.taskmanager.Task       
>               - Map (1/2) (a06925261e74b4efdf50a30089e2b778) switched from 
> RUNNING to FAILED.
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
> for operator Map (1/2).}
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for 
> operator Map (1/2).
>       ... 6 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalStateException
>       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>       at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
>       ... 5 more
>       Suppressed: java.lang.Exception: Could not properly cancel managed 
> keyed state future.
>               at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90)
>               at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023)
>               at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961)
>               ... 5 more
>       Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalStateException
>               at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>               at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>               at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
>               at 
> org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:85)
>               at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:88)
>               ... 7 more
>       Caused by: java.lang.IllegalStateException
>               at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
>               at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeSnapshot(RocksDBKeyedStateBackend.java:878)
>               at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:353)
>               at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:350)
>               at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>               at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
>               at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
>               ... 5 more
>       [CIRCULAR REFERENCE:java.lang.IllegalStateException]
> {code}
> That same printed for around 12-13 tasks. Than following logs printed :
> {code:java}
> 2017-09-29 14:21:35,039 INFO  org.apache.flink.runtime.taskmanager.Task       
>               - Ensuring all FileSystem streams are closed for task Source: 
> Custom Source (2/2) (77c896e2a2063e98f399244cae21c260) [CANCELED]
> 2017-09-29 14:21:35,041 WARN  org.apache.hadoop.ipc.Client                    
>               - interrupted waiting to send rpc request to server
> java.lang.InterruptedException
>       at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>       at 
> org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1059)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1454)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>       at com.sun.proxy.$Proxy12.delete(Unknown Source)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy13.delete(Unknown Source)
>       at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2044)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:707)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:703)
>       at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
>       at 
> org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.delete(HadoopFileSystem.java:435)
>       at 
> org.apache.flink.core.fs.SafetyNetWrapperFileSystem.delete(SafetyNetWrapperFileSystem.java:106)
>       at 
> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:324)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeMetaData(RocksDBKeyedStateBackend.java:826)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeSnapshot(RocksDBKeyedStateBackend.java:875)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:353)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:350)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> 2017-09-29 14:21:35,042 WARN  
> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory  - Could 
> not delete the checkpoint stream file 
> hdfs://static.175.87.9.5.clients.your-server.de:8020/flink/flink-checkpoints/rocksDB/events/e10dbe09aa2ecccb22737ddce8b4dc9f/chk-2/a28796de-978a-4f1a-8ff5-5f5c654b0ffc.
> java.io.IOException: java.lang.InterruptedException
>       at org.apache.hadoop.ipc.Client.call(Client.java:1460)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>       at com.sun.proxy.$Proxy12.delete(Unknown Source)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy13.delete(Unknown Source)
>       at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2044)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:707)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:703)
>       at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
>       at 
> org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.delete(HadoopFileSystem.java:435)
>       at 
> org.apache.flink.core.fs.SafetyNetWrapperFileSystem.delete(SafetyNetWrapperFileSystem.java:106)
>       at 
> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:324)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeMetaData(RocksDBKeyedStateBackend.java:826)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeSnapshot(RocksDBKeyedStateBackend.java:875)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:353)
>       at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:350)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.InterruptedException
>       at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>       at 
> org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1059)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1454)
>       ... 31 more
> 2017-09-29 14:21:35,054 INFO  org.apache.flink.runtime.taskmanager.Task       
>               - Attempting to cancel task KeyedCEPPatternOperator -> Flat Map 
> -> (Flat Map, Flat Map) (1/2) (8c6eff62d47c4a624a7554065bac36ee).
> 2017-09-29 14:21:35,055 INFO  org.apache.flink.runtime.taskmanager.Task       
>               - KeyedCEPPatternOperator -> Flat Map -> (Flat Map, Flat Map) 
> (1/2) (8c6eff62d47c4a624a7554065bac36ee) switched from RUNNING to CANCELING.
> {code}
> Than same printed for 12-13 tasks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to