[jira] [Updated] (FLINK-10482) java.lang.IllegalArgumentException: Negative number of in progress checkpoints

Gary Yao (JIRA) Fri, 19 Oct 2018 07:37:12 -0700


     [ https://issues.apache.org/jira/browse/FLINK-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Gary Yao updated FLINK-10482:
-----------------------------
    Description: 
Recently I found the following entry in my JobManager log:

{noformat}
2018-10-02 17:44:50,090 [flink-akka.actor.default-dispatcher-4117] ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  - Implementation error: Unhandled exception.
java.lang.IllegalArgumentException: Negative number of in progress checkpoints
        at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:139)
        at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.<init>(CheckpointStatsCounts.java:72)
        at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.createSnapshot(CheckpointStatsCounts.java:177)
        at org.apache.flink.runtime.checkpoint.CheckpointStatsTracker.createSnapshot(CheckpointStatsTracker.java:166)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.getCheckpointStatsSnapshot(ExecutionGraph.java:553)
        at org.apache.flink.runtime.executiongraph.ArchivedExecutionGraph.createFrom(ArchivedExecutionGraph.java:340)
        at org.apache.flink.runtime.jobmaster.JobMaster.requestJob(JobMaster.java:923)
        at sun.reflect.GeneratedMethodAccessor101.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
        at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
        at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
        at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
        at akka.actor.ActorCell.invoke(ActorCell.scala:495)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{noformat}

Related: the job details don't appear; the screen shows only the skeleton, with
no information (such as the pipeline, subtasks, etc.).

One thing that may have caused this is that the job was failing (an uncaught
exception in our code) and, during one of its restarts, I issued a "flink
cancel <jobid>". The job was cancelled, but the JobManager interface took a
very long time to show the slots as available again.
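For context on where the message comes from: the trace shows it is thrown by a `Preconditions.checkArgument` guard in the `CheckpointStatsCounts` constructor while `CheckpointStatsTracker.createSnapshot` builds a snapshot for the job details request. The Java sketch below is a hypothetical illustration only, not Flink's code, and every class and method name in it is invented. It assumes (this report does not confirm it) that an in-progress counter can be decremented twice for the same checkpoint, for example once when the failing job restarts and again when it is cancelled, and shows how a snapshot constructor guarded this way then fails with the same message.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: names are illustrative and are NOT Flink's actual
// CheckpointStatsCounts / CheckpointStatsTracker API.
public final class CheckpointCountsSketch {

    /** Immutable snapshot whose constructor rejects negative counts (checkArgument-style guard). */
    static final class Snapshot {
        final long inProgress;
        final long failed;

        Snapshot(long inProgress, long failed) {
            checkArgument(inProgress >= 0, "Negative number of in progress checkpoints");
            checkArgument(failed >= 0, "Negative number of failed checkpoints");
            this.inProgress = inProgress;
            this.failed = failed;
        }
    }

    private final Set<Long> pending = new HashSet<>();
    private long inProgress;
    private long failed;

    /** A checkpoint was triggered and is now in progress. */
    void reportTriggered(long checkpointId) {
        pending.add(checkpointId);
        inProgress++;
    }

    /** Naive update: ignores checkpointId and decrements on every call, even for a duplicate report. */
    void reportFailedNaive(long checkpointId) {
        inProgress--;
        failed++;
    }

    /** Guarded update: only a checkpoint that is still pending changes the counters. */
    void reportFailedOnce(long checkpointId) {
        if (pending.remove(checkpointId)) {
            inProgress--;
            failed++;
        }
    }

    Snapshot createSnapshot() {
        return new Snapshot(inProgress, failed);
    }

    private static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    public static void main(String[] args) {
        CheckpointCountsSketch naive = new CheckpointCountsSketch();
        naive.reportTriggered(1L);
        naive.reportFailedNaive(1L); // e.g. failure reported during a restart
        naive.reportFailedNaive(1L); // e.g. the same checkpoint reported again on cancel
        try {
            naive.createSnapshot();
        } catch (IllegalArgumentException e) {
            System.out.println("naive: " + e.getMessage());
        }

        CheckpointCountsSketch guarded = new CheckpointCountsSketch();
        guarded.reportTriggered(1L);
        guarded.reportFailedOnce(1L);
        guarded.reportFailedOnce(1L); // ignored: checkpoint 1 is no longer pending
        Snapshot s = guarded.createSnapshot();
        System.out.println("guarded: inProgress=" + s.inProgress + ", failed=" + s.failed);
    }
}
```

Running `main` prints `naive: Negative number of in progress checkpoints` followed by `guarded: inProgress=0, failed=1`. Keying the decrement on the checkpoint ID, as the hypothetical `reportFailedOnce` does, is one way to make such counters tolerate duplicate reports; whether it matches the real root cause of this issue is an open question.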

  was:
Recently I found the following entry in my JobManager log:

```
2018-10-02 17:44:50,090 [flink-akka.actor.default-dispatcher-4117] ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  - Implementation error: Unhandled exception.
java.lang.IllegalArgumentException: Negative number of in progress checkpoints
        at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:139)
        at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.<init>(CheckpointStatsCounts.java:72)
        at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.createSnapshot(CheckpointStatsCounts.java:177)
        at org.apache.flink.runtime.checkpoint.CheckpointStatsTracker.createSnapshot(CheckpointStatsTracker.java:166)
        at org.apache.flink.runtime.executiongraph.ExecutionGraph.getCheckpointStatsSnapshot(ExecutionGraph.java:553)
        at org.apache.flink.runtime.executiongraph.ArchivedExecutionGraph.createFrom(ArchivedExecutionGraph.java:340)
        at org.apache.flink.runtime.jobmaster.JobMaster.requestJob(JobMaster.java:923)
        at sun.reflect.GeneratedMethodAccessor101.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
        at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
        at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
        at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
        at akka.actor.ActorCell.invoke(ActorCell.scala:495)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
```

Related: the job details don't appear; the screen shows only the skeleton, with
no information (such as the pipeline, subtasks, etc.).

One thing that may have caused this is that the job was failing (an uncaught
exception in our code) and, during one of its restarts, I issued a "flink
cancel <jobid>". The job was cancelled, but the JobManager interface took a
very long time to show the slots as available again.


> java.lang.IllegalArgumentException: Negative number of in progress checkpoints
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-10482
>                 URL: https://issues.apache.org/jira/browse/FLINK-10482
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.6.1
>            Reporter: Julio Biason
>            Priority: Major
>             Fix For: 1.8.0
>
>
> Recently I found the following entry in my JobManager log:
> {noformat}
> 2018-10-02 17:44:50,090 [flink-akka.actor.default-dispatcher-4117] ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  - Implementation error: Unhandled exception.
> java.lang.IllegalArgumentException: Negative number of in progress checkpoints
>         at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:139)
>         at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.<init>(CheckpointStatsCounts.java:72)
>         at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.createSnapshot(CheckpointStatsCounts.java:177)
>         at org.apache.flink.runtime.checkpoint.CheckpointStatsTracker.createSnapshot(CheckpointStatsTracker.java:166)
>         at org.apache.flink.runtime.executiongraph.ExecutionGraph.getCheckpointStatsSnapshot(ExecutionGraph.java:553)
>         at org.apache.flink.runtime.executiongraph.ArchivedExecutionGraph.createFrom(ArchivedExecutionGraph.java:340)
>         at org.apache.flink.runtime.jobmaster.JobMaster.requestJob(JobMaster.java:923)
>         at sun.reflect.GeneratedMethodAccessor101.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
>         at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
>         at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
>         at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
>         at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
>         at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>         at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {noformat}
> Related: the job details don't appear; the screen shows only the skeleton,
> with no information (such as the pipeline, subtasks, etc.).
> One thing that may have caused this is that the job was failing (an uncaught
> exception in our code) and, during one of its restarts, I issued a "flink
> cancel <jobid>". The job was cancelled, but the JobManager interface took a
> very long time to show the slots as available again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
