[GitHub] [flink-kubernetes-operator] mbalassi commented on pull request #91: [FLINK-26554] Upgrade Operator SDK to avoid cleanup race condition

GitBox Tue, 22 Mar 2022 03:25:53 -0700


mbalassi commented on pull request #91:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/91#issuecomment-1074995217



   I'm opening a ticket for this; I ran into it too. I have an idea for 
resolving it.
   
   > > @gyfora It seems that the e2e tests are not stable after #84.
   > > ```
   > > Run ls e2e-tests/test_*.sh | while read script_test;do \
   > > Running e2e-tests/test_kubernetes_application_ha.sh
   > > persistentvolumeclaim/flink-example-statemachine created
   > > Error from server (InternalError): error when creating 
"e2e-tests/data/cr.yaml": Internal error occurred: failed calling webhook 
"vflinkdeployments.flink.apache.org": failed to call webhook: Post 
"https://flink-operator-webhook-service.default.svc:443/validate?timeout=10s": 
dial tcp 10.106.63.26:443: connect: connection refused
   > > Command: kubectl apply -f e2e-tests/data/cr.yaml failed. Retrying...
   > > flinkdeployment.flink.apache.org/flink-example-statemachine created
   > > persistentvolumeclaim/flink-example-statemachine unchanged
   > > Error from server (NotFound): deployments.apps 
"flink-example-statemachine" not found
   > > Command: kubectl get deploy/flink-example-statemachine failed. 
Retrying...
   > > NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
   > > flink-example-statemachine   0/1     1            0           1s
   > > deployment.apps/flink-example-statemachine condition met
   > > Waiting for jobmanager pod flink-example-statemachine-7fcf55c88b-h5r7r 
ready.
   > > pod/flink-example-statemachine-7fcf55c88b-h5r7r condition met
   > > Waiting for log "Rest endpoint listening at"...
   > > Log "Rest endpoint listening at" shows up.
   > > Waiting for log "Completed checkpoint [0-9]+ for job"...
   > > Log "Completed checkpoint [0-9]+ for job" shows up.
   > > Successfully verified that 
flinkdep/flink-example-statemachine.status.jobManagerDeploymentStatus is in 
READY state.
   > > Successfully verified that 
flinkdep/flink-example-statemachine.status.jobStatus.state is in RUNNING state.
   > > Kill the flink-example-statemachine-7fcf55c88b-h5r7r
   > > Defaulted container "flink-main-container" out of: flink-main-container, 
artifacts-fetcher (init)
   > > Waiting for log "Restoring job 00000000000000000000000000000000 from 
Checkpoint"...
   > > Log "Restoring job 00000000000000000000000000000000 from Checkpoint" 
shows up.
   > > Waiting for log "Completed checkpoint [0-9]+ for job"...
   > > Log "Completed checkpoint [0-9]+ for job" shows up.
   > > Status verification for 
flinkdep/flink-example-statemachine.status.jobManagerDeploymentStatus failed. 
It is DEPLOYED_NOT_READY instead of READY.
   > > Debugging failed e2e test:
   > > ```
   > 
   > > Status verification for 
flinkdep/flink-example-statemachine.status.jobManagerDeploymentStatus failed. 
It is DEPLOYED_NOT_READY instead of READY.
   > 
   > I also encountered this same error today on my private CI. After I re-ran 
the job, it passed.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
