PoAn Yang created YUNIKORN-1796:
-----------------------------------

             Summary: Some e2e test cases update a modified resource and get 
conflict error
                 Key: YUNIKORN-1796
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1796
             Project: Apache YuniKorn
          Issue Type: Test
          Components: test - e2e
            Reporter: PoAn Yang
            Assignee: PoAn Yang


In some CI jobs, TestPredicates fails with an error like the following:

 
{code:java}
Unexpected error:
      <*errors.StatusError | 0xc000409860>: {
          ErrStatus: {
              TypeMeta: {Kind: "", APIVersion: ""},
              ListMeta: {
                  SelfLink: "",
                  ResourceVersion: "",
                  Continue: "",
                  RemainingItemCount: nil,
              },
              Status: "Failure",
              Message: "Operation cannot be fulfilled on nodes \"yk8s-worker\": 
the object has been modified; please apply your changes to the latest version 
and try again",
              Reason: "Conflict",
              Details: {Name: "yk8s-worker", Group: "", Kind: "nodes", UID: "", 
Causes: nil, RetryAfterSeconds: 0},
              Code: 409,
          },
      }
      Operation cannot be fulfilled on nodes "yk8s-worker": the object has been 
modified; please apply your changes to the latest version and try again {code}
 

Example of a failed CI run: 
[https://github.com/apache/yunikorn-k8shim/actions/runs/5201213431/jobs/9381354244?pr=608]

I am not sure whether the root cause is the cleanup we do in the 
[defer|https://github.com/apache/yunikorn-k8shim/blob/c90673fbe5e82103e511cde9923fb09fb6988942/test/e2e/predicates/predicates_test.go#L185-L188]
 function. If that is not the cause, my proposal is to use 
[retry.RetryOnConflict|https://pkg.go.dev/k8s.io/client-go/util/retry#RetryOnConflict]
 in 
[test/e2e/framework/helpers/k8s/k8s_utils.go|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/framework/helpers/k8s/k8s_utils.go],
 so that each attempt re-reads the latest version of the object before updating it. For example:

 
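{code:java}
// Needs the "k8s.io/client-go/util/retry" import in addition to the existing ones.
func (k *KubeCtl) RemoveNodeLabel(name, key, value string) error {
    // Get and Update both run inside the closure, so every retry works on
    // the latest resourceVersion instead of a stale copy.
    return retry.RetryOnConflict(retry.DefaultRetry, func() error {
        node, err := k.clientSet.CoreV1().Nodes().Get(context.TODO(), name, metav1.GetOptions{})
        if err != nil {
            return err
        }
        delete(node.Labels, key)
        _, err = k.clientSet.CoreV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{})
        // A Conflict error triggers another attempt; any other error is returned as-is.
        return err
    })
}
{code}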
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
