[ https://issues.apache.org/jira/browse/YUNIKORN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PoAn Yang updated YUNIKORN-1796:
--------------------------------
    Description: 
In some CI jobs, TestPredicates fails with an error like the following:
{code:java}
Unexpected error:
      <*errors.StatusError | 0xc000409860>: {
          ErrStatus: {
              TypeMeta: {Kind: "", APIVersion: ""},
              ListMeta: {
                  SelfLink: "",
                  ResourceVersion: "",
                  Continue: "",
                  RemainingItemCount: nil,
              },
              Status: "Failure",
              Message: "Operation cannot be fulfilled on nodes \"yk8s-worker\": the object has been modified; please apply your changes to the latest version and try again",
              Reason: "Conflict",
              Details: {Name: "yk8s-worker", Group: "", Kind: "nodes", UID: "", Causes: nil, RetryAfterSeconds: 0},
              Code: 409,
          },
      }
      Operation cannot be fulfilled on nodes "yk8s-worker": the object has been modified; please apply your changes to the latest version and try again
{code}
Example of a failed CI run:
[https://github.com/apache/yunikorn-k8shim/actions/runs/5201213431/jobs/9381354244?pr=608]
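
For context: the 409 {{Conflict}} comes from the API server's optimistic concurrency check. An {{Update}} sends back the {{resourceVersion}} the client read, and the server rejects it if the object was modified in the meantime. The stdlib-only Go sketch below models that check with a hypothetical in-memory {{nodeStore}} (not client-go or API server code; the {{e2e-label}} and {{written-by}} keys are made up) to show how a Get-then-Update sequence fails when another writer touches the node in between.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the API server's 409 Conflict response.
var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// node is a hypothetical, simplified stand-in for a Kubernetes Node.
type node struct {
	ResourceVersion int
	Labels          map[string]string
}

// nodeStore models optimistic concurrency for a single node: each successful
// Update bumps ResourceVersion, and an Update based on an older version fails.
type nodeStore struct {
	current node
}

// cloneLabels copies a label map so readers and the store never share state.
func cloneLabels(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// Get returns a copy of the stored node, like a fresh GET.
func (s *nodeStore) Get() node {
	return node{ResourceVersion: s.current.ResourceVersion, Labels: cloneLabels(s.current.Labels)}
}

// Update succeeds only if n was read at the current ResourceVersion.
func (s *nodeStore) Update(n node) error {
	if n.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	s.current = node{ResourceVersion: n.ResourceVersion + 1, Labels: cloneLabels(n.Labels)}
	return nil
}

// staleUpdate reproduces the failing sequence: the test reads the node,
// another writer updates it, then the test writes back its stale copy.
func staleUpdate(s *nodeStore) error {
	mine := s.Get()

	other := s.Get()
	other.Labels["written-by"] = "someone-else"
	if err := s.Update(other); err != nil {
		return err
	}

	delete(mine.Labels, "e2e-label")
	return s.Update(mine)
}

func main() {
	s := &nodeStore{current: node{ResourceVersion: 1, Labels: map[string]string{"e2e-label": "x"}}}
	err := staleUpdate(s)
	fmt.Println(err == errConflict, s.current.ResourceVersion) // prints "true 2"
}
```

In this model the test's write is rejected and the store keeps the other writer's version, which is the state a helper has to recover from by re-reading the node.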

I am not sure whether the root cause is the cleanup we do in the [defer|https://github.com/apache/yunikorn-k8shim/blob/c90673fbe5e82103e511cde9923fb09fb6988942/test/e2e/predicates/predicates_test.go#L185-L188] function. If it is not, my suggestion is to use [retry.RetryOnConflict|https://pkg.go.dev/k8s.io/client-go/util/retry#RetryOnConflict] in [test/e2e/framework/helpers/k8s/k8s_utils.go|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/framework/helpers/k8s/k8s_utils.go], so that each helper re-reads the object and retries the update when the API server returns a 409 Conflict. For example:
{code:java}
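// Imports used by the snippet below.
import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/util/retry"
)

// RemoveNodeLabel deletes the label key from the named node.
// retry.RetryOnConflict re-runs the closure while it returns a 409 Conflict,
// using the retry.DefaultRetry backoff. The value parameter is not used here.
func (k *KubeCtl) RemoveNodeLabel(name, key, value string) error {
    return retry.RetryOnConflict(retry.DefaultRetry, func() error {
        // Re-read the node on every attempt so the update carries the latest resourceVersion.
        node, err := k.clientSet.CoreV1().Nodes().Get(context.TODO(), name, metav1.GetOptions{})
        if err != nil {
            return err
        }
        delete(node.Labels, key)
        // A Conflict error returned here triggers another attempt.
        _, err = k.clientSet.CoreV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{})
        return err
    })
}{code}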
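
{{retry.RetryOnConflict}} re-runs the supplied function while it returns a Conflict error, waiting according to the given backoff ({{retry.DefaultRetry}} allows up to 5 attempts) and stopping as soon as the function succeeds or fails with any other error. The sketch below reproduces that control flow with the standard library only, so it can run without a cluster. {{retryOnConflict}}, {{nodeStore}} and {{removeNodeLabel}} are hypothetical names, not client-go or YuniKorn APIs, and the fixed sleep is a simplification of client-go's backoff.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errConflict stands in for the API server's 409 Conflict response.
var errConflict = errors.New("conflict: the object has been modified")

// node is a hypothetical, simplified stand-in for a Kubernetes Node.
type node struct {
	ResourceVersion int
	Labels          map[string]string
}

// nodeStore is a hypothetical single-node store with optimistic concurrency.
// interferences simulates other writers: before each of the next N updates it
// bumps the stored version, so those updates fail with a conflict.
type nodeStore struct {
	current       node
	interferences int
}

// Get returns a copy of the stored node, like a fresh GET.
func (s *nodeStore) Get() node {
	labels := make(map[string]string, len(s.current.Labels))
	for k, v := range s.current.Labels {
		labels[k] = v
	}
	return node{ResourceVersion: s.current.ResourceVersion, Labels: labels}
}

// Update succeeds only if n was read at the current ResourceVersion.
func (s *nodeStore) Update(n node) error {
	if s.interferences > 0 {
		s.interferences--
		s.current.ResourceVersion++ // another writer got there first
	}
	if n.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	n.ResourceVersion++
	s.current = n
	return nil
}

// retryOnConflict re-runs fn while it returns errConflict, up to steps
// attempts, sleeping for delay between attempts. Any other result (nil or a
// different error) is returned immediately.
func retryOnConflict(steps int, delay time.Duration, fn func() error) error {
	var err error
	for i := 0; i < steps; i++ {
		if err = fn(); !errors.Is(err, errConflict) {
			return err
		}
		if i < steps-1 {
			time.Sleep(delay)
		}
	}
	return err // still conflicting after the last attempt
}

// removeNodeLabel follows the proposed pattern: re-read the latest object on
// every attempt, mutate it, and try to write it back.
func removeNodeLabel(s *nodeStore, key string) error {
	return retryOnConflict(5, 10*time.Millisecond, func() error {
		n := s.Get()
		delete(n.Labels, key)
		return s.Update(n)
	})
}

func main() {
	s := &nodeStore{current: node{ResourceVersion: 1, Labels: map[string]string{"e2e": "x"}}, interferences: 2}
	err := removeNodeLabel(s, "e2e")
	fmt.Println(err, s.current.Labels) // prints "<nil> map[]"
}
```

With {{interferences: 2}} the first two attempts hit a conflict and the third succeeds. If every attempt conflicts, this sketch returns the last conflict error, which is what the test would then report.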


> Some e2e test cases update a modified resource and get conflict error
> ---------------------------------------------------------------------
>
>                 Key: YUNIKORN-1796
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1796
>             Project: Apache YuniKorn
>          Issue Type: Test
>          Components: test - e2e
>            Reporter: PoAn Yang
>            Assignee: PoAn Yang
>            Priority: Minor
>              Labels: flaky-test



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
