[ 
https://issues.apache.org/jira/browse/HDDS-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

István Fajth updated HDDS-6083:
-------------------------------
    Description: 
We haven't seen it occur often, but we have already seen this failure a couple 
of times:
{code}
Error:  
testWriteShouldSuccessIfLessThanParityNodesFail(org.apache.hadoop.ozone.client.TestOzoneECClient)
  Time elapsed: 0.116 s  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<1>
{code}


  was:
We haven't seen many occurrences, but what we have seen a couple of times 
already is this:
{code}
Error:  
testWriteShouldSuccessIfLessThanParityNodesFail(org.apache.hadoop.ozone.client.TestOzoneECClient)
  Time elapsed: 0.116 s  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<1>
{code}

Based on what I found, I suspect the problem can affect more tests, but so far 
we have been lucky enough not to see many symptoms.
It seems that the problem comes from [this 
line|https://github.com/apache/ozone/blob/HDDS-3816-ec/hadoop-ozone/client/src/test/java/org/apache/hadoop/ozone/client/MultiNodePipelineBlockAllocator.java#L55].
 If we are unlucky and get the same int twice, then two pseudo DNs in the 
pipeline get the same MockDNStorage assigned. This means that if we declare 
that node failed, we get a secondary failure during failure handling. The code 
is not prepared for that as of now, and we also swallow the exception in 
handleOutputStreamWrite inside ECKeyOutputStream, which we use from the 
handleStripeFailure method as well as during the regular write.
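A minimal Java sketch of the collision and one way to avoid it. It assumes the random int acts as the key that ties a pseudo DN to its MockDNStorage; the class DistinctDatanodeIds and its methods are hypothetical and not taken from the Ozone code base. Drawing ints independently can repeat a value, while drawing until the values are distinct cannot.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration, not Ozone code: models the random int that is
// assumed to map a pseudo DN to its MockDNStorage.
final class DistinctDatanodeIds {

  // Collision-prone: every nextInt call is independent, so two pseudo DNs
  // may receive the same value and therefore share one mock storage.
  static List<Integer> pickIndependently(int count, int bound, Random random) {
    List<Integer> picks = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      picks.add(random.nextInt(bound));
    }
    return picks;
  }

  // Collision-free: keep drawing and let the set drop repeated values until
  // we have `count` distinct ints. Insertion order is preserved.
  static List<Integer> pickDistinct(int count, int bound, Random random) {
    if (count > bound) {
      // Otherwise the loop below could never finish.
      throw new IllegalArgumentException(
          "Cannot draw " + count + " distinct values below " + bound);
    }
    Set<Integer> seen = new LinkedHashSet<>();
    while (seen.size() < count) {
      seen.add(random.nextInt(bound)); // duplicates are silently ignored
    }
    return new ArrayList<>(seen);
  }

  public static void main(String[] args) {
    Random random = new Random();
    System.out.println("independent: " + pickIndependently(5, 5, random));
    System.out.println("distinct:    " + pickDistinct(5, 5, random));
  }
}
```

Rejection by a set terminates quickly when the number of pseudo DNs is much smaller than the range of ints; when the two are close, shuffling the full range once (for example with Collections.shuffle) and taking a prefix would be the cheaper choice.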


> Fix flakyness of tests around nodefailures
> ------------------------------------------
>
>                 Key: HDDS-6083
>                 URL: https://issues.apache.org/jira/browse/HDDS-6083
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: István Fajth
>            Assignee: István Fajth
>            Priority: Major
>              Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
