István Fajth created HDDS-6083:
----------------------------------

             Summary: Fix flakiness of tests around node failures
                 Key: HDDS-6083
                 URL: https://issues.apache.org/jira/browse/HDDS-6083
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: István Fajth


We haven't seen it often, but we have already seen the following failure a 
couple of times:
{code}
Error:  testWriteShouldSuccessIfLessThanParityNodesFail(org.apache.hadoop.ozone.client.TestOzoneECClient)  Time elapsed: 0.116 s  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<1>
{code}

Based on what I found, the problem may affect other tests as well, but so far 
we have seen few symptoms because we have been lucky.
It seems that the problem comes from [this line|https://github.com/apache/ozone/blob/HDDS-3816-ec/hadoop-ozone/client/src/test/java/org/apache/hadoop/ozone/client/MultiNodePipelineBlockAllocator.java#L55].
If we are unlucky and get the same int twice, two pseudo DNs in the pipeline 
get the same client assigned. This means that if we declare that node as 
failed, we get a secondary failure during failure handling, which the code is 
not prepared for as of now. On top of that, we swallow the exception in 
handleOutputStreamWrite inside ECKeyOutputStream, which is used from the 
handleStripeFailure method as well as during the regular write.
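A minimal Java sketch of the collision, assuming the pseudo DNs are keyed by randomly drawn ints and that the mock client factory hands out exactly one client per distinct key. `PseudoDnKeys`, `naiveKeys`, `distinctKeys` and `duplicateCount` are hypothetical names for illustration, not Ozone code, and the node count and bounds are arbitrary.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch: none of these names exist in Ozone. Assumes each
// pseudo DN is keyed by a randomly drawn int and that the mock client
// factory hands out exactly one client per distinct key.
final class PseudoDnKeys {

  // Risky pattern: independent draws may return the same int twice.
  static List<Integer> naiveKeys(Random random, int nodes, int bound) {
    List<Integer> keys = new ArrayList<>();
    for (int i = 0; i < nodes; i++) {
      keys.add(random.nextInt(bound));
    }
    return keys;
  }

  // One possible fix: redraw until the value is unused, so every pseudo DN
  // gets its own key and therefore its own client.
  static List<Integer> distinctKeys(Random random, int nodes, int bound) {
    if (nodes > bound) {
      throw new IllegalArgumentException(
          "bound too small for " + nodes + " distinct keys");
    }
    Set<Integer> used = new HashSet<>();
    List<Integer> keys = new ArrayList<>();
    while (keys.size() < nodes) {
      int candidate = random.nextInt(bound);
      if (used.add(candidate)) {
        keys.add(candidate);
      }
    }
    return keys;
  }

  // Number of pseudo DNs that reuse a client already assigned to another DN.
  // Any value above 0 means failing "one" node also fails a second one.
  static int duplicateCount(List<Integer> keys) {
    return keys.size() - new HashSet<>(keys).size();
  }

  public static void main(String[] args) {
    // 5 pseudo DNs (e.g. an EC 3+2 pipeline) drawing from only 4 values:
    // by the pigeonhole principle at least two of them must share a key.
    List<Integer> naive = naiveKeys(new Random(), 5, 4);
    System.out.println("naive " + naive
        + ", shared clients: " + duplicateCount(naive));

    List<Integer> fixed = distinctKeys(new Random(), 5, 4000);
    System.out.println("distinct " + fixed
        + ", shared clients: " + duplicateCount(fixed));
  }
}
```

With a realistic bound the naive collision is rare, which would match a test that only fails occasionally. Rejecting already-used values is just one option; drawing from a shuffled fixed range would equally guarantee one client per pseudo DN.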



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
