István Fajth created HDDS-6083:
----------------------------------
Summary: Fix flakiness of tests around node failures
Key: HDDS-6083
URL: https://issues.apache.org/jira/browse/HDDS-6083
Project: Apache Ozone
Issue Type: Sub-task
Reporter: István Fajth
We have not seen this often, but it has already shown up a couple of times:
{code}
Error:
testWriteShouldSuccessIfLessThanParityNodesFail(org.apache.hadoop.ozone.client.TestOzoneECClient)
Time elapsed: 0.116 s <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<1>
{code}
Based on what I found, I suspect the problem can affect more tests, but we have
not seen many symptoms because we have been lucky so far.
The problem seems to come from [this
line|https://github.com/apache/ozone/blob/HDDS-3816-ec/hadoop-ozone/client/src/test/java/org/apache/hadoop/ozone/client/MultiNodePipelineBlockAllocator.java#L55].
If we are unlucky and get the same int twice, two pseudo DNs in the pipeline
are assigned the same client. If we then declare that node failed, we get a
secondary failure during failure handling, which the code is not prepared for
as of now. In addition, we swallow the exception in handleOutputStreamWrite
inside ECKeyOutputStream, which is used both from the handleStripeFailure
method and during the regular write.
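
One possible way to rule out the collision is to keep drawing until every pseudo DN in a pipeline has a distinct value. The sketch below is only an illustration of that idea and is not the code in MultiNodePipelineBlockAllocator: the class name DistinctNodeIds, the method pickDistinctIds, and the bound parameter are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper, not part of Ozone: draws `count` distinct ints in
// [0, bound) so that no two pseudo DNs in one pipeline share the same value.
public class DistinctNodeIds {

  static List<Integer> pickDistinctIds(Random random, int count, int bound) {
    if (count < 0 || count > bound) {
      throw new IllegalArgumentException(
          "cannot pick " + count + " distinct values from a range of " + bound);
    }
    // A set rejects repeated draws; LinkedHashSet keeps the draw order.
    Set<Integer> seen = new LinkedHashSet<>();
    while (seen.size() < count) {
      seen.add(random.nextInt(bound));
    }
    return new ArrayList<>(seen);
  }

  public static void main(String[] args) {
    // Assumed example sizes: 5 nodes per pipeline, ids drawn from [0, 1000).
    List<Integer> ids = pickDistinctIds(new Random(), 5, 1000);
    System.out.println("Distinct pseudo DN ids: " + ids);
  }
}
```

With a fixed seed passed to Random, a test would also get the same ids on every run, which should make a failure reproducible instead of depending on luck.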