[
https://issues.apache.org/jira/browse/HDDS-1908?focusedWorklogId=293280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-293280
]
ASF GitHub Bot logged work on HDDS-1908:
----------------------------------------
Author: ASF GitHub Bot
Created on: 12/Aug/19 17:42
Start Date: 12/Aug/19 17:42
Worklog Time Spent: 10m
Work Description: adoroszlai commented on pull request #1282: HDDS-1908. TestMultiBlockWritesWithDnFailures is failing
URL: https://github.com/apache/hadoop/pull/1282
## What changes were proposed in this pull request?
The multi-block write tests fail most of the time because the Ratis leader
election timeout is about as long as the client retry timeout (5 times
1 second). As a result, `KeyOutputStream.handleException` frequently excludes
an entire pipeline just because the client gives up before a leader is
elected. The TestMultiBlockWritesWithDnFailures test has only 6 nodes,
2 of which are shut down as part of the test. Thus, when this happens, the
subsequent write fails because a new block cannot be allocated.
This change decreases the leader election timeout and increases the number of
client retries.
It is basically an extension of
[HDDS-1780](https://issues.apache.org/jira/browse/HDDS-1780).
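To illustrate the timing argument, here is a minimal sketch of the check behind the fix: the client's total retry window should comfortably exceed the leader election timeout. `RetryBudget`, its methods, and the `margin` parameter are hypothetical names made up for this example and are not part of Ozone or Ratis. The "before" values come from the description above (5 retries of 1 second, an election timeout of roughly the same 5 seconds); the "after" values are assumptions chosen only to show the direction of the change, not the values used in the patch.

```java
/** Hypothetical helper for illustration only; not part of Ozone or Ratis. */
final class RetryBudget {
  private RetryBudget() { }

  /** Total time the client keeps retrying before it gives up. */
  static long retryWindowMillis(int maxRetries, long retryIntervalMillis) {
    return maxRetries * retryIntervalMillis;
  }

  /**
   * Returns true if the client retry window covers at least {@code margin}
   * election timeouts, so a leader has time to be elected before the client
   * gives up and the pipeline gets excluded.
   */
  static boolean outlastsElection(int maxRetries, long retryIntervalMillis,
      long electionTimeoutMillis, int margin) {
    return retryWindowMillis(maxRetries, retryIntervalMillis)
        >= margin * electionTimeoutMillis;
  }

  public static void main(String[] args) {
    // Before: 5 retries of 1 second vs. an election timeout of about 5 seconds.
    System.out.println("before: " + outlastsElection(5, 1000L, 5000L, 2));
    // After (assumed values): more retries and a shorter election timeout.
    System.out.println("after:  " + outlastsElection(10, 1000L, 1000L, 2));
  }
}
```

With a margin of 2, the "before" settings fail the check (5000 ms is less than 10000 ms) and the assumed "after" settings pass it (10000 ms is at least 2000 ms).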
Additional changes:
* move `testMultiBlockWritesWithIntermittentDnFailures` to
`TestMultiBlockWritesWithDnFailures`
* remove unused `maxRetries` member
* call cluster `shutdown()` regardless of test success/failure (see also
[HDDS-1949](https://issues.apache.org/jira/browse/HDDS-1949))
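The last item above can be sketched as follows. This is only an illustration of the "shut down regardless of outcome" pattern using `try`/`finally`; the description does not say which mechanism the patch uses (in JUnit 4, an `@After` method gives the same guarantee). `ShutdownAlways`, `FakeCluster`, and `runWithCluster` are hypothetical stand-ins, not Ozone classes.

```java
/** Hypothetical sketch; FakeCluster stands in for a real mini cluster. */
final class ShutdownAlways {
  private ShutdownAlways() { }

  /** Minimal stand-in that only tracks whether it is still running. */
  static final class FakeCluster {
    private boolean running = true;

    void shutdown() {
      running = false;
    }

    boolean isRunning() {
      return running;
    }
  }

  /** Runs the test body and shuts the cluster down even if the body throws. */
  static void runWithCluster(FakeCluster cluster, Runnable testBody) {
    try {
      testBody.run();
    } finally {
      cluster.shutdown();
    }
  }

  public static void main(String[] args) {
    FakeCluster cluster = new FakeCluster();
    try {
      runWithCluster(cluster, () -> {
        throw new IllegalStateException("simulated test failure");
      });
    } catch (IllegalStateException expected) {
      // The failure still propagates to the caller.
    }
    System.out.println("running after failed test: " + cluster.isRunning());
  }
}
```

The point of the pattern is that a failing test does not leave a cluster running and interfering with the tests that follow it.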
https://issues.apache.org/jira/browse/HDDS-1908
## How was this patch tested?
Ran both test classes 10+ times, without any intermittent failure.
```
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 157.086 s - in org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 75.308 s - in org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
```
Issue Time Tracking
-------------------
Worklog Id: (was: 293280)
Time Spent: 10m
Remaining Estimate: 0h
> TestMultiBlockWritesWithDnFailures is failing
> ---------------------------------------------
>
> Key: HDDS-1908
> URL: https://issues.apache.org/jira/browse/HDDS-1908
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: test
> Reporter: Nanda kumar
> Assignee: Doroszlai, Attila
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> TestMultiBlockWritesWithDnFailures is failing with the following exception
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.992 s <<< FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
> [ERROR] testMultiBlockWritesWithDnFailures(org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures) Time elapsed: 30.941 s <<< ERROR!
> INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 blocks. Requested 1 blocks
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:720)
> at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:752)
> at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:248)
> at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:296)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:201)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:376)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:325)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:231)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193)
> at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
> at java.io.OutputStream.write(OutputStream.java:75)
> at org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures.testMultiBlockWritesWithDnFailures(TestMultiBlockWritesWithDnFailures.java:144)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {noformat}