[ https://issues.apache.org/jira/browse/HDDS-1908?focusedWorklogId=293280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-293280 ]

ASF GitHub Bot logged work on HDDS-1908:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Aug/19 17:42
            Start Date: 12/Aug/19 17:42
    Worklog Time Spent: 10m 
    Work Description: adoroszlai commented on pull request #1282: HDDS-1908. TestMultiBlockWritesWithDnFailures is failing
URL: https://github.com/apache/hadoop/pull/1282
 
 
   ## What changes were proposed in this pull request?
   
   Multi-block writes tests are failing most of the time because the Ratis leader election timeout is about the same length as the client retry timeout (5 retries at 1-second intervals). This frequently caused an entire pipeline to be excluded (by `KeyOutputStream.handleException`) just because the client gave up before a leader was elected. There are only 6 nodes in the TestMultiBlockWritesWithDnFailures test, 2 of which are shut down as part of the test. Thus, when this happens, the subsequent write fails because a new block cannot be allocated.
   
   This change decreases the leader election timeout and increases the number of client retries. It is basically an extension of [HDDS-1780](https://issues.apache.org/jira/browse/HDDS-1780).
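   The timing race can be illustrated with a small Java sketch that models only the arithmetic. The client's total retry window is the number of retries multiplied by the retry interval. A write survives a leader change only if that window is longer than the time the Ratis group needs to elect a new leader. The class and methods are hypothetical. The "before" election timeout of 5 seconds is an assumption based on "about the same length" above. The "after" values (10 retries, 1-second election timeout) are illustrative only, not the values chosen in this PR, and no real Ozone configuration keys are used.

```java
import java.time.Duration;

/**
 * Hypothetical helper illustrating the retry-vs-election timing race.
 * All numbers are illustrative assumptions, not Ozone defaults or PR values.
 */
public class RetryWindowCheck {

  /** Total time the client keeps retrying before giving up. */
  static Duration retryWindow(int maxRetries, Duration retryInterval) {
    return retryInterval.multipliedBy(maxRetries);
  }

  /**
   * True if the client would still be retrying when the election timeout
   * expires, i.e. the retry window is strictly longer than the timeout.
   */
  static boolean outlastsElection(int maxRetries, Duration retryInterval,
      Duration electionTimeout) {
    return retryWindow(maxRetries, retryInterval).compareTo(electionTimeout) > 0;
  }

  public static void main(String[] args) {
    Duration interval = Duration.ofSeconds(1);
    // Before (assumed): 5 retries x 1 s against an election timeout of ~5 s.
    boolean before = outlastsElection(5, interval, Duration.ofSeconds(5));
    // After (hypothetical): more retries and a shorter election timeout.
    boolean after = outlastsElection(10, interval, Duration.ofSeconds(1));
    System.out.println("before: " + before + ", after: " + after);
  }
}
```

   Running `main` prints `before: false, after: true`. The strict comparison is a simplification. Ratis randomizes election timeouts and the vote itself takes time, so in practice a comfortable margin between the two values is advisable.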
   
   Additional changes:
   
    * move `testMultiBlockWritesWithIntermittentDnFailures` to `TestMultiBlockWritesWithDnFailures`
    * remove unused `maxRetries` member
    * call cluster `shutdown()` regardless of test success/failure (see also [HDDS-1949](https://issues.apache.org/jira/browse/HDDS-1949))
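   One common way to guarantee the last item is a `try`/`finally` block; in JUnit 4, an `@After` or `@AfterClass` method is another option. The sketch below is not necessarily how the PR implements it. It uses a hypothetical `TestCluster` interface in place of the real cluster class, so it runs without Ozone on the classpath. It shows that `shutdown()` is reached even when the test body throws, while the failure still propagates to the caller.

```java
/**
 * Sketch: always shut the cluster down, whether the test body passes or fails.
 * TestCluster is a hypothetical stand-in, not the real MiniOzoneCluster API.
 */
public class ShutdownAlways {

  /** Hypothetical minimal cluster abstraction used only for this sketch. */
  interface TestCluster {
    void shutdown();
  }

  /** Runs the test body; the finally block calls shutdown() even if the body throws. */
  static void runWithCluster(TestCluster cluster, Runnable testBody) {
    try {
      testBody.run();
    } finally {
      cluster.shutdown();
    }
  }

  public static void main(String[] args) {
    boolean[] shutDown = {false};
    TestCluster cluster = () -> shutDown[0] = true;
    try {
      runWithCluster(cluster, () -> {
        throw new AssertionError("simulated test failure");
      });
    } catch (AssertionError expected) {
      // The failure still reaches the caller (e.g. the test runner).
    }
    System.out.println("shut down after failure: " + shutDown[0]);
  }
}
```

   Because the exception is not swallowed inside `runWithCluster`, a failing test is still reported as failed. Only the cleanup is guaranteed.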
   
   https://issues.apache.org/jira/browse/HDDS-1908
   
   ## How was this patch tested?
   
   Ran both test classes 10+ times without any intermittent failures.
   
   ```
   [INFO] Running org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
   [INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 157.086 s - in org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
   [INFO] Running org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
   [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 75.308 s - in org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
   ```
 


Issue Time Tracking
-------------------

            Worklog Id:     (was: 293280)
            Time Spent: 10m
    Remaining Estimate: 0h

> TestMultiBlockWritesWithDnFailures is failing
> ---------------------------------------------
>
>                 Key: HDDS-1908
>                 URL: https://issues.apache.org/jira/browse/HDDS-1908
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: test
>            Reporter: Nanda kumar
>            Assignee: Doroszlai, Attila
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> TestMultiBlockWritesWithDnFailures is failing with the following exception:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.992 s <<< FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
> [ERROR] testMultiBlockWritesWithDnFailures(org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures)  Time elapsed: 30.941 s  <<< ERROR!
> INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 blocks. Requested 1 blocks
>       at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:720)
>       at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:752)
>       at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:248)
>       at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:296)
>       at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:201)
>       at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:376)
>       at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:325)
>       at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:231)
>       at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193)
>       at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
>       at java.io.OutputStream.write(OutputStream.java:75)
>       at org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures.testMultiBlockWritesWithDnFailures(TestMultiBlockWritesWithDnFailures.java:144)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>       at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>       at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>       at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>       at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>       at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>       at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>       at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>       at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>       at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>       at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

