[ 
https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308980#comment-15308980
 ] 

Colin Patrick McCabe edited comment on HDFS-9466 at 6/1/16 12:52 AM:
---------------------------------------------------------------------

Thanks for the explanation.  It sounds like the race condition is that the 
ShortCircuitRegistry on the DN needs to be informed about the client's decision 
that short-circuit is not working for the block, and this RPC takes time to 
arrive.  That background process races with completing the TCP read 
successfully and checking the number of slots in the unit test.

{code}
   public static interface Visitor {
-    void accept(HashMap<ShmId, RegisteredShm> segments,
+    boolean accept(HashMap<ShmId, RegisteredShm> segments,
                 HashMultimap<ExtendedBlockId, Slot> slots);
   }
{code}
I don't think it makes sense to change the return type of the visitor.  While 
you might find a boolean convenient, some other potential users of the 
interface might not find it useful.  Instead, just have your closure modify a 
{{final MutableBoolean}} declared nearby.

{code}
+    }, 100, 10000);
{code}
It seems like we could lower the latency here (perhaps check every 10 ms) and 
lengthen the timeout.  Since the test timeouts are generally 60s, I don't think 
it makes sense to make this timeout shorter than that.

+1 once that's addressed.  Thanks, [~jojochuang].  Sorry for the delay in 
reviews.


was (Author: cmccabe):
Thanks for the explanation.  It sounds like the race condition is that the 
ShortCircuitRegistry on the DN needs to be informed about the client's decision 
that short-circuit is not working for the block, and this RPC takes time to 
arrive.  That background process races with completing the TCP read 
successfully and checking the number of slots in the unit test.

{code}
   public static interface Visitor {
-    void accept(HashMap<ShmId, RegisteredShm> segments,
+    boolean accept(HashMap<ShmId, RegisteredShm> segments,
                 HashMultimap<ExtendedBlockId, Slot> slots);
   }
{code}
I don't think it makes sense to change the return type of the visitor.  While 
you might find a boolean convenient, some other potential users of the 
interface would have no use for it.  Instead, just have your closure modify a 
{{final MutableBoolean}} declared nearby.

{code}
+    }, 100, 10000);
{code}
No reason to make this shorter than the test limit, surely?

+1 once that's addressed.  Thanks, [~jojochuang].  Sorry for the delay in 
reviews.

> TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
> --------------------------------------------------------------------
>
>                 Key: HDFS-9466
>                 URL: https://issues.apache.org/jira/browse/HDFS-9466
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: fs, hdfs-client
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-9466.001.patch, HDFS-9466.002.patch, 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache-output.txt
>
>
> This test is flaky and fails quite frequently in trunk.
> Error Message
> expected:<1> but was:<2>
> Stacktrace
> {noformat}
> java.lang.AssertionError: expected:<1> but was:<2>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at org.junit.Assert.assertEquals(Assert.java:542)
>       at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636)
>       at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395)
>       at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631)
>       at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684)
> {noformat}
> Thanks to [~xiaochen] for identifying the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to