[
https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308980#comment-15308980
]
Colin Patrick McCabe edited comment on HDFS-9466 at 6/1/16 12:52 AM:
---------------------------------------------------------------------
Thanks for the explanation. It sounds like the race condition is that the
ShortCircuitRegistry on the DN needs to be informed about the client's decision
that short-circuit is not working for the block, and this RPC takes time to
arrive. That background process races with completing the TCP read
successfully and checking the number of slots in the unit test.
{code}
public static interface Visitor {
- void accept(HashMap<ShmId, RegisteredShm> segments,
+ boolean accept(HashMap<ShmId, RegisteredShm> segments,
HashMultimap<ExtendedBlockId, Slot> slots);
}
{code}
I don't think it makes sense to change the return type of the visitor. While
you might find a boolean convenient, some other potential users of the
interface might not find it useful. Instead, just have your closure modify a
{{final MutableBoolean}} declared nearby.
{code}
+ }, 100, 10000);
{code}
It seems like we could lower the latency here (perhaps check every 10 ms) and
lengthen the timeout. Since the test timeouts are generally 60s, I don't think
it makes sense to make this timeout shorter than that.
+1 once that's addressed. Thanks, [~jojochuang]. Sorry for the delay in
reviews.
was (Author: cmccabe):
Thanks for the explanation. It sounds like the race condition is that the
ShortCircuitRegistry on the DN needs to be informed about the client's decision
that short-circuit is not working for the block, and this RPC takes time to
arrive. That background process races with completing the TCP read
successfully and checking the number of slots in the unit test.
{code}
public static interface Visitor {
- void accept(HashMap<ShmId, RegisteredShm> segments,
+ boolean accept(HashMap<ShmId, RegisteredShm> segments,
HashMultimap<ExtendedBlockId, Slot> slots);
}
{code}
I don't think it makes sense to change the return type of the visitor. While
you might find a boolean convenient, some other potential users of the
interface would have no use for it. Instead, just have your closure modify a
{{final MutableBoolean}} declared nearby.
{code}
+ }, 100, 10000);
{code}
No reason to make this shorter than the test limit, surely?
+1 once that's addressed. Thanks, [~jojochuang]. Sorry for the delay in
reviews.
> TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
> --------------------------------------------------------------------
>
> Key: HDFS-9466
> URL: https://issues.apache.org/jira/browse/HDFS-9466
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: fs, hdfs-client
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Attachments: HDFS-9466.001.patch, HDFS-9466.002.patch,
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache-output.txt
>
>
> This test is flaky and fails quite frequently in trunk.
> Error Message
> expected:<1> but was:<2>
> Stacktrace
> {noformat}
> java.lang.AssertionError: expected:<1> but was:<2>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636)
> at
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395)
> at
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631)
> at
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684)
> {noformat}
> Thanks to [~xiaochen] for identifying the issue.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]