steveloughran commented on PR #6537:
URL: https://github.com/apache/hadoop/pull/6537#issuecomment-1956544980

   This is not good.
   
   But looking at the failures, I don't know whether to categorise them as "test runner regression" or "brittle tests failing under the new test runner".
   
   Here are some of the ones I've looked at:
   
   
   `TestDirectoryScanner.testThrottling`
   This test measures how long things took, which makes it far too brittle against timing changes, whether slower or faster.
   ```
   java.lang.AssertionError: Throttle is too permissive
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.testThrottling(TestDirectoryScanner.java:901)
   ```
   
   I think the next step here is to move to AssertJ so that asserts fail with meaningful messages, then see if the failure can be understood. Ideally you'd want a test which doesn't measure elapsed time, but instead uses counters in the code (here: of throttle events) to assert what took place.
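As a rough illustration of the counter-based approach, here is a minimal sketch. `ThrottleCountingScanner`, `scan()` and `getThrottleEvents()` are hypothetical names, not part of `DirectoryScanner`; the point is only that the assertion checks a deterministic count of throttle events, and fails with a message saying what was expected, instead of comparing elapsed wall-clock time.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: ThrottleCountingScanner is illustrative, not a Hadoop class.
class ThrottleCountingScanner {
  private final int maxBlocksPerSlice; // hypothetical throttle limit per time slice
  // AtomicLong so the counter stays safe if a real scanner used several threads.
  private final AtomicLong throttleEvents = new AtomicLong();

  ThrottleCountingScanner(int maxBlocksPerSlice) {
    this.maxBlocksPerSlice = maxBlocksPerSlice;
  }

  /** Scan blocks, recording a throttle event each time the per-slice limit is reached. */
  void scan(int blocks) {
    int inSlice = 0;
    for (int i = 0; i < blocks; i++) {
      if (inSlice == maxBlocksPerSlice) {
        // A real scanner would pause here; the sketch only counts the event.
        throttleEvents.incrementAndGet();
        inSlice = 0;
      }
      inSlice++;
    }
  }

  long getThrottleEvents() {
    return throttleEvents.get();
  }

  public static void main(String[] args) {
    ThrottleCountingScanner scanner = new ThrottleCountingScanner(10);
    scanner.scan(35);
    long events = scanner.getThrottleEvents();
    // Assert on what happened, and explain the expectation if it fails.
    if (events != 3) {
      throw new AssertionError("Expected 3 throttle events scanning 35 blocks at 10 per slice, but got "
          + events);
    }
  }
}
```

The result no longer depends on how fast the test machine is: the same input always yields the same count.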
   
   Test `TestBlockListAsLongs.testFuzz`
   
   I see this painfully often elsewhere: it means that the protobuf library was built with a more recent version of Java 8 than the early Oracle ones. It's fixable in your own build (build with the older JDK) or by casting ByteBuffer to Buffer; otherwise we need to make sure tests run on a more recent build.
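A sketch of the `Buffer` cast workaround, assuming the failing call is a `ByteBuffer.position(int)` compiled against a JDK whose `ByteBuffer` declares `position(int)` returning `ByteBuffer` (the descriptor in the trace below). Casting to `java.nio.Buffer` makes the compiler link against `Buffer.position(int)`, which returns `Buffer` and is present on older runtimes too. `BufferCastWorkaround.skip()` is a hypothetical helper, not protobuf code.

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;

// Hypothetical helper illustrating the cast; not taken from protobuf.
class BufferCastWorkaround {
  /** Advance the buffer position by n bytes in a way that links on older runtimes. */
  static void skip(ByteBuffer buf, int n) {
    // The cast makes javac emit a call to Buffer.position(int) rather than
    // the ByteBuffer.position(int) override missing from older runtimes.
    ((Buffer) buf).position(buf.position() + n);
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
    skip(buf, 2);
    System.out.println(buf.get()); // reads the byte at index 2
  }
}
```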
   
   ```
   java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
        at org.apache.hadoop.thirdparty.protobuf.IterableByteBufferInputStream.read(IterableByteBufferInputStream.java:143)
        at org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.read(CodedInputStream.java:2080)
        at org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.tryRefillBuffer(CodedInputStream.java:2831)
        at org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.refillBuffer(CodedInputStream.java:2777)
        at org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.readRawByte(CodedInputStream.java:2859)
        at org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.readRawVarint64SlowPath(CodedInputStream.java:2648)
        at org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.readRawVarint64(CodedInputStream.java:2641)
        at org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.readSInt64(CodedInputStream.java:2497)
        at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:419)
        at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:397)
        at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder.getBlockListAsLongs(BlockListAsLongs.java:375)
        at org.apache.hadoop.hdfs.protocol.TestBlockListAsLongs.checkReport(TestBlockListAsLongs.java:156)
        at org.apache.hadoop.hdfs.protocol.TestBlockListAsLongs.testFuzz(TestBlockListAsLongs.java:139)
   ```
   
   Test `TestDFSAdmin.testDecommissionDataNodesReconfig`
   ```
   java.lang.AssertionError
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at org.apache.hadoop.hdfs.tools.TestDFSAdmin.testDecommissionDataNodesReconfig(TestDFSAdmin.java:1356)
   ```
   Not a very meaningful message. I suspect that a different ordering of the threads is causing the assert to fail.
   1. Move to AssertJ.
   2. Analyse the error and see what the fix is.
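If thread ordering is the culprit, one option is to take ordering out of the assertion and put the actual values into the failure message. A sketch with a hypothetical helper (`assertSameElements` is not a Hadoop or JUnit API, and the datanode names are made up); AssertJ's `containsExactlyInAnyOrder()` provides similar behaviour. This only fits if the test really cares about membership rather than order.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not a Hadoop or JUnit API.
class OrderInsensitiveAssert {
  /** Fail unless both collections hold the same elements, ignoring order and duplicates. */
  static <T> void assertSameElements(String what, Collection<T> expected, Collection<T> actual) {
    Set<T> exp = new HashSet<>(expected);
    Set<T> act = new HashSet<>(actual);
    if (!exp.equals(act)) {
      Set<T> missing = new HashSet<>(exp);
      missing.removeAll(act);
      Set<T> unexpected = new HashSet<>(act);
      unexpected.removeAll(exp);
      // Report both sides and the difference, unlike a bare assertTrue().
      throw new AssertionError(what + ": expected " + expected + " but was " + actual
          + "; missing " + missing + ", unexpected " + unexpected);
    }
  }

  public static void main(String[] args) {
    // The same (made-up) datanodes reported in a different order still pass.
    assertSameElements("decommissioning nodes",
        Arrays.asList("dn1", "dn2"), Arrays.asList("dn2", "dn1"));
  }
}
```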
   
   Test `TestCacheDirectives`. 
   
   ```
   at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:403)
        at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:362)
        at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.waitForCachedBlocks(TestCacheDirectives.java:760)
        at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.teardown(TestCacheDirectives.java:173)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   ```
   
   This is a timeout during teardown; after it, subsequent tests may also fail. There's no obvious cause, though again I'd suspect race conditions.
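To get more than a bare timeout out of a teardown like this, the wait can report what it last observed. A minimal sketch, loosely modelled on the polling style of `GenericTestUtils.waitFor`; `WaitWithDiagnostics.waitFor` and its `state` supplier are hypothetical, not the Hadoop API.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper, loosely modelled on GenericTestUtils.waitFor; not the Hadoop API.
class WaitWithDiagnostics {
  /**
   * Poll check every checkEveryMillis until it returns true. If waitForMillis
   * elapses first, throw a TimeoutException whose message includes state.get().
   */
  static void waitFor(Supplier<Boolean> check, Supplier<String> state,
      long checkEveryMillis, long waitForMillis)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.get()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("Timed out after " + waitForMillis
            + " ms; last observed state: " + state.get());
      }
      Thread.sleep(checkEveryMillis);
    }
  }
}
```

In a teardown, the `state` supplier could (hypothetically) return how many blocks were still cached, so a timeout shows how far from done the cluster was rather than only that it ran out of time.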
   
   Rather than say "hey, let's revert", I'd propose filing a "surefire update triggers test failures" issue and seeing what can be done to address the failures, because we can't stay frozen on old surefire versions.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to