pan3793 opened a new pull request, #8472:
URL: https://github.com/apache/hadoop/pull/8472
### Description of PR
1. `BindException` was not caught. This is the main cause of the failure shown
in the log.
Netty's `ChannelFuture#sync()` uses `PlatformDependent.throwException` to
"sneaky-throw" the original cause. When the OS returns "Address already in
use", the cause is `java.net.BindException` (an `IOException` subclass), not
`ChannelException`. The original `catch (InterruptedException |
ChannelException e)` did not match it, so the exception propagated out of the
retry loop and failed the test on the first attempt, exactly matching the stack
trace in the failure log.
```
Error: Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.224 s <<< FAILURE! -- in org.apache.hadoop.oncrpc.TestFrameDecoder
Error: org.apache.hadoop.oncrpc.TestFrameDecoder.testFrames -- Time elapsed: 0.013 s <<< ERROR!
java.net.BindException: Address already in use
    at java.base/sun.nio.ch.Net.bind0(Native Method)
    at java.base/sun.nio.ch.Net.bind(Net.java:567)
    at java.base/sun.nio.ch.ServerSocketChannelImpl.netBind(ServerSocketChannelImpl.java:337)
    at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:294)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:141)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:561)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1281)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:600)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:579)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:922)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:259)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:384)
    at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:840)
    Suppressed: java.lang.RuntimeException: Rethrowing promise failure cause
        at io.netty.util.concurrent.DefaultPromise.rethrowIfFailed(DefaultPromise.java:686)
        at io.netty.util.concurrent.DefaultPromise.sync(DefaultPromise.java:420)
        at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:119)
        at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:30)
        at org.apache.hadoop.oncrpc.SimpleTcpServer.run(SimpleTcpServer.java:88)
        at org.apache.hadoop.oncrpc.TestFrameDecoder.startRpcServer(TestFrameDecoder.java:237)
        at org.apache.hadoop.oncrpc.TestFrameDecoder.testFrames(TestFrameDecoder.java:177)
        at java.base/java.lang.reflect.Method.invoke(Method.java:569)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
```
2. The port increment could be zero.
`rand.nextInt(20)` returns a value in `[0, 20)`, so `serverPort +=
rand.nextInt(20)` could add 0 and retry the same busy port. Changed the
increment to `1 + rand.nextInt(20)` so the port always advances.
3. `InterruptedException` was lumped in with "port in use".
A thread interrupt should not trigger a port-bump retry. It now has its own
handler that restores the interrupt flag and propagates the interruption.
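
For reviewers, the three points above can be sketched with plain JDK classes. This is a minimal illustration, not the code in this patch: `PortRetrySketch`, `sneakyThrow`, `bindAndSync`, and `bindWithRetry` are hypothetical names, and `java.net.ServerSocket` stands in for the Netty bootstrap. `bindAndSync` mimics a `sync()`-style call that declares only `InterruptedException` yet rethrows the underlying `BindException` undeclared.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.Random;

final class PortRetrySketch {

  /** Rethrows any Throwable without declaring it (a "sneaky throw"). */
  @SuppressWarnings("unchecked")
  static <E extends Throwable> void sneakyThrow(Throwable t) throws E {
    throw (E) t;
  }

  /**
   * Hypothetical stand-in for a bind-then-sync call: it declares only
   * InterruptedException, but a bind failure escapes as an undeclared
   * checked exception.
   */
  static ServerSocket bindAndSync(int port) throws InterruptedException {
    if (Thread.interrupted()) {
      throw new InterruptedException();
    }
    try {
      return new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
    } catch (IOException e) {
      PortRetrySketch.<RuntimeException>sneakyThrow(e);
      return null; // unreachable, required by the compiler
    }
  }

  /** Retries on "address in use" only, always moving to a different port. */
  static ServerSocket bindWithRetry(int startPort, int maxAttempts, Random rand)
      throws IOException {
    int port = startPort;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return bindAndSync(port);
      } catch (InterruptedException e) {
        // An interrupt is not a busy port: restore the flag and stop retrying.
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted while binding port " + port, e);
      } catch (Exception e) {
        if (!(e instanceof BindException)) {
          throw e; // unrelated failure, do not mask it with a retry
        }
        // 1 + nextInt(20) is in [1, 20], so the next attempt never reuses
        // the port that just failed.
        port += 1 + rand.nextInt(20);
      }
    }
    throw new BindException("No free port after " + maxAttempts + " attempts");
  }
}
```

Because `bindAndSync` does not declare `IOException`, the compiler rejects a direct `catch (BindException e)` in this sketch, which is why it catches `Exception` and checks the type.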
Contains content generated by: Claude Opus 4.7
### How was this patch tested?
Ran the following command for dozens of rounds:
```
./mvnw test -pl hadoop-common-project/hadoop-common -am -Dtest=TestFrameDecoder
...
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.oncrpc.TestFrameDecoder
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.036 s -- in org.apache.hadoop.oncrpc.TestFrameDecoder
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
```
### For code changes:
- [x] Does the title of this PR start with the corresponding JIRA issue id
(HADOOP-19881)?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
### AI Tooling
If an AI tool was used:
- [x] The PR includes the phrase "Contains content generated by <tool>"
where <tool> is the name of the AI tool used.
- [x] My use of AI contributions follows the ASF legal policy
https://www.apache.org/legal/generative-tooling.html