[
https://issues.apache.org/jira/browse/HDDS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875064#comment-16875064
]
Elek, Marton commented on HDDS-1384:
------------------------------------
This is still a serious problem and I can see related (flaky) failures at every
second day.
I uploaded my second attempt to fix this (in a simplified way as earlier): I
fixed the race condition in a way, which is similar how the port handling is
used for hadoop rpc: in case of port=0, the port (which is reported to scm)
should be updated based on the real socket address.
> TestBlockOutputStreamWithFailures is failing
> --------------------------------------------
>
> Key: HDDS-1384
> URL: https://issues.apache.org/jira/browse/HDDS-1384
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: test
> Reporter: Nanda kumar
> Assignee: Elek, Marton
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.4.1
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> TestBlockOutputStreamWithFailures is failing with the following error
> {noformat}
> 2019-04-04 18:52:43,240 INFO volume.ThrottledAsyncChecker
> (ThrottledAsyncChecker.java:schedule(140)) - Scheduling a check for
> org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a
> 2019-04-04 18:52:43,240 INFO volume.HddsVolumeChecker
> (HddsVolumeChecker.java:checkAllVolumes(203)) - Scheduled health check for
> volume org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a
> 2019-04-04 18:52:43,241 ERROR server.GrpcService
> (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to
> start Grpc server
> java.io.IOException: Failed to bind
> at
> org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81)
> at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144)
> at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
> at
> org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69)
> at
> org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300)
> at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
> at
> org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298)
> at
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419)
> at
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186)
> at
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169)
> at
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.BindException: Address already in use
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:433)
> at sun.nio.ch.Net.bind(Net.java:425)
> at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> at
> org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130)
> at
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
> at
> org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
> at
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
> at
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
> at
> org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
> at
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
> at
> org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
> at
> org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at
> org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
> at
> org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
> at
> org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
> at
> org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> ... 1 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]