I don't think it's a connect timeout setting issue, since by default the Netty
channel connect timeout is 10 sec (
https://github.com/netty/netty/blob/3.2/src/main/java/org/jboss/netty/channel/DefaultChannelConfig.java#L38).
If you check the log, the statements show that the connect attempt was
started and failed within the same second.

2013-12-30 12:29:36,731 - INFO  -
[BookKeeperClientWorker-0-0:PerChannelBookieClient@167] - Connecting
to bookie: /67.195.138.30:15039
2013-12-30 12:29:36,732 - ERROR - [New I/O client boss
#5-1:PerChannelBookieClient$1@203] - Could not connect to bookie: [id:
0x019a639b, /229.27.250.246:46509 :> /67.195.138.30:15039], current
state CONNECTING
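
To illustrate why I doubt a longer connect timeout would help: a connect that is refused outright fails almost immediately, long before any timeout elapses. I'm not claiming a refusal is what actually happened in the test; this is only a minimal sketch using plain java.net.Socket rather than Netty. The class and method names are made up for the illustration, and it assumes nothing is listening on the chosen local port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of BookKeeper or Netty.
public class ConnectRefusedDemo {

    /** Binds an ephemeral local port, then closes it so that nothing is listening there. */
    static int findClosedPort() {
        try (ServerSocket ss = new ServerSocket(0)) {
            return ss.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Tries to connect to the given loopback port with the given timeout.
     * Returns the elapsed milliseconds until the connect failed,
     * or -1 if the connect unexpectedly succeeded.
     */
    static long timeFailedConnect(int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), timeoutMillis);
            return -1;
        } catch (IOException e) {
            // Typically a ConnectException ("Connection refused") on a closed port.
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) {
        int port = findClosedPort();
        long elapsed = timeFailedConnect(port, 10_000);
        System.out.println("connect failed after " + elapsed + " ms (timeout was 10000 ms)");
    }
}
```

If the failure in the test is of this kind, raising connectTimeoutMillis would not change the outcome, while retrying the connect might.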




On Mon, Dec 30, 2013 at 9:31 PM, Rakesh R <rake...@huawei.com> wrote:

> Hi Flavio,
>
> As the test case name says, it is testing multiple bookie failures.
>
> On bookie failure, while doing the ensemble reformation, the client
> unfortunately fails to connect to Bookie-15039. It is supposed to get
> connected and continue the write operation. This is the reason for the test
> case failure. Please see the following log pattern:
>
> 2013-12-30 12:29:36,731 - INFO  -
> [BookKeeperClientWorker-0-0:PerChannelBookieClient@167] - Connecting to
> bookie: /67.195.138.30:15039
> 2013-12-30 12:29:36,732 - ERROR - [New I/O client boss
> #5-1:PerChannelBookieClient$1@203] - Could not connect to bookie: [id:
> 0x019a639b, /229.27.250.246:46509 :> /67.195.138.30:15039], current state
> CONNECTING
> 2013-12-30 12:29:36,732 - WARN  -
> [BookKeeperClientWorker-0-0:PendingAddOp@158] - Write did not succeed: L0
> E100 on /67.195.138.30:15039
> 2013-12-30 12:29:36,733 - INFO  -
> [BookKeeperClientWorker-0-0:LedgerHandle@659] - Handling failure of
> bookie: /67.195.138.30:15039 index: 2
> 2013-12-30 12:29:36,733 - WARN  -
> [BookKeeperClientWorker-0-0:RackawareEnsemblePlacementPolicy@491] -
> Failed to choose a bookie from /default-rack : excluded [<Bookie:
> 67.195.138.30:15036>, <Bookie:67.195.138.30:15038>, <Bookie:
> 67.195.138.30:15039>, <Bookie:67.195.138.30:15040>, <Bookie:
> 67.195.138.30:15035>], fallback to choose bookie randomly from the
> cluster.
>
>
> I'm thinking there could be small network fluctuations or a slow machine,
> resulting in the connection failure.
> To handle this, IMHO, we should have a Netty client connection timeout in
> place and should retry a few times. Let me try with
> bootstrap.setOption("connectTimeoutMillis", timeoutvalue);
> Shall I raise a JIRA to discuss these concerns and reach a conclusion?
> What's your opinion?
>
> -Rakesh
>
> -----Original Message-----
> From: Flavio Junqueira [mailto:fpjunque...@yahoo.com]
> Sent: 31 December 2013 01:51
> To: bookkeeper-dev@zookeeper.apache.org
> Subject: Fwd: Build failed in Jenkins: bookkeeper-trunk #489
>
> I was wondering if there is a jira open for the test that failed below,
> does anyone know?
>
> -Flavio
>
> Begin forwarded message:
>
> > Tests in error:
> >
> > testWithMultipleBookieFailuresInLastEnsemble[2](org.apache.bookkeeper.client.BookieWriteLedgerTest)
>
>
