[ https://issues.apache.org/jira/browse/KUDU-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans updated KUDU-1418:
-------------------------------------
Target Version/s: 1.0.0 (was: 0.9.0)
Definitely won't have time, thanks.
> [java client] Master lookups can vanish under certain conditions
> ----------------------------------------------------------------
>
> Key: KUDU-1418
> URL: https://issues.apache.org/jira/browse/KUDU-1418
> Project: Kudu
> Issue Type: Bug
> Components: client
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Priority: Blocker
>
> While testing Kudu with our internal QA tools (1), we found that both DNS
> failure injection and elastic partitioning between clients and the master
> trigger a bug where master lookups just... vanish. Here's an example:
> {noformat}
> 2016-04-13 22:18:55,506 WARN [New I/O boss #9]
> org.kududb.client.GetMasterRegistrationReceived: Error receiving a response
> from: francesco-ec2-kudu-centos66-11-1.vpc.cloudera.com:7051
> org.kududb.client.ConnectionResetException: [Peer Kudu Master -
> francesco-ec2-kudu-centos66-11-1.vpc.cloudera.com:7051] Connection reset on
> [id: 0x9bd8ed44]
> at org.kududb.client.TabletClient.cleanup(TabletClient.java:630)
> (stack trace)
> 2016-04-13 22:18:55,507 WARN [New I/O boss #9]
> org.kududb.client.GetMasterRegistrationReceived: Unable to find the leader
> master (francesco-ec2-kudu-centos66-11-1.vpc.cloudera.com:7051), will retry
> 2016-04-13 22:18:55,507 DEBUG [New I/O boss #9]
> org.kududb.client.AsyncKuduClient: Going to sleep for 1017 at retry 2
> 2016-04-13 22:18:55,507 DEBUG [New I/O worker #7]
> org.kududb.client.TabletClient: [Peer Kudu Master -
> francesco-ec2-kudu-centos66-11-1.vpc.cloudera.com:7051] [id: 0x9bd8ed44]
> CLOSED
> (unrelated debug logs)
> 2016-04-13 22:28:44,951 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child : java.io.IOException: Couldn't flush the head row,
> KuduRpc(method=Write, tablet=null, attempt=1, DeadlineTracker(timeout=0,
> elapsed=600001), null) row_key=(int64 key1=-721818921243156941, int64
> key2=5432210168070573172)
> at
> org.kududb.mapreduce.tools.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:516)
> {noformat}
> The client tries to reach the master, fails, and logs that it will retry
> after sleeping for about a second... then nothing more is logged for that
> lookup until ITBLL (IntegrationTestBigLinkedList) times out 10 minutes later.
> (1) https://blog.cloudera.com/blog/2016/04/quality-assurance-at-cloudera-fault-injection-and-elastic-partitioning/
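
To illustrate the symptom described above (a retry gets scheduled, then the caller hears nothing until its own deadline expires), here is a minimal sketch in plain Java. It is not the Kudu client's code and makes no claim about the root cause: the class RetryGuardSketch, the method retryWithDeadline, and the delay and deadline values are all hypothetical, and the "lost" lookup is simulated with a future that nobody completes. It only shows one way to make a scheduled retry report a timeout quickly instead of staying silent.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names exist in the Kudu client.
public class RetryGuardSketch {

  /**
   * Schedules `attempt` after `delayMs` and returns a future that is always
   * completed: with the attempt's result, with its exception, or with a
   * TimeoutException once `deadlineMs` has passed.
   */
  static <T> CompletableFuture<T> retryWithDeadline(
      ScheduledExecutorService timer,
      Supplier<CompletableFuture<T>> attempt,
      long delayMs,
      long deadlineMs) {
    CompletableFuture<T> result = new CompletableFuture<>();

    // The retry itself: forward whatever the attempt produces to `result`.
    timer.schedule(() -> {
      try {
        attempt.get().whenComplete((value, error) -> {
          if (error != null) {
            result.completeExceptionally(error);
          } else {
            result.complete(value);
          }
        });
      } catch (RuntimeException e) {
        // An exception thrown while starting the attempt must not be swallowed.
        result.completeExceptionally(e);
      }
    }, delayMs, TimeUnit.MILLISECONDS);

    // The guard: if nothing completed `result` by the deadline, fail it loudly.
    // Completing an already-completed future is a no-op.
    timer.schedule(() -> {
      result.completeExceptionally(
          new TimeoutException("retry did not complete within " + deadlineMs + " ms"));
    }, deadlineMs, TimeUnit.MILLISECONDS);

    return result;
  }

  /** Returns true if a retry whose response never arrives surfaces as a timeout. */
  static boolean lostRetryIsReported() throws InterruptedException {
    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    try {
      // Simulated lost lookup: the attempt hands back a future nobody completes.
      CompletableFuture<String> lookup = retryWithDeadline(
          timer, () -> new CompletableFuture<String>(), 10, 200);
      try {
        lookup.get(5, TimeUnit.SECONDS);
        return false; // the lost lookup should never succeed
      } catch (ExecutionException e) {
        return e.getCause() instanceof TimeoutException;
      } catch (TimeoutException e) {
        return false; // the guard did not fire; the caller would have hung
      }
    } finally {
      timer.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("lost retry reported: " + lostRetryIsReported());
  }
}
```

With a guard like this, a lookup that goes missing would show up as an explicit timeout tied to the retry, rather than as an unrelated Write failing once its 10-minute deadline runs out.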