[ https://issues.apache.org/jira/browse/KUDU-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated KUDU-1418:
------------------------------
Priority: Critical (was: Blocker)
I think it makes sense to downgrade this to Critical, since we haven't seen
it often "in the wild". It would still be very good to fix, but it
shouldn't block the release.
> [java client] Master lookups can vanish under certain conditions
> ----------------------------------------------------------------
>
> Key: KUDU-1418
> URL: https://issues.apache.org/jira/browse/KUDU-1418
> Project: Kudu
> Issue Type: Bug
> Components: client
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Priority: Critical
>
> While testing Kudu with our internal QA tools (1), we found that both DNS
> failure injection and elastic partitioning between clients and the master
> trigger a bug where master lookups just... vanish. Here's an example:
> {noformat}
> 2016-04-13 22:18:55,506 WARN [New I/O boss #9]
> org.kududb.client.GetMasterRegistrationReceived: Error receiving a response
> from: francesco-ec2-kudu-centos66-11-1.vpc.cloudera.com:7051
> org.kududb.client.ConnectionResetException: [Peer Kudu Master -
> francesco-ec2-kudu-centos66-11-1.vpc.cloudera.com:7051] Connection reset on
> [id: 0x9bd8ed44]
> at org.kududb.client.TabletClient.cleanup(TabletClient.java:630)
> (stack trace)
> 2016-04-13 22:18:55,507 WARN [New I/O boss #9]
> org.kududb.client.GetMasterRegistrationReceived: Unable to find the leader
> master (francesco-ec2-kudu-centos66-11-1.vpc.cloudera.com:7051), will retry
> 2016-04-13 22:18:55,507 DEBUG [New I/O boss #9]
> org.kududb.client.AsyncKuduClient: Going to sleep for 1017 at retry 2
> 2016-04-13 22:18:55,507 DEBUG [New I/O worker #7]
> org.kududb.client.TabletClient: [Peer Kudu Master -
> francesco-ec2-kudu-centos66-11-1.vpc.cloudera.com:7051] [id: 0x9bd8ed44]
> CLOSED
> (unrelated debug logs)
> 2016-04-13 22:28:44,951 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child : java.io.IOException: Couldn't flush the head row,
> KuduRpc(method=Write, tablet=null, attempt=1, DeadlineTracker(timeout=0,
> elapsed=600001), null) row_key=(int64 key1=-721818921243156941, int64
> key2=5432210168070573172)
> at
> org.kududb.mapreduce.tools.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:516)
> {noformat}
> The client tries to reach the master, fails, says it's gonna retry in a
> second... then nothing until ITBLL times out 10 minutes later.
> 1. https://blog.cloudera.com/blog/2016/04/quality-assurance-at-cloudera-fault-injection-and-elastic-partitioning/