[ https://issues.apache.org/jira/browse/KUDU-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927110#comment-15927110 ]
Todd Lipcon commented on KUDU-1934:
-----------------------------------
Yeah, I don't think once-a-second retries are that bad. Sure, it's silly to do
that for two weeks, but on the other hand, if a server is having issues for two
weeks, you'd probably notice it for other reasons, right?
I suppose we could have it shut down after failing to connect for some number
of hours, but that seems kind of arbitrary.
> tservers aggressively try to reconnect to masters
> -------------------------------------------------
>
> Key: KUDU-1934
> URL: https://issues.apache.org/jira/browse/KUDU-1934
> Project: Kudu
> Issue Type: Bug
> Components: tserver
> Affects Versions: 1.3.0
> Reporter: Jean-Daniel Cryans
> Labels: newbie
>
> Related to KUDU-1933, I had mismatched 1.3 snapshots between the master and
> the tservers which caused them to try to reconnect to the master infinitely.
> Since they do it as fast as they can, the logs were quickly full of:
> {noformat}
> I0307 23:55:21.228502 70832 heartbeater.cc:291] Connected to a master server at ve0120.halxg.cloudera.com:7051
> I0307 23:55:21.228528 70832 heartbeater.cc:359] Registering TS with master...
> I0307 23:55:21.228865 70832 heartbeater.cc:389] Master ve0120.halxg.cloudera.com:7051 requested a full tablet report, sending...
> W0307 23:55:21.346961 70832 heartbeater.cc:499] Failed to heartbeat to ve0120.halxg.cloudera.com:7051: Remote error: Failed to send heartbeat to master: Not authorized: invalid CSR: CSR did not contain expected username. (CSR: '' RPC: 'kudu')
> I0307 23:55:22.347733 70832 heartbeater.cc:291] Connected to a master server at ve0120.halxg.cloudera.com:7051
> I0307 23:55:22.347757 70832 heartbeater.cc:359] Registering TS with master...
> I0307 23:55:22.348042 70832 heartbeater.cc:389] Master ve0120.halxg.cloudera.com:7051 requested a full tablet report, sending...
> W0307 23:55:22.467021 70832 heartbeater.cc:499] Failed to heartbeat to ve0120.halxg.cloudera.com:7051: Remote error: Failed to send heartbeat to master: Not authorized: invalid CSR: CSR did not contain expected username. (CSR: '' RPC: 'kudu')
> {noformat}
> Sounds like we should do backoff retries.
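
For reference, the backoff idea suggested in the description could look roughly
like the sketch below. This is only an illustration, not the actual Kudu
heartbeater code: the names BackoffDelay and BackoffDelayWithJitter, the
1-second base (matching the once-a-second retries in the log above), and the
60-second cap are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

using std::chrono::milliseconds;

// Hypothetical helper (not a Kudu API): delay to wait before the next
// heartbeat attempt after `consecutive_failures` failed attempts in a row.
// The delay doubles with each failure and is clamped to `cap`.
// The default base and cap are illustrative assumptions.
inline milliseconds BackoffDelay(int consecutive_failures,
                                 milliseconds base = milliseconds(1000),
                                 milliseconds cap = milliseconds(60000)) {
  assert(consecutive_failures >= 1);
  assert(base.count() > 0 && cap >= base);
  int64_t delay = base.count();
  // Stop doubling once the cap is reached, so the value cannot overflow
  // for any realistic cap.
  for (int i = 1; i < consecutive_failures && delay < cap.count(); ++i) {
    delay *= 2;
  }
  return milliseconds(std::min<int64_t>(delay, cap.count()));
}

// Hypothetical variant that picks a random delay in [d/2, d], where d is the
// capped exponential delay, so that many tservers do not retry in lockstep.
template <typename Rng>
milliseconds BackoffDelayWithJitter(int consecutive_failures, Rng& rng) {
  const int64_t d = BackoffDelay(consecutive_failures).count();
  std::uniform_int_distribution<int64_t> dist(d / 2, d);
  return milliseconds(dist(rng));
}
```

With these assumed defaults the delays would grow as 1s, 2s, 4s, and so on up
to the 60s cap, and a caller would reset the failure count to zero after a
successful heartbeat. Capping the delay rather than giving up avoids picking an
arbitrary shutdown deadline while still keeping the log volume down.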