[
https://issues.apache.org/jira/browse/KUDU-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adar Dembo reassigned KUDU-2080:
--------------------------------
Assignee: Jiahongchao
> Masters stuck in a bad state when not starting them together on initial
> deployment
> ----------------------------------------------------------------------------------
>
> Key: KUDU-2080
> URL: https://issues.apache.org/jira/browse/KUDU-2080
> Project: Kudu
> Issue Type: Bug
> Components: master
> Affects Versions: 1.3.0
> Reporter: Attila Bukor
> Assignee: Jiahongchao
> Priority: Major
> Labels: usability
>
> When masters are started separately on the first run, they cannot connect to
> each other while trying to write the consensus data, so the write fails.
> {code}
> I0726 14:15:22.894768 55240 consensus_peers.cc:503] Retrying to get permanent
> uuid for remote peer: member_type: VOTER last_known_addr { host:
> "master1.example.com" port: 7051 } attempt: 10
> W0726 14:15:22.895084 55240 consensus_peers.cc:493] Error getting permanent
> uuid from config peer master1.example.com:7051: Network error: Client
> connection negotiation failed: client connection to 10.1.0.1:7051: connect:
> Connection refused (error 111)
> I0726 14:15:36.235213 55240 consensus_peers.cc:503] Retrying to get permanent
> uuid for remote peer: member_type: VOTER last_known_addr { host:
> "master1.example.com" port: 7051 } attempt: 11
> W0726 14:15:36.235498 55240 consensus_peers.cc:493] Error getting permanent
> uuid from config peer master1.example.com:7051: Network error: Client
> connection negotiation failed: client connection to 10.1.0.1:7051: connect:
> Connection refused (error 111)
> E0726 14:15:36.235572 55240 master.cc:171] [email protected]:7051: Unable to
> init master catalog manager: Timed out: Unable to initialize catalog manager:
> Failed to initialize sys tables async: Failed to create new distributed Raft
> config: Unable to resolve UUID for peer member_type: VOTER last_known_addr {
> host: "master1.example.com" port: 7051 }: Getting permanent uuid from
> master1.example.com:7051 timed out after 30000 ms.: Network error: Client
> connection negotiation failed: client connection to 10.1.0.1:7051: connect:
> Connection refused (error 111)
> F0726 14:15:36.235663 55079 master_main.cc:71] Check failed: _s.ok() Bad
> status: Timed out: Unable to initialize catalog manager: Failed to initialize
> sys tables async: Failed to create new distributed Raft config: Unable to
> resolve UUID for peer member_type: VOTER last_known_addr { host:
> "master1.example.com" port: 7051 }: Getting permanent uuid from
> master1.example.com:7051 timed out after 30000 ms.: Network error: Client
> connection negotiation failed: client connection to 10.1.0.1:7051: connect:
> Connection refused (error 111)
> {code}
> After this, the tablet-meta is present but the consensus-meta is missing, and
> startup fails until every master's data directory is emptied and all masters
> are started again at the same time (similar to KUDU-1186):
> {code}
> I0726 14:20:52.455219 58429 sys_catalog.cc:128] Verifying existing consensus
> state
> E0726 14:20:52.455294 58429 master.cc:171] [email protected]:7051: Unable to
> init master catalog manager: Not found: Unable to initialize catalog manager:
> Failed to initialize sys tables async: Unable to load consensus metadata for
> tablet 00000000000000000000000000000000:
> /data/kudu/master/data/consensus-meta/00000000000000000000000000000000: No
> such file or directory (error 2)
> F0726 14:20:52.455400 58268 master_main.cc:71] Check failed: _s.ok() Bad
> status: Not found: Unable to initialize catalog manager: Failed to initialize
> sys tables async: Unable to load consensus metadata for tablet
> 00000000000000000000000000000000:
> /data/kudu/master/data/consensus-meta/00000000000000000000000000000000: No
> such file or directory (error 2)
> {code}
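
The half-initialized state described in the report (tablet-meta present, consensus-meta missing) can be detected on disk before attempting a restart. The following is a minimal sketch, not part of Kudu: the function name and the state labels are hypothetical, and it assumes the tablet metadata file sits at <root>/tablet-meta/<tablet_id>, next to the consensus-meta path shown in the log. Only the consensus-meta path and the all-zero tablet ID come from the report.

```python
from pathlib import Path

# Tablet ID copied from the log excerpt in the report.
SYS_CATALOG_TABLET_ID = "00000000000000000000000000000000"


def classify_master_metadata(root, tablet_id=SYS_CATALOG_TABLET_ID):
    """Classify the metadata state under a master's metadata root (hypothetical helper).

    Assumption (not confirmed by the report): tablet metadata is stored at
    <root>/tablet-meta/<tablet_id>, alongside <root>/consensus-meta/<tablet_id>.
    """
    base = Path(root)
    has_tablet_meta = (base / "tablet-meta" / tablet_id).is_file()
    has_consensus_meta = (base / "consensus-meta" / tablet_id).is_file()

    if has_tablet_meta and has_consensus_meta:
        return "initialized"
    if has_tablet_meta:
        # The state from the report: tablet-meta written, consensus-meta missing.
        return "stuck"
    if has_consensus_meta:
        # Not described in the report; flagged so it is not mistaken for a clean state.
        return "unexpected"
    return "empty"
```

Calling classify_master_metadata("/data/kudu/master/data") on each master host would return "stuck" for a master in the condition reported here, assuming the layout above holds for the deployed version.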