Sean Broeder created TRAFODION-2236:
---------------------------------------
Summary: TM crashes following sqstart
Key: TRAFODION-2236
URL: https://issues.apache.org/jira/browse/TRAFODION-2236
Project: Apache Trafodion
Issue Type: Bug
Components: dtm
Affects Versions: 2.0-incubating
Reporter: Sean Broeder
Assignee: Sean Broeder
Fix For: 2.1-incubating
When Trafodion is stopped abruptly while a region server has outstanding
recovery requests posted in ZooKeeper, the new TMs may be unable to start. This
happens because the TM recovery thread reads the ZK entries and attempts to
send the recovery resolution to the region server that posted each entry. It
gets a connection error because that region server no longer exists.
The partial solution is to remove the ZK entries as part of startup so the TM
can start up without error.
This is safe to do because any region server needing recovery will repost to
ZooKeeper, and the TM will have no issues connecting to that RS.
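To illustrate the startup cleanup, here is a minimal sketch. It is not the actual patch: the znodes are modeled as an in-memory map rather than a real ZooKeeper client, and the class name, method name, and the "/recovery/" path prefix are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the recovery znodes: path -> payload.
// A real TM would go through a ZooKeeper client; this map only shows the order of operations.
final class RecoveryStartupCleanup {
    // Assumed layout; the actual znode path is not stated in this issue.
    static final String RECOVERY_ROOT = "/recovery/";

    /** Removes every entry under RECOVERY_ROOT and returns the removed paths. */
    static List<String> clearStaleEntries(Map<String, String> znodes) {
        List<String> removed = new ArrayList<>();
        // Iterate over a copy of the keys so entries can be removed safely.
        for (String path : new ArrayList<>(znodes.keySet())) {
            if (path.startsWith(RECOVERY_ROOT)) {
                znodes.remove(path);
                removed.add(path);
            }
        }
        return removed;
    }
}
```

The cleanup would run before the recovery thread starts, so the thread never sees entries posted by region servers from the previous run. Entries outside the recovery root are left alone.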
An additional fix will be made to the TM to handle exceptions in trying to
communicate with region servers during recovery.
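A rough sketch of what that exception handling could look like follows. The RegionServerClient interface and the resolveAll method are hypothetical names used only for illustration, not Trafodion or HBase APIs. The idea is that a connection failure for one region server is recorded and skipped rather than allowed to bring down the TM.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class RecoveryPass {
    /** Hypothetical transport to a region server; not a Trafodion or HBase API. */
    interface RegionServerClient {
        void sendResolution(String regionServer) throws IOException;
    }

    /** Tries each server once and returns those that could not be reached. */
    static List<String> resolveAll(List<String> regionServers, RegionServerClient client) {
        List<String> unreachable = new ArrayList<>();
        for (String rs : regionServers) {
            try {
                client.sendResolution(rs);
            } catch (IOException e) {
                // The server may be gone after an abrupt stop; record it and keep going.
                unreachable.add(rs);
            }
        }
        return unreachable;
    }
}
```

The returned list leaves it to the caller to decide whether to retry later or to drop the entry, which this issue does not specify.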