almost everything we do in zookeeper is to make sure that we don't lose data in much worse scenarios. the probability of a loss in this scenario is really just the probability of a bug in the code. i don't think that kill -TERM vs kill -KILL changes that probability at all either way.
ben

On Thu, Jul 28, 2011 at 12:50 AM, Laxman <[email protected]> wrote:
> Thanks for the responses Mahadev, Pat and Ben.
> I understand your explanation.
>
> My only question is "Will there be any probability data loss in the scenario
> mentioned?"
>
>>>>In worst case, if latest snaps in all zookeeper nodes gets corrupted
> there is a chance of data loss.
>
>>>if we use sigterm in the script, we would want to put a timeout in to
> escalate to a -9
>
> As Ben mentioned, even if we escalate to "kill -9" to ensure shutdown, still
> we may have data loss. But the probability is very less by giving a chance
> to shutdown gracefully.
>
> Please do correct me if my understanding is wrong.
> --
> Laxman
>
> -----Original Message-----
> From: Benjamin Reed [mailto:[email protected]]
> Sent: Thursday, July 28, 2011 11:40 AM
> To: [email protected]
> Subject: Re: FW: Does abrupt kill corrupts the datadir?
>
> i agree with pat. if we use sigterm in the script, we would want to
> put a timeout in to escalate to a -9 which makes the script a bit more
> complicated without reason since we don't have any exit hooks that we
> want to run. zookeeper is designed to recover well from hard failures,
> much worse than a kill -9. i don't think we want to change that.
>
> ben
>
> On Wed, Jul 27, 2011 at 10:25 AM, Patrick Hunt <[email protected]> wrote:
>> ZK has been built around the "fail fast" approach. In order to
>> maintain high availability we want to ensure that restarting a server
>> will result in it attempting to rejoin the quorum. IMO we would not
>> want to change this (kill -9).
>>
>> Patrick
>>
>> On Tue, Jul 26, 2011 at 2:02 AM, Laxman <[email protected]> wrote:
>>> Hi Everyone,
>>>
>>> Any thoughts?
>>> Do we need consider changing abrupt shutdown to
>>>
>>> Implementations in some other hadoop eco system projects for your reference.
>>> Hadoop - kill [SIGTERM]
>>> HBase - kill [SIGTERM] and then "kill -9" [SIGKILL] if process hung
>>> ZooKeeper - "kill -9" [SIGKILL]
>>>
>>>
>>> -----Original Message-----
>>> From: Laxman [mailto:[email protected]]
>>> Sent: Wednesday, July 13, 2011 12:36 PM
>>> To: '[email protected]'
>>> Subject: RE: Does abrupt kill corrupts the datadir?
>>>
>>> Hi Mahadev,
>>>
>>> Shutdown hook is just a quick thought. Another approach can be just give a
>>> kill [SIGTERM] call which can be interpreted by process.
>>>
>>> First look at the "kill -9" triggered the following scenario.
>>>>In worst case, if latest snaps in all zookeeper nodes gets corrupted there
>>>>is a chance of dataloss.
>>>
>>> How does zookeeper can deal with this scenario gracefully?
>>>
>>> Also, I feel we should give a chance to application to shutdown gracefully
>>> before abrupt shutdown.
>>>
>>> http://en.wikipedia.org/wiki/SIGKILL
>>>
>>> Because SIGKILL gives the process no opportunity to do cleanup operations on
>>> terminating, in most system shutdown procedures an attempt is first made to
>>> terminate processes using SIGTERM, before resorting to SIGKILL.
>>>
>>> http://rackerhacker.com/2010/03/18/sigterm-vs-sigkill/
>>>
>>> The application can determine what it wants to do once a SIGTERM is
>>> received. While most applications will clean up their resources and stop,
>>> some may not. An application may be configured to do something completely
>>> different when a SIGTERM is received. Also, if the application is in a bad
>>> state, such as waiting for disk I/O, it may not be able to act on the signal
>>> that was sent.
>>>
>>> Most system administrators will usually resort to the more abrupt signal
>>> when an application doesn't respond to a SIGTERM.
>>>
>>> -----Original Message-----
>>> From: Mahadev Konar [mailto:[email protected]]
>>> Sent: Wednesday, July 13, 2011 12:02 PM
>>> To: [email protected]
>>> Subject: Re: Does abrupt kill corrupts the datadir?
>>>
>>> Hi Laxman,
>>> The servers takes care of all the issues with data integrity, so a kill
>>> -9 is OK. Shutdown hooks are tricky. Also, the best way to make sure
>>> everything works reliably is use kill -9 :).
>>>
>>> Thanks
>>> mahadev
>>>
>>> On 7/12/11 11:16 PM, "Laxman" <[email protected]> wrote:
>>>
>>>>When we stop zookeeper through zkServer.sh stop, we are aborting the
>>>>zookeeper process using "kill -9".
>>>>
>>>>stop)
>>>>    echo -n "Stopping zookeeper ... "
>>>>    if [ ! -f "$ZOOPIDFILE" ]
>>>>    then
>>>>        echo "error: could not find file $ZOOPIDFILE"
>>>>        exit 1
>>>>    else
>>>>        $KILL -9 $(cat "$ZOOPIDFILE")
>>>>        rm "$ZOOPIDFILE"
>>>>        echo STOPPED
>>>>        exit 0
>>>>    fi
>>>>    ;;
>>>>
>>>>This may corrupt the snapshot and transaction logs. Also, its not
>>>>recommended to use "kill -9".
>>>>
>>>>In worst case, if latest snaps in all zookeeper nodes gets corrupted there
>>>>is a chance of dataloss.
>>>>
>>>>How about introducing a shutdown hook which will ensure zookeeper is
>>>>shutdown gracefully when we call stop?
>>>>
>>>>Note: This is just an observation and its not found in a test.
>>>>
>>>>--
>>>>Thanks,
>>>>Laxman
>>>>
>>>
>>
>
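
For reference, the "SIGTERM first, then escalate to -9 after a timeout" approach discussed in this thread could look roughly like the sketch below. It is only an illustration of the idea, not what zkServer.sh does: the function name stop_with_timeout and the 10-second default are hypothetical, and it assumes a POSIX shell where kill -0 can be used to test whether a pid still exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of zkServer.sh): ask a process to exit with
# SIGTERM, poll for up to <timeout> seconds, then escalate to SIGKILL.
# Usage: stop_with_timeout <pid> [timeout_seconds]
stop_with_timeout() {
    pid=$1
    timeout=${2:-10}   # assumed default, chosen only for illustration

    # If the signal cannot be delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    # kill -0 sends no signal; it only checks that the pid still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Graceful shutdown did not finish in time: hard kill.
            kill -KILL "$pid" 2>/dev/null
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the stop) branch quoted above, this would replace the $KILL -9 line with something like stop_with_timeout "$(cat "$ZOOPIDFILE")" 10. As Ben and Patrick point out, the extra polling loop is the added script complexity they would rather avoid, since the server has no exit hooks to run.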
