Forgot to include Ganeti Development.
---------- Forwarded message ----------
From: Petr Pudlák <[email protected]>
Date: Thu, Apr 17, 2014 at 5:43 PM
Subject: Re: [PATCH master 34/37] Cancel jobs by sending SIGTERM
To: Klaus Aehlig <[email protected]>
I propose the interdiff:
diff --git a/src/Ganeti/Constants.hs b/src/Ganeti/Constants.hs
index 45c8069..cf09b05 100644
--- a/src/Ganeti/Constants.hs
+++ b/src/Ganeti/Constants.hs
@@ -4670,6 +4670,11 @@ luxiWfjcTimeout = (luxiDefRwto - 1) `div` 2
luxiLivelockPrefix :: String
luxiLivelockPrefix = "luxi-daemon"
+-- | The LUXI daemon waits this number of seconds for ensuring that a
canceled
+-- job terminates before giving up.
+luxiCancelJobTimeout :: Int
+luxiCancelJobTimeout = (luxiDefRwto - 1) `div` 4
+
-- * Query language constants
-- ** Logic operators with one or more operands, each of which is a
diff --git a/src/Ganeti/JQueue.hs b/src/Ganeti/JQueue.hs
index f6f3887..b0e1321 100644
--- a/src/Ganeti/JQueue.hs
+++ b/src/Ganeti/JQueue.hs
@@ -620,7 +620,7 @@ cancelJob jid = runResultT $ do
return (True, jName ++ " has been already dead")
Just pid -> do
liftIO $ signalProcess sigTERM pid
- lift $ waitForJob jid 15
+ lift $ waitForJob jid C.luxiCancelJobTimeout
_ -> do
logDebug $ jName ++ " in its startup phase, retrying"
mzero
On Thu, Apr 17, 2014 at 5:23 PM, Klaus Aehlig <[email protected]> wrote:
> On Thu, Apr 17, 2014 at 03:11:50PM +0200, Petr Pudlak wrote:
> > We can only send the signal if the job is alive and if there is a
> > process ID in the job file (which means that the signal handler has been
> > installed). If it's missing, we need to wait and retry.
> >
> > In addition, after we send the signal, we wait for the job to actually
> > die, to retain the original semantics.
> >
> > Signed-off-by: Petr Pudlak <[email protected]>
> > ---
> > src/Ganeti/JQueue.hs | 65
> ++++++++++++++++++++++++++++++++++++----------
> > src/Ganeti/Query/Server.hs | 2 +-
> > 2 files changed, 52 insertions(+), 15 deletions(-)
>
> > + let jName = ("Job " ++) . show . fromJobId . qjId $ job
> > + dead <- maybe (return False) (liftIO . isDead) (qjLivelock job)
> > + case qjProcessId job of
> > + _ | dead ->
> > + return (True, jName ++ " has been already dead")
> > + Just pid -> do
> > + liftIO $ signalProcess sigTERM pid
> > + lift $ waitForJob jid 15
>
> Instead of hardcoding 15s here, can you please
> - make this a constant in src/Ganeti/Constants.hs, and
> - chose the constant to depend on luxiDefRwto, e.g., chosing
> (luxiDefRwto - 1) `div` 4
>
> The rest looks good, thanks.
>
> --
> Klaus Aehlig
> Google Germany GmbH, Dienerstr. 12, 80331 Muenchen
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> Geschaeftsfuehrer: Graham Law, Christine Elizabeth Flores
>