Because the repair job immediately grabs the affected instance's locks, after
submitting it we weren't able to set the "pending" tag until at least the first
opcode of the job had finished. Introduce a small delay at the start of the
repair job so that the subsequent TAGS_SET job can finish immediately,
resulting in faster operation from the user's point of view.

Make the duration of the delay configurable with --job-delay; if it is set to
zero, do not insert the TestDelay opcode at all.

Signed-off-by: Dato Simó <d...@google.com>
---
> LGTM-ish. I was wondering two things:

> - should we hardcode the 10 seconds there? what if we find that on a big
>   cluster, we need for example 30 seconds (for whatever reason)? maybe
>   we should introduce this delay as a command line option, defaulting to
>   10?
> - following from the above: should we make it so that if the given delay
>   is 0, then the extra opcode is not prepended? so that we can test live
>   both with and without it?

Both seem like good suggestions to me, thanks. Please see the updated patch
below. (I didn't deem it necessary to split the CLI.hs changes into a separate
commit, but let me know if you'd prefer it differently.)

Thanks,

 src/Ganeti/HTools/CLI.hs           | 12 ++++++++++
 src/Ganeti/HTools/Program/Harep.hs | 45 +++++++++++++++++++++++++++++++++-----
 2 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/src/Ganeti/HTools/CLI.hs b/src/Ganeti/HTools/CLI.hs
index 6e40df7..00f8324 100644
--- a/src/Ganeti/HTools/CLI.hs
+++ b/src/Ganeti/HTools/CLI.hs
@@ -55,6 +55,7 @@ module Ganeti.HTools.CLI
   , oGroup
   , oIAllocSrc
   , oInstMoves
+  , oJobDelay
   , genOLuxiSocket
   , oLuxiSocket
   , oMachineReadable
@@ -118,6 +119,7 @@ data Options = Options
   , optIAllocSrc   :: Maybe FilePath -- ^ The iallocation spec
   , optSelInst     :: [String]       -- ^ Instances to be excluded
   , optLuxi        :: Maybe FilePath -- ^ Collect data from Luxi
+  , optJobDelay    :: Double         -- ^ Delay before executing first job
   , optMachineReadable :: Bool       -- ^ Output machine-readable format
   , optMaster      :: String         -- ^ Collect data from RAPI
   , optMaxLength   :: Int            -- ^ Stop after this many steps
@@ -162,6 +164,7 @@ defaultOptions  = Options
   , optIAllocSrc   = Nothing
   , optSelInst     = []
   , optLuxi        = Nothing
+  , optJobDelay    = 10
   , optMachineReadable = False
   , optMaster      = ""
   , optMaxLength   = -1
@@ -326,6 +329,15 @@ oIAllocSrc =
    "Specify an iallocator spec as the cluster data source",
    OptComplFile)
 
+oJobDelay :: OptType
+oJobDelay =
+  (Option "" ["job-delay"]
+   (reqWithConversion (tryRead "job delay")
+    (\d opts -> Ok opts { optJobDelay = d }) "SECONDS")
+   "insert this much delay before the execution of repair jobs\
+   \ to allow the tool to continue processing instances",
+   OptComplFloat)
+
 genOLuxiSocket :: String -> OptType
 genOLuxiSocket defSocket =
   (Option "L" ["luxi"]
diff --git a/src/Ganeti/HTools/Program/Harep.hs b/src/Ganeti/HTools/Program/Harep.hs
index 3e0345d..1e1c8a4 100644
--- a/src/Ganeti/HTools/Program/Harep.hs
+++ b/src/Ganeti/HTools/Program/Harep.hs
@@ -61,6 +61,7 @@ options = do
   luxi <- oLuxiSocket
   return
     [ luxi
+    , oJobDelay
     ]
 
 arguments :: [ArgCompletion]
@@ -347,9 +348,12 @@ detectBroken nl inst =
                    -- DTFile, DTSharedFile, DTBlock, DTRbd, DTExt.
 
 -- | Perform the suggested repair on an instance if its policy allows it.
-doRepair :: L.Client -> InstanceData -> (AutoRepairType, [OpCode])
-         -> IO InstanceData
-doRepair client instData (rtype, opcodes) =
+doRepair :: L.Client     -- ^ The Luxi client
+         -> Maybe Double -- ^ Delay to insert before the first repair opcode
+         -> InstanceData -- ^ The instance data
+         -> (AutoRepairType, [OpCode]) -- ^ The repair job to perform
+         -> IO InstanceData -- ^ The updated instance data
+doRepair client delay instData (rtype, opcodes) =
   let inst = arInstance instData
       ipol = Instance.arPolicy inst
       iname = Instance.name inst
@@ -373,9 +377,36 @@ doRepair client instData (rtype, opcodes) =
       else do
         putStrLn ("Executing " ++ show rtype ++ " repair on " ++ iname)
 
+        -- After submitting the job, we must write an autorepair:pending tag
+        -- that includes the repair job IDs so that they can be checked later.
+        -- One problem we run into is that the repair job immediately grabs
+        -- locks for the affected instance, and the subsequent TAGS_SET job is
+        -- blocked, introducing an unnecessary delay for the end-user. One
+        -- alternative would be not to wait for the completion of the TAGS_SET
+        -- job, contrary to what commitChange normally does; but we insist on
+        -- waiting for the tag to be set so as to abort in case of failure,
+        -- because the cluster is left in an invalid state in that case.
+        --
+        -- The proper solution (in 2.9+) would be not to use tags for storing
+        -- autorepair data, or make the TAGS_SET opcode not grab an instance's
+        -- locks (if that's deemed safe). In the meantime, we introduce an
+        -- artificial delay in the repair job (via a TestDelay opcode) so that
+        -- once we have the job ID, the TAGS_SET job can complete before the
+        -- repair job actually grabs the locks. (Please note that this is not
+        -- about synchronization, but merely about speeding up the execution of
+        -- the harep tool. If this TestDelay opcode is removed, the program is
+        -- still correct.)
+        let opcodes' = case delay of
+              Just d -> OpTestDelay { opDelayDuration = d
+                                    , opDelayOnMaster = True
+                                    , opDelayOnNodes = []
+                                    , opDelayRepeat = fromJust $ mkNonNegative 0
+                                    } : opcodes
+              Nothing -> opcodes
+
         uuid <- newUUID
         time <- getClockTime
-        jids <- submitJobs [map wrapOpCode opcodes] client
+        jids <- submitJobs [map wrapOpCode opcodes'] client
 
         case jids of
           Bad e    -> exitErr e
@@ -418,8 +449,12 @@ main opts args = do
 
   -- Third step: create repair jobs for broken instances that are in ArHealthy.
   let maybeRepair c (i, r) = maybe (return i) (repairHealthy c i) r
+      jobDelay = optJobDelay opts
+      repairDelay = if jobDelay > 0      -- Could use "mfilter (> 0)" after
+                      then Just jobDelay -- dropping support for 6.12.
+                      else Nothing
       repairHealthy c i = case arState i of
-                            ArHealthy _ -> doRepair c i
+                            ArHealthy _ -> doRepair c repairDelay i
                             _           -> const (return i)
 
   _unused_repairDone <- bracket (L.getClient master) L.closeClient $
-- 
1.8.0.2-x20-1
