Hello,
please find below my next patch series for the auto-repair work. It's
basically defining the data types, and an additional patch to parse the
initial auto-repair policy in Loader.hs.
Comments:
(1) regarding the data types in 3/4, I initially hesitated in
including AutoRepair{Status,Data}, since I wasn't completely sure
that they wouldn't require more modifications down the line.
However, after stopping to think some more about them, they seem
to me a good starting point, and I think only AutoRepairData might
need some extra fields later on.
To think about these states, it helped me to produce a state
diagram (see below). You'll see, too, that I've decoupled the
concept of "policy" and the concept of "state"; I think this makes
more sense.
The state diagram is here:
https://docs.google.com/a/google.com/drawings/d/1iE4R0xx9V7jqJScmqsL_RafrUgb-8y4yE0nBHCS1JJY/view
(Regrettably, that's Google-internal, so I'm publishing a
read-only, public copy here:
https://lh6.googleusercontent.com/-o5SKKFIUTUs/UMjtaDT52UI/AAAAAAAAAQM/8dhrJgOo0fw/s960/Ganeti%2520auto-repair%2520states%2520%25282%2529.png)
Colors blue/orange/red refer to the three phases of the tool:
initial state, assessing of current state, action performed.
Please let me know what you think.
(2) There is one thing in this CL that may need further thought: the
'arPolicy' field introduced in 4/8, because it's not clear how
it's going to interact with expired `ArSuspended (Until ...)`
policies. In particular, if mergeData loads an ArSuspended policy
that, later on, in the IO monad, the tool discovers to be expired,
there is no longer access to the policy that should have been
applied (because it's been discarded in Loader.hs).
I thought of three possible approaches to this problem:
(a) after the initial loadExternalData, if there are any expired
ArSuspended policies in the result, remove the offending
tags from the cluster and call loadExternalData again
(b) remove any offending tags from the cluster _before_ calling
loadExternalData (this approach could use a small
refactoring in ExtLoader.hs to make "input_data <- ..."
visible externally); but because of race conditions, this is
not essentially different from (a)
(c) make the 'arPolicy' field a list instead of a single value;
if the element at index 0 is an expired ArSuspended (one
would need to guarantee that the last element of the list is
always one of {ArEnabled ..., ArNotEnabled, ArSuspended Forever}).
(3) the only way I can think to sensibly test 8/8 is to use an input
file for the Text backend, but then parse from Haskell the result;
however, it may just make more sense to delay testing this until
more auto-repair code exists, and the whole workflow is tested.
Would that be okay?
Thanks!
Dato Simó (8):
design-autorepair.rst: use the same prefix everywhere
Add initial constants and Haskell ADTs for auto repair.
HTools/Types.hs: more auto-repair types
Instance.hs: add an 'arPolicy' field for auto-repair policy
Utils.hs: add a clockTimeToString function
Utils.hs: function to chomp prefix + separator from a string
Loader.hs: rewrite extractExTags to use chompPrefix
Loader.hs: set instance auto-repair policy in mergeData
doc/design-autorepair.rst | 8 +++---
htest/Test/Ganeti/HTools/Types.hs | 23 ++++++++++++++++
htest/Test/Ganeti/Utils.hs | 37 +++++++++++++++++++++++++
htools/Ganeti/HTools/Instance.hs | 2 ++
htools/Ganeti/HTools/Loader.hs | 57 ++++++++++++++++++++++++++++++++++-----
htools/Ganeti/HTools/Types.hs | 53 ++++++++++++++++++++++++++++++++++++
htools/Ganeti/Utils.hs | 32 ++++++++++++++++++++++
lib/constants.py | 29 ++++++++++++++++++++
8 files changed, 230 insertions(+), 11 deletions(-)
--
1.7.12.3-x20-1