Sergey,

Thanks for such a detailed description!

I find the first option more attractive. I see the maintenance mode as a
special state of a node that a user can turn on and off. If you want to
perform defragmentation, you need to turn that mode on. If you try to do it
in normal mode, you get an error and a suggestion to turn MM on. Certain
commands will depend on this mode.
It's like the "active" / "inactive" / "read-only" cluster states, but for
nodes. You need an active cluster to perform cache puts. Similarly,
you'll need a node in maintenance mode to perform PDS recovery.

The approach with a single "maintenance" command has a limitation: the
control utility will have to know about every command that requires
maintenance. There is a chance that this command will become bloated with
options. It will also be hard for plugins to introduce new commands that
require maintenance mode.
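To make the per-node idea more concrete, here is a rough Java sketch. Everything in it (MaintenanceModeSketch, NodeMode, Command, requiresMaintenance(), the "plugin-index-rebuild" command) is hypothetical and not part of Ignite or the control utility. It only assumes that each command can declare whether it needs maintenance mode, and that the user turns the mode on and off explicitly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: maintenance mode as a per-node state, with each command
 * declaring whether it depends on that state. None of these types are Ignite APIs.
 */
public class MaintenanceModeSketch {
    /** Per-node mode, analogous to cluster states but scoped to a single node. */
    enum NodeMode { NORMAL, MAINTENANCE }

    /** Each command declares its own dependency on maintenance mode. */
    interface Command {
        String name();
        boolean requiresMaintenance();
        String execute();
    }

    private final Map<String, Command> commands = new LinkedHashMap<>();
    private NodeMode mode = NodeMode.NORMAL;

    /** The user turns the mode on and off explicitly. */
    void setMode(NodeMode mode) { this.mode = mode; }

    /** Built-in and plugin commands register the same way. */
    void register(Command cmd) { commands.put(cmd.name(), cmd); }

    /** The dispatcher checks only the declared flag; it keeps no list of maintenance commands. */
    String run(String name) {
        Command cmd = commands.get(name);
        if (cmd == null)
            return "error: unknown command '" + name + "'";
        if (cmd.requiresMaintenance() && mode != NodeMode.MAINTENANCE)
            return "error: '" + name + "' requires maintenance mode; turn it on and retry";
        return cmd.execute();
    }

    /** Helper that builds a trivial command returning a fixed result. */
    static Command command(String name, boolean requiresMaintenance, String result) {
        return new Command() {
            @Override public String name() { return name; }
            @Override public boolean requiresMaintenance() { return requiresMaintenance; }
            @Override public String execute() { return result; }
        };
    }

    public static void main(String[] args) {
        MaintenanceModeSketch node = new MaintenanceModeSketch();
        node.register(command("defragmentation", true, "defragmentation started"));
        // Contributed by a hypothetical plugin; the dispatcher needs no prior knowledge of it.
        node.register(command("plugin-index-rebuild", true, "index rebuild started"));

        System.out.println(node.run("defragmentation"));      // rejected: node is in NORMAL mode
        node.setMode(NodeMode.MAINTENANCE);
        System.out.println(node.run("defragmentation"));
        System.out.println(node.run("plugin-index-rebuild"));
    }
}
```

Since the dispatcher keeps no list of maintenance commands, a plugin only has to declare the dependency on its own command.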

Denis

On Tue, Sep 29, 2020 at 18:03, Sergey Chugunov <[email protected]>:

> Hello Ignite dev community,
>
> As the internal implementation of Maintenance Mode [1] is nearing
> completion, I want to discuss one more thing: the user-facing API for
> managing it (I will use the control utility in the examples).
>
> What should be managed?
> When a node enters MM, it may start some automatic actions (like
> defragmentation) or wait for a user to intervene and resolve the issue
> (as in the case of pds corruption).
>
> So for manually triggered operations, like pds cleanup after corruption, we
> should provide the user with a way to actually trigger the operation.
> For long-running automatic operations like defragmentation, it is
> reasonable to implement actions such as status and cancel.
>
> At the same time, Maintenance Mode is a supporting feature: it doesn't bring
> any value by itself but enables the implementation of other features.
> Thus, putting it at the center of the API and building all commands around
> the main "maintenance" command may not be right.
>
> There are two alternatives: "*Big features deserve their own commands*"
> and "*Everything should be unified*". Let's consider both.
>
> Big features deserve their own commands
> Here, each big feature gets its own command. Defragmentation is a big,
> separate feature, so why shouldn't it have its own commands to request
> or cancel it?
>
> Examples
>     *control.sh defragmentation request-for-node --nodeId <node-id>
> [--caches <caches list>]* - defragmentation will start on the given
> node after its restart.
>     *control.sh defragmentation status* - prints the status of the
> ongoing defragmentation.
>     *control.sh defragmentation cancel* - cancels the ongoing defragmentation.
>
> Another command - "maintenance" - will be used for more generic purposes.
>
> Examples
>     *control.sh maintenance list-records* - prints information about each
> maintenance record (id and name of the record, parameters, description,
> current status).
>     *control.sh maintenance record-actions --id <record-id>* - prints
> information about user-triggered actions available for this record (e.g.
> for a pds corruption record it may be "clean-corrupted-files").
>     *control.sh maintenance execute-action --id <record-id> --action-name
> <action name>* - triggers execution of a particular action and prints the
> results.
>
> *Pros:*
>
>    1. Big features like defragmentation get their own commands and more
>    freedom in implementing them.
>    2. It emphasizes that maintenance mode is just a supporting mechanism,
>    not a first-class feature (it is not at the center of the API).
>
> *Cons:*
>
>    1. Duplication of functionality. The same functions may be available via
>    the general maintenance command and via the feature's own command.
>    2. Information about a feature may be split between two commands. One
>    piece of information is available in the "feature" command, another in
>    the "maintenance" command.
>
>
> Everything should be unified
> We can go another way and gather all features that rely on MM under one
> unified command.
>
> The API for a node that is already in MM looks complete, logical, and very
> intuitive:
>     *control.sh maintenance list-records* - prints all records that have to
> be resolved to finish maintenance.
>     *control.sh maintenance record-actions --id <record-id>* - prints all
> actions available for the record.
>     *control.sh maintenance execute-action --id <record-id> --action-name
> <action-name>* - executes the action with the given name (general actions
> like "status" or "delete", or more specific ones like
> "clean-corrupted-files" for the corrupted pds case).
>
> But the API for requesting a node to enter maintenance mode becomes vaguer.
>     *control.sh maintenance available-operations* - prints all operations
> available to request (for instance, defragmentation).
>     *control.sh maintenance request-operation --id <operation-id> --params
> <operation parameters>* - requests the given operation to start on the next
> node restart.
> Here we have to distinguish operations that are requested automatically
> (like pds corruption) and hide them from the user.
>
> *Pros:*
>
>    1. A single API to get information and trigger actions, without any
>    duplication.
>
>
> *Cons:*
>
>    1. Big features are restricted to the model provided by the maintenance
>    command.
>    2. This API puts maintenance at the center, although it is nothing
>    more than a supporting feature.
>    3. The API for requesting maintenance operations feels artificial
>    rather than intuitive to me.
>
>
> So what do you think? What looks better and more intuitive from your
> perspective?
>
> I will be glad to hear any feedback on the subject.
>
> Based on the outcome of this discussion, I will create an implementation
> ticket and include it in IEP-53 [2].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-13366
> [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-53%3A+Maintenance+Mode
>
