Sergey,

Thanks for such a detailed description!
I find the first option more attractive. I see maintenance mode as a special state of a node that a user can turn on and off. If you want to perform defragmentation, you need to turn that mode on. If you try to do it in normal mode, you get an error and a suggestion to turn MM on. Certain commands will have a dependency on this mode.

It's like the "active" / "inactive" / "read-only" cluster states, but for nodes. You need an active cluster to perform cache puts. Similarly, you'll need a node in maintenance mode to perform PDS recovery.

The approach with a single "maintenance" command introduces a limitation: the control utility will have to know about every command that requires maintenance. There is a chance that this command will become bloated with options. It will also be problematic for plugins to introduce new commands that require maintenance mode.

Denis

On Tue, Sep 29, 2020 at 18:03, Sergey Chugunov <[email protected]> wrote:

> Hello Ignite dev community,
>
> As internal implementation of Maintenance Mode [1] is getting closer to
> finish I want to discuss one more thing: user-facing API (I will use
> control utility for examples) for managing it.
>
> What should be managed?
> When a node enters MM, it may start some automatic actions (like
> defragmentation) or wait for a user to intervene and resolve the issue
> (like in case of pds corruption).
>
> So for manually triggered operations like pds cleanup after corruption we
> should provide the user with a way to actually trigger the operation.
> And for long-running automatic operations like defragmentation actions like
> status and cancel are reasonable to implement.
>
> At the same time Maintenance Mode is a supporting feature; it doesn't bring
> any value by itself but enables implementation of other features.
> Thus putting it at the center of API and build all commands around the main
> "maintenance" command may not be right.
>
> There are two alternatives - "*Big features deserve their own commands*"
> and "*Everything should be unified*". Consider them.
>
> Big features deserve their own commands
> Here for each big feature we implement its own command. Defragmentation is
> a big separate feature so why shouldn't it have its own commands to request
> or cancel it?
>
> Examples
> *control.sh defragmentation request-for-node --nodeId <node-id>
> [--caches <caches list>]* - defragmentation will be started on the
> particular node after its restart.
> *control.sh defragmentation status* - prints information about status
> of on-going defragmentation.
> *control.sh defragmentation cancel* - cancels on-going defragmentation.
>
> Another command - "maintenance" - will be used for more generic purposes.
>
> Examples
> *control.sh maintenance list-records* - prints information about each
> maintenance record (id and name of the record, parameters, description,
> current status).
> *control.sh maintenance record-actions --id <record-id>* - prints
> information about user-triggered actions available for this record (e.g.
> for pds corruption record it may be "clean-corrupted-files")
> *control.sh maintenance execute-action --id <record-id> --action-name
> <action name>* - triggers execution of particular action and prints
> results.
>
> *Pros:*
>
> 1. Big features like defragmentation get their own commands and more
>    freedom in implementing them.
> 2. It is emphasized that maintenance mode is just a supporting thing and
>    not a first-class feature (it is not at the center of API).
>
> *Cons:*
>
> 1. Duplication of functionality. The same functions may be available via
>    general maintenance command and a separate command of the feature.
> 2. Information about a feature may be split into two commands. One piece
>    of information is available in the "feature" command, another in the
>    "maintenance" command.
>
>
> Everything should be unified
> We can go another way and gather all features that rely on MM under one
> unified command.
>
> API for node that is already in MM looks complete and logical, very
> intuitive:
> *control.sh maintenance list-records* - output all records that have to
> be resolved to finish maintenance.
> *control.sh maintenance record-actions --id <record-id>* - all actions
> available for the record.
> *control.sh maintenance execute-action --id <record-id> --action-name
> <action-name>* - executes action of the given name (like general actions
> "status" or "delete" and more specific action "clean-corrupted-files" for
> corrupted pds situation).
>
> But API to request node to enter maintenance mode becomes more vague.
> *control.sh maintenance available-operations* - prints all operations
> available to request (for instance, defragmentation).
> *control.sh maintenance request-operation --id <operation-id> --params
> <operation parameters>* - requests given operation to start on next node
> restart.
> Here we have to distinguish operations that are requested automatically
> (like pds corruption) and not show them to the user.
>
> *Pros:*
>
> 1. Single API to get information and trigger actions without any
>    duplication.
>
>
> *Cons:*
>
> 1. We restrict big features by model provided by maintenance command.
> 2. In this API we put maintenance in the center although it is nothing
>    more than a supporting feature.
> 3. API to request maintenance operations doesn't feel intuitive to me
>    but more artificial.
>
>
> So what do you think? What looks better and more intuitive from your
> perspective?
>
> I will be glad to hear any feedback on the subject.
>
> As a result of this discussion I will create a ticket for implementation
> and include it into IEP-53 [2]
>
> [1] https://issues.apache.org/jira/browse/IGNITE-13366
> [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-53%3A+Maintenance+Mode
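
For illustration only, the "commands depend on the node's mode" idea from the reply at the top of this message could be sketched roughly as below. This is a language-neutral sketch written in Python, not Ignite code: `NodeMode`, `Command`, `Node`, `WrongModeError` and the `requires_maintenance` flag are all hypothetical names invented for the example. The point it tries to show is that each command declares its own dependency on maintenance mode, so a plugin can register a new command without a central "maintenance" command having to know about it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class NodeMode(Enum):
    """Hypothetical per-node state, analogous to cluster states."""
    NORMAL = "normal"
    MAINTENANCE = "maintenance"


class WrongModeError(Exception):
    """Raised when a command is invoked in a mode it does not support."""


@dataclass
class Command:
    # Hypothetical command descriptor: the command itself declares
    # whether it needs the node to be in maintenance mode.
    name: str
    handler: Callable[[], str]
    requires_maintenance: bool = False


@dataclass
class Node:
    mode: NodeMode = NodeMode.NORMAL
    commands: Dict[str, Command] = field(default_factory=dict)

    def register(self, command: Command) -> None:
        # Built-in features and plugins register commands the same way.
        self.commands[command.name] = command

    def execute(self, name: str) -> str:
        command = self.commands[name]
        if command.requires_maintenance and self.mode is not NodeMode.MAINTENANCE:
            # Error plus a suggestion to turn maintenance mode on.
            raise WrongModeError(
                f"'{name}' requires maintenance mode; turn it on and retry"
            )
        return command.handler()


if __name__ == "__main__":
    node = Node()
    node.register(
        Command("defragmentation", lambda: "defragmentation started",
                requires_maintenance=True)
    )
    try:
        node.execute("defragmentation")
    except WrongModeError as err:
        print(err)
    node.mode = NodeMode.MAINTENANCE
    print(node.execute("defragmentation"))
```

In this sketch the mode check lives next to the command registry rather than in the control utility, which is one possible way to avoid the "utility must know every command" limitation; the real design may of course differ.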
