I don't think we should implement this feature even in the API, because then the user would be able to interrupt patching via the CLI. I think it's really risky to provide such a feature, especially when we know the user can lose his production nodes.
My suggestion is to remove the ticket [1] from 5.1 or set it as Won't Fix.

[1] https://bugs.launchpad.net/fuel/+bug/1364907

On Tue, Sep 9, 2014 at 1:44 PM, <bdobre...@mirantis.com> wrote:

> Perhaps some ideas could be taken from [0] ([1]).
> Note that the linked full spec doc [1] is more of a brainstorming
> discussion than a spec ready for implementation.
> I strongly believe we should follow the suggested concepts
> (finite-machine states in Nailgun DB, running in HA mode, of course)
> in order to track offline / interrupted statuses for nodes (including
> the master node) as well.
>
> [0] https://blueprints.launchpad.net/fuel/+spec/nailgun-unified-object-model
> [1] https://etherpad.openstack.org/p/nailgun-unified-object-model
>
> Regards,
> Bogdan Dobrelya.
>
> *From:* Mike Scherbakov <mscherba...@mirantis.com>
> *Sent:* Tuesday, September 9, 2014 10:15 AM
> *To:* Vladimir Kuklin <vkuk...@mirantis.com>
> *Cc:* Igor Kalnitsky <ikalnit...@mirantis.com>, fuel-dev
> <fuel-dev@lists.launchpad.net>
>
> Folks,
> I was the one who initially requested this. I thought it was going to
> be pretty similar to Stop Deployment. It becomes obvious that it is
> not.
>
> I'm fine if we have it in the API. Though I think what is much more
> important here is the ability for the user to choose a few hosts for
> patching first, in order to check how patching would work on a very
> small part of the cluster. Ideally we would even move workloads to
> other nodes before doing the patching. We should definitely disable
> scheduling of workloads for these experimental hosts.
> Then the user can run patching against these nodes and see how it
> goes. If all goes fine, patching can be applied to the rest of the
> environment. I do not think, though, that we should do all, let's say
> 100 nodes, at once. That sounds dangerous to me. I think we would need
> to come up with some less dangerous scenario.
>
> Also, let's think about and work on possible failures. What if the
> Fuel Master node goes offline during patching? What is going to be
> affected? How can we complete patching when the Fuel Master comes back
> online?
>
> Or a compute node under patching breaks for some reason (e.g. disk or
> memory issues): how would that affect the patching process? How can we
> safely continue patching the other nodes?
>
> Thanks,
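For illustration, disabling the nova-compute scheduler on such experimental hosts before patching them might look like the sketch below, using python-novaclient. This is only a minimal sketch: the credentials, endpoint, and host names are assumptions, and nothing in it is existing Fuel code.

    # A minimal sketch, not existing Fuel code: take canary hosts out of
    # the nova scheduler before patching them. Credentials, endpoint and
    # host names below are made up for illustration.
    from novaclient import client

    CANARY_HOSTS = ["node-1.example.com", "node-2.example.com"]  # hypothetical

    nova = client.Client("2", "admin", "secret", "admin",
                         "http://keystone.example.com:5000/v2.0")

    for host in CANARY_HOSTS:
        # Stop the scheduler from placing new instances on the host;
        # already-running workloads are untouched and would have to be
        # live-migrated separately.
        nova.services.disable(host, "nova-compute")
        print("scheduling disabled on %s" % host)
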
> On Tue, Sep 9, 2014 at 12:08 PM, Vladimir Kuklin <vkuk...@mirantis.com>
> wrote:
>
>> Sorry again. Look two messages below, please.
>> On Sep 9, 2014 at 12:06, "Vladimir Kuklin" <vkuk...@mirantis.com> wrote:
>>
>>> Sorry, hit reply instead of reply-all.
>>> On Sep 9, 2014 at 12:05, "Vladimir Kuklin" <vkuk...@mirantis.com> wrote:
>>>
>>>> +1
>>>>
>>>> Also, I think we should add stop patching at least to the API, in
>>>> order to allow advanced users and the service team to do what they
>>>> want.
>>>> On Sep 9, 2014 at 12:02, "Igor Kalnitsky" <ikalnit...@mirantis.com> wrote:
>>>>
>>>>> What should we do with the nodes if patching is interrupted? I
>>>>> think we need to mark them for re-deployment, since the nodes'
>>>>> state may be broken.
>>>>>
>>>>> Any opinion?
>>>>>
>>>>> - Igor
>>>>>
>>>>> On Mon, Sep 8, 2014 at 3:28 PM, Evgeniy L <e...@mirantis.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > We were working on the implementation of an experimental feature
>>>>> > where the user could interrupt the OpenStack patching procedure [1].
>>>>> >
>>>>> > It's not as easy to implement as we thought it would be.
>>>>> > The current stop deployment mechanism [2] stops puppet, erases
>>>>> > the nodes and reboots them into bootstrap. That's fine for stop
>>>>> > deployment, but not for patching, because the user can lose his
>>>>> > data. We could rewrite some logic in Nailgun and in the
>>>>> > orchestrator to stop puppet without erasing the nodes, but I'm
>>>>> > not sure it would work correctly, because such a use case was
>>>>> > never tested. And I can foresee problems like cleaning up
>>>>> > yum/apt-get locks after a puppet interruption.
>>>>> >
>>>>> > As a result I have several questions:
>>>>> > 1. Should we try to make it work for the current release?
>>>>> > 2. If we shouldn't, will we need this feature for future
>>>>> >    releases? Definitely additional design and research are
>>>>> >    required.
>>>>> >
>>>>> > [1] https://bugs.launchpad.net/fuel/+bug/1364907
>>>>> > [2] https://github.com/stackforge/fuel-astute/blob/b622d9b36dbdd1e03b282b9ee5b7435ba649e711/lib/astute/server/dispatcher.rb#L163-L164
>
> --
> Mike Scherbakov
> #mihgen
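The yum/apt-get lock problem Evgeniy mentions is concrete enough to sketch. Roughly, and only as an assumption about one possible cleanup (none of this is existing Fuel or Astute code): yum records the PID of the running instance in /var/run/yum.pid, so a stale lock can be detected by checking whether that process is still alive, while on Debian-based nodes `dpkg --configure -a` finishes transactions that an interrupted apt-get left half-configured.

    # A rough sketch, not existing Fuel/Astute code: clean up package
    # manager state after puppet (and the yum/apt-get it spawned) was
    # killed mid-run.
    import errno
    import os
    import subprocess

    YUM_PID_FILE = "/var/run/yum.pid"

    def pid_alive(pid):
        # Signal 0 probes for process existence without signalling it.
        try:
            os.kill(pid, 0)
        except OSError as e:
            return e.errno != errno.ESRCH
        return True

    def cleanup_yum_lock():
        # yum stores the PID of the running instance in /var/run/yum.pid;
        # the lock is stale only if that process is gone.
        try:
            with open(YUM_PID_FILE) as f:
                pid = int(f.read().split()[0])
        except (IOError, ValueError, IndexError):
            return
        if not pid_alive(pid):
            os.unlink(YUM_PID_FILE)

    def cleanup_dpkg():
        # dpkg can be left with half-configured packages when apt-get is
        # killed; --configure -a completes the interrupted transactions.
        subprocess.check_call(["dpkg", "--configure", "-a"])
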
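Finally, Bogdan's finite-machine idea and Igor's mark-for-redeployment suggestion fit together naturally. The sketch below is purely illustrative; the state names and transitions are hypothetical, not taken from the spec or from Nailgun.

    # Purely illustrative: a tiny state machine of the kind that could
    # track per-node patching status. All names are hypothetical.
    VALID_TRANSITIONS = {
        "ready": {"patching"},
        "patching": {"ready", "patching_interrupted", "offline"},
        "patching_interrupted": {"pending_redeploy"},
        "pending_redeploy": {"ready"},
        "offline": {"patching_interrupted"},
    }

    class NodeState(object):
        def __init__(self, node_id, state="ready"):
            self.node_id = node_id
            self.state = state

        def transition(self, new_state):
            # Refuse undefined transitions, so a node whose patching was
            # interrupted cannot silently go back to "ready" without
            # passing through re-deployment.
            if new_state not in VALID_TRANSITIONS[self.state]:
                raise ValueError("%s: %s -> %s not allowed"
                                 % (self.node_id, self.state, new_state))
            self.state = new_state  # in Nailgun this would persist to the DB

    # On interruption, every node that was mid-patch gets flagged:
    node = NodeState("node-42", state="patching")
    node.transition("patching_interrupted")
    node.transition("pending_redeploy")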