Hi Dmitrii,

Thank you very much for your detailed response. I now understand that it is not in the scope of Juju to monitor instances and replace them. I had come across elastisys, but that is more for horizontal scaling, and I am primarily looking for High Availability and self healing, although elastic scaling would be nice too.

The approach you suggested of monitoring the instances and using libjuju to replace the broken machine with a new one and execute add-unit on them makes sense.

Thanks,
--Raghu

From: Dmitrii Shcherbakov <[email protected]>
Date: Saturday, September 2, 2017 at 12:50 PM
To: Raghurama Bhat <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Newbie Question: How do I replace a machine in a deployed Model?

Hi Raghurama,

> Does Juju controller monitor the cluster and request MaaS for a new machine
> if it detects one of the machines is gone?

No, it doesn't.

> Even if this has to be done manually, I did not see a replace-machine option
> to Juju.

There's no such functionality - either an operator needs to make a decision to do it or you need an automated system to do that depending on some custom logic.

> Only add and remove units and machines. How does this work?

Juju itself does not know anything about applications you deploy - any application-specific knowledge must be present in charms.

What you are looking for is an orchestrator type of capability - it will be a layer on top of Juju or a charm with 'super cow powers' (namely, with admin access to a juju controller).

A proof of concept would be a charmscaler from elastisys - it talks to a juju controller directly and scales based upon CPU usage from nodes collected via telegraf:

https://github.com/elastisys/layer-charmscaler-base/blob/164d163b4104cc47dcb1a32019509ba3f61d91eb/config.yaml#L9-L28
https://github.com/elastisys/layer-charmscaler#how-the-charmscaler-operates
https://jujucharms.com/u/elastisys/
https://github.com/elastisys/bundle-autoscaled-kubernetes
https://jujucharms.com/u/elastisys/autoscaled-kubernetes/bundle/0
https://elastisys.com/cloud-platform-features/predictive-auto-scaling/

You could build your own orchestrator with help of https://github.com/juju/python-libjuju depending on your criteria.

The whole system could look as follows:

telegraf with your own juju input plugin -> prometheus alerts -> orchestrator -> juju controller

The telegraf plugin would query juju and/or MAAS periodically to determine the number of non-failed workers and send those metrics to prometheus.

Googling a little bit, I have found somebody's http server https://github.com/imgix/prometheus-am-executor that handles prometheus' alerts sent as HTTP requests.

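For illustration only, the orchestrator step of that pipeline might be sketched in Python roughly as below. This is not taken from the thread and is not the charmscaler's or prometheus-am-executor's code. It assumes the alert arrives as an Alertmanager-style webhook JSON body (a top-level "alerts" list whose entries have "status" and "labels"), that your own alert rules attach a hypothetical juju_application label naming the affected application, and that a recent python-libjuju is installed where Model().connect() with no arguments attaches to the currently active model. The function names are made up for this sketch.

```python
import asyncio
from collections import Counter


def units_needed(payload: dict) -> Counter:
    """Count firing alerts per Juju application in a webhook payload.

    Assumption: every alert carries a hypothetical `juju_application`
    label set by your own alerting rules.
    """
    needed = Counter()
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts need no action
        app_name = alert.get("labels", {}).get("juju_application")
        if app_name:
            needed[app_name] += 1
    return needed


async def add_replacement_units(needed: Counter) -> None:
    """Ask the Juju controller for one new unit per firing alert."""
    # Imported lazily so the parsing logic above runs without libjuju.
    from juju.model import Model

    model = Model()
    await model.connect()  # assumption: connects to the active model
    try:
        for app_name, count in needed.items():
            if app_name not in model.applications:
                continue  # unknown application: skip rather than guess
            await model.applications[app_name].add_unit(count=count)
    finally:
        await model.disconnect()


if __name__ == "__main__":
    # Hand-written sample payload, not captured from a real Alertmanager.
    sample = {
        "status": "firing",
        "alerts": [
            {"status": "firing",
             "labels": {"alertname": "WorkerDown",
                        "juju_application": "kubernetes-worker"}},
            {"status": "resolved",
             "labels": {"alertname": "WorkerDown",
                        "juju_application": "kubernetes-worker"}},
        ],
    }
    print(dict(units_needed(sample)))
    # With a live controller and libjuju installed, one could then run:
    # asyncio.run(add_replacement_units(units_needed(sample)))
```

The parsing is kept separate from the libjuju call so that the decision logic can be tested without a controller. The sketch only adds units; cleaning up the failed machine (for example with juju remove-machine --force) would be a separate step that your own logic has to decide on.
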
That kind of HTTP server could well hold your orchestration logic and use python-libjuju to perform the necessary add-unit-on-failure operations.

To sum up: it is important to understand the difference between Juju and charms. Juju itself doesn't know anything about application-specific logic - charms do. Charm code is executed by Juju agents at certain events and this is where application-specific logic is actually executed. Any orchestration code must have admin access to the juju controller and contain subjective logic about how to scale-up or scale-down your application.

https://github.com/juju/juju/blob/develop/doc/architectural-overview.md#juju-components

I hope that helps.

Best Regards,
Dmitrii Shcherbakov

Field Software Engineer
IRC (freenode): Dmitrii-Sh

On Fri, Sep 1, 2017 at 7:34 PM, Raghurama Bhat <[email protected]> wrote:

Any comments?

Thanks,
--Raghu

From: <[email protected]> on behalf of Raghurama Bhat <[email protected]>
Date: Thursday, August 31, 2017 at 8:07 AM
To: "[email protected]" <[email protected]>
Subject: Newbie Question: How do I replace a machine in a deployed Model?

Hi,

I have a newbie question. I deployed a two node Kubernetes Core Cluster using Juju into a MaaS Setup. Now if one of the machines has a hardware failure, what is the process for replacing it with another machine? Does Juju controller monitor the cluster and request MaaS for a new machine if it detects one of the machines is gone?
Even if this has to be done manually, I did not see a replace-machine option to Juju. Only add and remove units and machines. How does this work?

Thanks,
--Raghu

--
Juju mailing list
[email protected]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju
