Thanks a lot, Sam! This is all very helpful advice.

Best Regards,
Ken

On 5 February 2015 at 14:59, Samuel Cozannet <[email protected]> wrote:

> Hey Ken,
>
> My advice, if you fall into an error state with a bundle like this, is to
> deploy it charm by charm (or use the deployment script provided), as those
> approaches may manage race conditions between the charms.
>
> Let me explain a little more what we call a race condition.
> When deploying complex solutions, it happens that several charms try to
> "change" a relation at the same time. If setting the relation takes too
> long, it sometimes causes the next relation changes to fail.
> In short, there is a race between 2 services trying to access the same
> resource at the same time, and that specific resource can only be used
> by one consumer at a time.
> This is what is happening to you.
>
> How do we fix that?
> * First of all, we have juju-deployer for bundles, which is a lot more
>   intelligent than the default deployment for bundles
>   (http://pythonhosted.org/juju-deployer/config.html). I encourage you to
>   use this for bundles.
> * You can also retry the hook by typing
>   "juju resolved -r yarn-hdfs-master/0", which tells Juju you fixed the
>   issue so that it retries the previously failed hook.
> * If you're not familiar with the workload and a service broke, maybe the
>   old "reboot" trick will fix your issue:
>   "juju run --unit yarn-hdfs-master/0 sudo reboot" will do. Not
>   guaranteed to work, but...
> * In the end, this is all because the charm does not manage the status of
>   its fellow charm in a relation. You can file a bug against that charm
>   and explain under which conditions you saw the issues, and/or you can
>   hack into the code yourself and try to fix it. In your case here, the
>   problem is probably that the service is still down while trying to
>   join. You may check that the service port is up and running, and wait
>   until it is before actually firing the hook. (This is a basic but
>   common example.)
>
> Hope it helps!
> Best,
> Samuel
>
> --
> Samuel Cozannet
> Cloud, Big Data and IoT Strategy Team
> Business Development - Cloud and ISV Ecosystem
> Changing the Future of Cloud
> Ubuntu <http://ubuntu.com> / Canonical UK LTD <http://canonical.com> /
> Juju <https://jujucharms.com>
> [email protected]
> mob: +33 616 702 389
> skype: samnco
> Twitter: @SaMnCo_23
>
> On Thu, Feb 5, 2015 at 1:55 PM, Ken Williams <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm currently using the 'data-analytics-with-sql-like' bundle.
>> Sometimes it works fine but sometimes it deploys with an error
>> and I don't know how to fix it without doing 'destroy-environment' and
>> deploying again (which takes time).
>>
>> The error (in 'juju stat') is,
>>
>>   yarn-hdfs-master/0:
>>     agent-state: error
>>     agent-state-info: 'hook failed: "namenode-relation-joined" for
>>       compute-node:datanode'
>>     agent-version: 1.21.1
>>     machine: "4"
>>
>> When this error occurs, I can ssh onto yarn-hdfs-master but
>> I cannot 'hdfs dfs -put' any data onto hdfs.
>>
>> Is there any way I can fix this without destroying the environment
>> and deploying again?
>>
>> Thank you for any help,
>>
>> Ken
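
The last suggestion in the quoted reply (check that the service port is up,
and wait until it is before actually firing the hook) could look roughly
like the Python sketch below. This is only an illustration, not code from
the charm: the wait_for_port helper, its parameters, and the example host
and port are all assumptions made for the example, so substitute whatever
address the NameNode actually listens on in your deployment.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds.

    Returns True as soon as the port accepts a connection, or False if
    `timeout` seconds pass without one. (Hypothetical helper, not part
    of Juju or of any charm.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (refused, unreachable,
            # timed out) while the service is still down.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Assumed values for illustration only; adjust to your environment.
    if wait_for_port("localhost", 8020, timeout=120.0):
        print("service is up, safe to continue with the relation hook")
    else:
        raise SystemExit("service did not come up in time")
```

A hook could call something like this before touching the relation, and
exit non-zero on timeout so that "juju resolved -r" can retry it later.
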

--
Juju mailing list
[email protected]
Modify settings or unsubscribe at:
https://lists.ubuntu.com/mailman/listinfo/juju
