Hey Ken,

My advice, if you end up in an error state with a bundle like this, is to deploy it charm by charm (or to use the deployment script provided), as those approaches may handle race conditions between the charms.
Let me explain a little more what we call a race condition. When deploying complex solutions, it happens that several charms try to "change" a relation at the same time. If setting the relation takes too long, the relation changes that follow sometimes fail. In short, there is a race between 2 services trying to access the same resource at the same time, and that specific resource can only be consumed by one of them at a time. This is what is happening to you.

How do we fix that?

* First of all, we have juju-deployer for bundles, which is a lot more intelligent than the default deployment for bundles (http://pythonhosted.org/juju-deployer/config.html). I encourage you to use it for bundles.
* You can also retry the hook by typing "juju resolved -r yarn-hdfs-master/0", which tells Juju you fixed the issue and that it should retry the previously failed hook.
* If you're not familiar with the workload and a service broke, maybe the old "reboot" trick will fix your issue: "juju run --unit yarn-hdfs-master/0 sudo reboot" will do. Not guaranteed to work, but worth a try.
* In the end, this all happens because the charm does not track the status of its fellow charm in a relation. You can file a bug against that charm and explain under which conditions you saw the issue, and/or you can hack into the code yourself and try to fix it. In your case, the problem is probably that the service is still down while the unit is trying to join. You could check that the service port is up and running, and wait until it is before actually firing the hook. (This is a basic but common example.)

Hope it helps!
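As a rough illustration of that last point (waiting for the service port before running the hook logic), here is a minimal Python 3 sketch. The function name wait_for_port, its parameters, and the host/port in the usage comment are all assumptions made for illustration; none of this comes from the actual charm code, so check which port your service really listens on.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds.

    Returns True as soon as the port accepts a connection, or False
    if it is still unreachable once `timeout` seconds have elapsed.
    (Hypothetical helper, not part of any charm or of Juju itself.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example use inside a hook (host and port are assumptions):
#   if not wait_for_port("localhost", 8020, timeout=120):
#       raise SystemExit("service not reachable yet, giving up")
```

A hook that exits non-zero is marked as failed by Juju, so with something like this in place a later "juju resolved -r" retry would at least fail for a clear reason instead of racing the service startup.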
Best,
Samuel

--
Samuel Cozannet
Cloud, Big Data and IoT Strategy Team
Business Development - Cloud and ISV Ecosystem
Changing the Future of Cloud
Ubuntu <http://ubuntu.com> / Canonical UK LTD <http://canonical.com> / Juju <https://jujucharms.com>
[email protected]
mob: +33 616 702 389
skype: samnco
Twitter: @SaMnCo_23

On Thu, Feb 5, 2015 at 1:55 PM, Ken Williams <[email protected]> wrote:
>
> Hi all,
>
> I'm currently using the 'data-analytics-with-sql-like' bundle.
> Sometimes it works fine but sometimes it deploys with an error
> and I don't know how to fix it without doing 'destroy-environment' and
> deploy again (which takes time).
>
> The error (in 'juju stat') is,
>
>   yarn-hdfs-master/0:
>     agent-state: error
>     agent-state-info: 'hook failed: "namenode-relation-joined" for compute-node:datanode'
>     agent-version: 1.21.1
>     machine: "4"
>
> When this error occurs, I can ssh onto yarn-hdfs-master but
> I cannot 'hdfs dfs -put' any data onto hdfs.
>
> Is there any way I can fix this without destroying the environment
> and deploying again?
>
> Thankyou for any help, please,
>
> Ken
>
> --
> Juju mailing list
> [email protected]
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju
