I wouldn't want to be in your shoes in a pre-snappy world... I'm amazed that Ubuntu still works so well in the ocean.
We found a way to mitigate most of the issues: run everything exclusively in LXC containers. This gave us the standard cloud image that all these Charms are being tested on. This approach had two issues:

1. Network: LXC containers can't connect to containers on other hosts and can't resolve each other's hostnames. DNS might be a bigger issue than you think: not a single big data framework can handle unresolvable hostnames.

2. Reliability:
 - We experienced many crashed state servers on 1.x manual environments.
 - Random failures of the LXC template download [1]. (This problem reappeared one week after the issue was closed. We didn't reopen it because we had started moving to MAAS.)
 - Random failures when installing the LXC packages on the host. At first I thought this was due to outdated host images, but the problem was intermittent, which doesn't make much sense.
 - Fixes for 1. (networking) were hard to create and didn't work reliably.

Getting to the point where LXC was successfully installed and working was hard and unreliable: about half of our deploys failed. However, once an environment got there, it was very pleasant to work with.

What I suggest is that you stop trying to make Juju work in 'the ocean' and focus the manual environment efforts on one thing: a multi-machine LXD provider. *Fix the LXD networking and DNS issues and tell everyone to only use LXD containers in a manual environment.* For many people, the manual provider is their entry point into the Juju world, and running everything in LXD containers is a very good starting point.

[1] https://bugs.launchpad.net/juju-core/1.25/+bug/1610880

2016-11-28 16:55 GMT+01:00 Mark Shuttleworth:

>
> Super difficult to document 'the ocean'; there will always be fraying at
> the edges, where what worked on clouds fails in the manual case.
>
> Mark
>
>
> On 28/11/16 15:49, Rick Harding wrote:
>
> > That's very true on the items that are different.
> > I wonder if we could work with the CPC team and note the things that are
> > assumed promises when using cloud images, so that it'd be easy to build a
> > "patch" for manually provisioned machines. If we know which specific
> > packages or configuration are present on our images, it should be doable
> > to write some sort of "manual-init" script that tries to bring things in
> > line.
> >
> > Merlijn, do you have any notes on the changes that you were suffering
> > through? Was there anything that didn't fit the "using your own Ubuntu
> > install vs a CPC certified image"?
> >
> > On Sun, Nov 27, 2016 at 1:26 AM John Meinel wrote:
> >
> > > From what I can tell, there are a number of places where these manual
> > > machines differ from our "standard" install. I think the charms can be
> > > written defensively around this, but it's why you're running into more
> > > issues than you normally would.
> > >
> > > 1. 'noexec' for /tmp. I've heard of this, but as layer-ruby wants to
> > >    build something, where *should* it build something? Maybe we could
> > >    do something in /var, but it does seem like the intermediate files
> > >    are all temporary (which is presumably why someone picked /tmp). I
> > >    don't have any details on layer-ruby.
> > > 2. python-yaml not installed. Most of the places where we run juju use
> > >    'cloud-init' to set up the machine for the first time, and I'm
> > >    pretty sure cloud-init has a dependency on python-yaml (because
> > >    that's how some of the cloud-init config is written). Again, charms
> > >    can just include python-yaml as a dependency; I'm guessing they just
> > >    didn't notice because everywhere else they tested, it was already
> > >    there.
> > >
> > > John
> > > =:->
> > >
> > >
> > > On Sun, Nov 27, 2016 at 4:45 AM, Merlijn Sebrechts wrote:
> > >
> > > > I feel you, James
> > > >
> > > > We've been battling with weird issues / compatibility problems with
> > > > the manual provider on private infra for the past year.
> > > > Just finding out where the problem is requires diving deep into the
> > > > internals of Juju and the Charms. In the end, we patched our own
> > > > servers heavily and had to patch ~30% of the Charms we tried. This
> > > > slowed us down so much that we just gave up and moved to MAAS. We're
> > > > having far fewer problems now.
> > > >
> > > >
> > > > 2016-11-27 0:03 GMT+01:00 James Beedy:
> > > >
> > > > > I was a bit flustered earlier when I sent off this email. I've since
> > > > > looked more closely at each of the individual problems and thought I
> > > > > would report back with my findings.
> > > > >
> > > > > 1. Job for systemd-sysctl.service failed because the control process
> > > > >    exited
> > > > >    - This is an error I'm seeing when installing juju (not sure if
> > > > >      this is adding to any other issues or not). I didn't look into
> > > > >      it much, but filed a bug here ->
> > > > >      https://bugs.launchpad.net/juju/+bug/1645025
> > > > >
> > > > > 2. ERROR juju.state database.go:231 using unknown collection
> > > > >    "remoteApplications"
> > > > >    - This seems to only exist in 2.0.1, installed from the juju/stable
> > > > >      PPA; when I reverted back to 2.0.0, this went away.
> > > > >
> > > > > Charm/Layer Issues
> > > > >
> > > > > 3. Problem with Ruby: ["env: './configure': Permission denied"]
> > > > >    - Both of my charms were utilizing layer-ruby. When deployed to
> > > > >      LXD and EC2, I don't seem to get this error, but this
> > > > >      private/dedicated infra doesn't seem to like python running
> > > > >      `./configure` (it could also be permissions on /tmp, but I tried
> > > > >      moving the unpacking and configuring to another dir, and still
> > > > >      got this error).
> > > > >    - Filed bug here ->
> > > > >      https://github.com/battlemidget/juju-layer-ruby/issues/12
> > > > >    - Removing layer-ruby was my fix here; this allowed my charms to
> > > > >      deploy w/o error.
> > > > >
> > > > > 4. Elasticsearch
> > > > >    - Seems the ES charm can't find the yaml module (possibly a
> > > > >      python3.5 thing)???
> > > > >    - Filed bug here ->
> > > > >      https://bugs.launchpad.net/charms/+source/elasticsearch/+bug/1645043
> > > > >    - My workaround here, just to get the app deployed, was to deploy
> > > > >      elasticsearch to an LXD container on one of my hosts. Of course
> > > > >      this isn't an answer for anything more than a POC, but it worked
> > > > >      to allow me to deploy/troubleshoot the rest of my bundle.
> > > > >
> > > > >
> > > > > Aside from the remaining elasticsearch issue, I was able to get my
> > > > > stack deployed -> http://paste.ubuntu.com/23540146/
> > > > >
> > > > > My earlier baffled and confused cry for help now seems to just
> > > > > revolve around getting ES to deploy.
> > > > >
> > > > > My apologies for reaching out in such a way earlier before diving
> > > > > into what was going on; hopefully we can work out what's going on
> > > > > with my infra <-> ES.
> > > > >
> > > > > Thanks
> > > > >
> > > > > --
> > > > > Juju mailing list
> > > > > Modify settings or unsubscribe at:
> > > > > https://lists.ubuntu.com/mailman/listinfo/juju
> > >
> > > --
> > > Juju-dev mailing list
> > > Modify settings or unsubscribe at:
> > > https://lists.ubuntu.com/mailman/listinfo/juju-dev
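Rick's "manual-init" idea and the two host differences John lists (/tmp mounted noexec, python-yaml missing) suggest a pre-flight check before deploying charms to a manually provisioned machine. What follows is only a minimal sketch under stated assumptions: `mount_options` and `preflight` are hypothetical names, nothing here is part of Juju or any charm layer, and it assumes a Linux host that exposes /proc/mounts. It only reports the two differences; it does not fix them.

```python
import importlib.util


def mount_options(mounts_text, mount_point):
    """Return the set of mount options for mount_point, or None if absent.

    mounts_text uses the /proc/mounts layout:
        device mount_point fstype options dump pass
    If the mount point appears more than once, the last entry wins,
    because a later mount hides the earlier ones.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = set(fields[3].split(","))
    return options


def preflight(mounts_text):
    """Return human-readable warnings for the two host differences (hypothetical helper)."""
    warnings = []
    tmp_options = mount_options(mounts_text, "/tmp")
    # If /tmp is not a separate mount it inherits the options of its parent
    # filesystem; this sketch does not check that case.
    if tmp_options is not None and "noexec" in tmp_options:
        warnings.append("/tmp is mounted noexec: building in /tmp "
                        "(e.g. running ./configure) will fail")
    # Assumption from the thread: cloud-init based images likely already
    # ship the yaml module, while manually provisioned hosts may not.
    if importlib.util.find_spec("yaml") is None:
        warnings.append("the Python yaml module is missing "
                        "(install python-yaml or python3-yaml)")
    return warnings


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            mounts = f.read()
    except OSError:
        mounts = ""  # not Linux, or /proc unavailable: skip the /tmp check
    for warning in preflight(mounts):
        print("WARNING:", warning)
```

Run on a host, it prints one WARNING line per problem found. What to do about each one (a different build directory, or declaring python-yaml as a dependency) is left to the charm, in line with John's point that charms can be written defensively.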
