Wow, what a great email and fantastic feedback. I'm going to attempt to reply and address each item inline below.
I'm curious, what version of Juju are you currently using?

On Wed Dec 17 2014 at 5:25:08 PM Caio Begotti <caio1...@gmail.com> wrote:

> Folks, I just wanted to share my experience with Juju during the last few
> months using it for real at work. I know it's pretty long but stay with me,
> as I wanted to see if some of these points are bugs, design decisions, or if
> we could simply talk about them :-)
>
> General:
>
> 1. Seems that if you happen to have more than... say, 30 machines, Juju
> starts behaving weirdly until you remove unused machines. One of the weird
> things is that new deploys all stay stuck with a pending status. That
> happened at least 4 times, so now I always destroy-environment when testing
> things, just in case. Has anyone else seen this behaviour? Can this be
> because of LXC with Juju local? I do a lot of Juju testing, so it's not
> unusual for me to have a couple hundred machines after a month, by the way.

LXC can get flaky, especially depending on the power of your machine. I haven't seen an issue running 35 LXC containers with Juju on my desktop, but it's got i7 processors and 32GB of RAM :) We're adding code that will reap empty machines after a short period of time. This should help in your case, and for others who are running in the cloud and don't want to pay providers for machines that are doing nothing!

> 2. It's not reliable to use Juju on laptops, which I can understand why of
> course, but just in case... if the system is suspended, Juju will not recover
> itself like the rest of the system services. It loses its connection to
> its API apparently? Hooks fail too (resuming always seems to call
> hooks/config-changed)? Is this just with me?

This is something I'm actually working on addressing by adding `juju local suspend` and `juju local resume` commands via a `juju-local` plugin: https://github.com/juju-solutions/juju-local

I hope to have this out for the new year.
I'll also be cramming in more functionality to make using the local provider much more reliable and easy.

> 3. The docs recommend writing charms in Python versus shell script.
> Compared to Python they are subpar enough that I'd recommend saying they
> are not officially supported then. It's quite common to have race
> conditions in charms written in shell script. You have to keep polling the
> status of things, because if you just call deploys and set relations in a
> row they will fail: Juju won't queue the commands in a logical
> sequence, it'll just run them dumbly, and developers are left in the wild
> to control it. I'm assuming a Python charm does not have this problem at all?

So, shell charms are fine, and we have quite a few that are written well. We could discourage people from using them, but Juju and charms are about choice and freedom. If an author wants to write charms in bash, that's fine; we will just hold them to the same standard as all other charms.

Something we've been diligently working on is charm testing. We're nearing the conclusion of an effort to add some semblance of testing to each charm and run those tests against all substrates and architectures we support. In doing so we can find charms that are written poorly and charms that are written well, regardless of the language they're written in.

Polling is something all charms will do, but I'll address this more later on, with your question about blocking on relation-get.

> 4. It's not very clear to me how many times hooks/config-changed runs, I'd
> just guess many :-) so you have to pay attention to it and write extra
> checks to avoid multiple harmful runs of this hook. I'd say the sequence
> and number of hooks called by a new deploy is not very clear based on the
> documentation because of this. Hmm, perhaps I could debug-print it and count
> the hits...

The sequence is pretty standard across all charms, but the number of invocations is unbounded: a hook will run at least once, and may run many times.
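Since there's no fixed invocation count, hook code has to be written so that re-runs are harmless. Here's a minimal sketch of an idempotent config-changed body; `config_get` is a stub standing in for the real `config-get` hook tool so the script can run outside a unit, and the file path and `port` key are made up for illustration:

```shell
#!/bin/sh
# config_get stands in for Juju's real `config-get` hook tool so this
# sketch can run outside a unit; it just returns a fixed value here.
config_get() { echo "8080"; }

CONF=/tmp/myapp-example.conf    # hypothetical config file for this sketch

port=$(config_get port)
desired="port=$port"

# Idempotent guard: only rewrite the file (and, in a real charm, restart
# the service) when the rendered config actually changed, so repeated
# config-changed invocations are harmless no-ops.
if [ ! -f "$CONF" ] || [ "$(cat "$CONF")" != "$desired" ]; then
    echo "$desired" > "$CONF"
    echo "config updated, restarting service"
else
    echo "config unchanged, nothing to do"
fi
```

The point is that the expensive or disruptive work (writing config, restarting a service) only happens when state actually changed, so it doesn't matter how many times Juju fires the hook.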
There is no guarantee on the number of times a hook will execute. The standard sequence is as follows, though:

$ juju deploy $charm
    install -> config-changed -> start

$ juju set $charm key=val
    config-changed

$ juju add-relation $charm:db $other_charm
    db-relation-joined -> db-relation-changed
    (db-relation-changed runs every time data on the relation wire changes)

In this case relation-changed will always run at least once, but it typically executes more than once.

$ juju remove-relation $charm $other_charm
    db-relation-departed -> db-relation-broken

Again, these hooks may execute more than once; all hooks may execute more than once. That's why hooks need to be idempotent.

$ juju upgrade-charm $charm
    upgrade-charm

$ juju destroy-service $charm
    stop

> 5. Juju should queue multiple deployments in order not to hurt performance,
> both of disk and network IO. More than 3 deployments in parallel on my
> machine makes it all really slow. I just leave Juju for a while and go get
> some coffee because the system goes crazy. Or I have to break up the
> deployments manually, while Juju could have just queued it all and the CLI
> could simply display it as "queued" instead. I know it would need to analyse
> the machine's hardware to guess a number different from 3, but think about it
> if your deployments have about 10 different services... things that take 20
> minutes can easily take over 1 hour.

This does severely affect performance on the local provider, but Juju is designed to run events asynchronously in an environment. File a bug/feature request at http://bugs.launchpad.net/juju-core/+filebug to request that LXC deployments be done serially.

> 6. There is no way to know if a relation exists and if it's active or not,
> so you need to write dummy conditionals in your hooks to work around that.
> IMHO it's hackish to check variables that are only non-empty during a
> relation because they will vanish anyway.
> A command to list the currently
> set relations would be awesome to have, both inside the hooks and in the
> CLI. Perhaps charmhelpers.core.services.helpers.RelationContext could be
> used for this but I'm not totally sure, as you only get the relation data
> and you need to know the relation name in advance anyway, right?

There is a way to query whether relations are ready outside of that relation's hook. You will need to run a few commands to get to that point, though.

First, `relation-ids` lists the unique identifier for each relation a charm has. So if you had a relation named "db" you could query its relation IDs by doing the following:

    rel_ids=`relation-ids db`

You can then find the names of the remote units for each relation ID by doing the following:

    units=`relation-list -r $rel_id`

Finally, you can get the key/values each unit provided using:

    data=`relation-get -r $rel_id key $unit`

In a shell loop it would look something like this (note the loop variables are deliberately unquoted so the shell splits them into words):

```
#!/bin/sh
rel_ids=`relation-ids db`
for rel_id in $rel_ids; do
    units=`relation-list -r $rel_id`
    for unit in $units; do
        data=`relation-get -r $rel_id key $unit`
    done
done
```

The CharmHelpers library has wrappers to help facilitate this as well.

> 7. When a hook fails (most usually during relations being set) I have to
> manually run resolved unit/0 multiple times. It's not enough to call it
> once and wait for Juju to get it straight. I have to babysit the unit and
> keep running resolved unit/0, while I imagined this should be automatic
> because I wanted it resolved for real anyway. If the failed hook was the
> first in a chain, you'll have to re-run this for every other hook in the
> sequence. Once for a relation, another for config-changed, then perhaps
> another for the stop hook and another one for the start hook, depending on
> your setup.

What charm is causing this issue? This shouldn't happen, but presumably the failure is due to data or something else not being ready, which is why it's erroring.
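The usual way to guard against data not being ready (more on this under point 8 below) is to check for the key and bail out cleanly rather than fail. A minimal sketch follows; `relation_get` is a stub standing in for the real `relation-get` hook tool so the script can run outside a unit, and the `host` key is just an example:

```shell
#!/bin/sh
# relation_get stands in for Juju's real `relation-get` hook tool so this
# sketch can run outside a unit; echoing nothing simulates the remote
# unit not having run relation-set for the "host" key yet.
relation_get() { echo ""; }

# Body of a hypothetical db-relation-changed hook, wrapped in a function
# so the early-exit path is a `return` here; a real hook script would
# simply `exit 0` and let Juju re-invoke it when the relation data changes.
hook_body() {
    host=$(relation_get host)
    if [ -z "$host" ]; then
        echo "host not set yet, deferring"
        return 0
    fi
    echo "configuring service against $host"
}

hook_body
```

Because the hook exits successfully instead of erroring, there's nothing for `juju resolved` to babysit; the hook just runs again on the next relation change.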
It sounds like the charm doesn't properly guard against data not being ready, which I'll cover again below.

> 8. Do we have to monitor and wait for a relation variable to be set? I've
> noticed that sometimes I want to get its value right away in the relation
> hook but it's not assigned yet by the other service. So I'm finding myself
> adding sleep commands when it happens, and that's quite hackish I think?
> IMHO the command to get a variable from a relation should be blocking until
> a value is returned so the charm doesn't have any timing issues. I see that
> happening with rabbitmq-server's charm all the time, for instance.

No. When any hook is executed, it's executed with a snapshot of the data available at the time of execution. This ensures consistency for the entire duration of the hook. Blocking on data isn't good, as it would leave hooks running for a long time and block the execution chain for a key that may never be sent. Instead, check for the variables you need from the relation, and if they don't exist yet, simply `exit 0`. Juju will re-queue the hook to execute when data on the wire changes, i.e. when the remote unit finally runs the appropriate `relation-set` line.

> 9. If you want to cancel a deployment that just started you need to keep
> running remove-service forever. Juju will simply ignore you if it's still
> running some special bits of the charm, or if you have previously asked it
> to cancel the deployment during its setting up. No errors, no other
> messages are printed. You need to actually open its log to see that it's
> still stuck in a long apt-get installation, and you have to wait until the
> right moment to remove-service again. And if your connection is slow, that
> takes time; you'll have to babysit Juju here because it doesn't really
> control its services as I imagined. Somehow apt-get gets what it wants :-)

You can now force-kill a machine. So you can run `juju destroy-service $service`, then `juju terminate-machine --force #machine_number`.
Just make sure that nothing else exists on that machine! I'll raise an issue for adding a --force flag to destroying a service, so you can just say "kill this with fire, now plz".

> 10. I think there's something weird about relation-set and relation-get
> between services when you add and remove relations multiple times. For
> example, the first time I set a relation to a Postgres charm I get a
> database back and my desired roles configured, but if I remove the relation
> and then add it back I only get the database settings. The roles parameter
> is missing setup, so I don't have the right permissions in the DB the
> second time I set the relation. Has anyone seen this too with other charms?

This is a bug in the PostgreSQL charm. I'd file a bug so the author is aware of it: https://bugs.launchpad.net/charms/+source/postgresql

> That's it, thank you for those who made it to the end :-D

Thank you so much for your usage of, and feedback on, Juju thus far. We really want to make a tool that works best for you and everyone else. You've raised some good points: some things we're aware of and working on, some things we can improve upon. Please continue to pass along feedback as you use Juju, and let us know anywhere else we can help improve!

Marco Ceppi
--
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju