Re: [jclouds/jclouds-labs] vagrant: remove terminated nodes cache (#357)

Svet Thu, 02 Feb 2017 12:16:03 -0800

TL;DR - We should keep the registry and expire the status only. Could save some 
of it in the machine description yaml.


---

That's correct @nacx. We see only machines we've created (with the exception of 
the initial load).

The provider serves the following purposes:
  1. holds information that's not available elsewhere - hostname and IPs
  2. cache of available machines
  3. cache of the machine state

To expand on each one:
  1. Hostname and IPs of the machine are fetched on boot using vagrant's 
provisioning scripts. There are a couple of reasons for that. First, vagrant 
calls are very expensive for a local call: each exec takes at least several 
seconds, so integrating them with the boot saves a couple of ssh calls. 
Second, the Windows WinRM transport is available only through the 
provisioning scripts; we can't call "vagrant powershell <cmd>" later. This makes 
the information available only when creating the machine.
Possible solutions:
  * Store those in the yaml file that's already being created, containing the 
machine specs to be passed to the `Vagrantfile` script. We are already storing 
some additional info in there like `imageId` and `hardwareId`. What I don't 
like about it is that the information could change so those values would become 
stale.
  * Query the machine when we need them. Not possible with Windows; not 
possible if the machine is halted.
  * Keep them as is (in memory only). When the process stops the information is 
lost, and on a new start it doesn't know the hostname and IPs. Depending on the 
intended usage of the provider, that could be enough.
Possibly a combination of the above would work best, depending on the OS. 
Perhaps coupled with a refreshing mechanism.
The file could store other information like the tags and the metadata.

2. Existing machines list can be reconstructed easily - just listing the files 
on the disk. Here's the place to mention that machines created by jclouds 
follow some conventions. Manually created machines are not considered. The 
provider creates a `yaml` file describing the requirements and some meta, then 
a generic `Vagrantfile` reads those and creates a machine based on them. 
I'd say the provider is only interested in machines it creates though. Or 
machines created from previous runs. That's a local "service" and no concurrent 
modifications of the machines are expected. There could be parallel processes 
running the provider but still each one would manage its own machines. I'd even 
strongly discourage that since virtualbox (vboxmanage) has problems when it's 
executed in parallel. Currently the vagrant bindings explicitly serialise execs 
of `vagrant`.
So while it's possible to make this "live" I don't see it being useful.
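
To illustrate how cheap the reconstruction is, here is a minimal sketch of rebuilding the machine list from disk. It assumes one `<name>.yaml` descriptor per machine directly under a single directory; the real layout used by the provider may differ, and `MachineLister` is a made-up name, not a class in the PR.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: rebuilds the machine list from descriptor files on disk. */
final class MachineLister {
    private static final String SUFFIX = ".yaml";

    private MachineLister() {}

    /**
     * Returns the sorted names of machines that have a "<name>.yaml" descriptor
     * directly under the given directory. The flat layout is an assumption.
     */
    static List<String> listMachineNames(Path home) throws IOException {
        List<String> names = new ArrayList<>();
        // The glob is matched against the file name of each directory entry.
        try (DirectoryStream<Path> descriptors = Files.newDirectoryStream(home, "*" + SUFFIX)) {
            for (Path descriptor : descriptors) {
                if (!Files.isRegularFile(descriptor)) {
                    continue; // skip directories that happen to end in .yaml
                }
                String file = descriptor.getFileName().toString();
                names.add(file.substring(0, file.length() - SUFFIX.length()));
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

No `vagrant` exec is involved, so this stays fast even while a long `vagrant up` is running.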

3. Turns out we don't ever need to query the status of a machine. The key here 
is that the vagrant commands are synchronous. If `vagrant up` completes 
successfully then the status is `RUNNING`. If it fails an exception propagates 
and signals an error. This makes it possible to save on expensive state 
polling. It gets more obvious when several machines are spun up in parallel. 
Since `vagrant` commands are executed sequentially a `vagrant up` would block 
other `vagrant status` commands for quite a while.
Possible improvements:
  * time out the status value, refreshing it after some period on request
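
A rough sketch of what timing out the status could look like. The status is set directly when a synchronous command completes (e.g. `RUNNING` after `vagrant up`) and is only re-queried once it is older than a configured period. `ExpiringStatus`, its constructor arguments and the injectable clock are all assumptions for illustration, not existing code; Guava's `Suppliers.memoizeWithExpiration` covers the plain expiring-supplier case but has no way to set the value explicitly.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical holder for one machine's status that expires after a fixed period. */
final class ExpiringStatus {
    private final Supplier<String> loader;   // assumed to wrap an expensive status query
    private final LongSupplier clockMillis;  // injectable so expiry can be tested
    private final long ttlMillis;
    private String value;
    private long loadedAt;

    ExpiringStatus(Supplier<String> loader, long ttlMillis, LongSupplier clockMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    /** Records a status already known from a synchronous command. */
    synchronized void set(String status) {
        value = status;
        loadedAt = clockMillis.getAsLong();
    }

    /** Returns the cached status while fresh; reloads it once the period has passed. */
    synchronized String get() {
        long now = clockMillis.getAsLong();
        if (value == null || now - loadedAt >= ttlMillis) {
            value = loader.get();
            loadedAt = now;
        }
        return value;
    }
}
```

With this shape the common path (create, then read the status) never touches `vagrant status`; only a read after the period has elapsed pays for the exec.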


The registry allows us to really streamline machine creation. All it takes is a 
single `vagrant up`. It needs around a minute to return a usable machine 
(obviously depends on the image and is dominated by OS boot). Before 
introducing the registry it took at least 50% longer, and it gets worse 
the more machines are created in parallel.

And now to answer your questions :)

> we won't see the change. Is this correct? 

That's correct.

> And is this what we want? 

I think so. We are interested only in machines created by the same process. In 
the future we could add support for adopting external machines, but I don't see 
much value in that.

> (Is it that expensive to list the existing machines, that we need a cache?).

Listing is pretty quick, getting the status is not instantaneous. Worst is that 
long commands block shorter commands.
Vagrant explicitly disables parallel provisioning for machines in the same 
`Vagrantfile` for Virtualbox. When testing I also found that there are frequent 
and unexplained failures if I let it execute in parallel for different machines 
in different `Vagrantfile`s. That's why currently all execs are serialised.
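
For reference, the serialisation idea boils down to something like the sketch below: every command goes through one lock, so a long command blocks shorter ones but no two ever overlap. This is a simplified illustration with hypothetical names (`SerialisedRunner`, `run`), not the actual code in the vagrant bindings.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical runner that lets only one command execute at a time. */
final class SerialisedRunner {
    // Fair lock: waiting commands are served roughly in arrival order.
    private final ReentrantLock lock = new ReentrantLock(true);

    /** Runs the command while holding the lock and returns its result. */
    <T> T run(Callable<T> command) throws Exception {
        lock.lock();
        try {
            return command.call();
        } finally {
            lock.unlock();
        }
    }
}
```

A caller would wrap each exec, for example `runner.run(() -> new ProcessBuilder("vagrant", "up", name).inheritIO().start().waitFor())`, where `name` is the machine name. The lock only serialises commands within one JVM, which is another reason not to run several provider processes side by side.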

> If it turns out the cache is necessary, I'd suggest to add an expiration to 
> the node memoized supplier according to the session timeout property.

Not absolutely needed, but very much welcome. I'm not quite convinced we should 
refresh the machine list while running - I wouldn't want to mess with another 
process's machines, and I don't think Virtualbox will be happy with that. I 
agree we should expire the status.

---

The provider makes no assumptions about the underlying virtualisation used. By 
default that's Virtualbox and that's what I've been testing with. There's 
nothing preventing us from adding support for other providers like `libvirt` or 
`xhyve`. Even `vagrant-aws` ;).


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jclouds/jclouds-labs/pull/357#issuecomment-277070357
