Re: [mcollective-users] Experiences migration to choria and

jhoutman via mcollective-users Wed, 14 Jun 2017 11:42:59 -0700

Hi,

On Tuesday, June 13, 2017 at 3:21:34 PM UTC+2, R.I.Pienaar wrote:
>
> Thanks for the feedback and PRs you've already done, really appreciated. 
>
> Some replies inline 
>
> > *a single CA* 
> > *Our solution:* 
> > We created an alternative ssl directory and used the puppet subtools to 
> > generate and sign a request against a single CA we selected. Nats and 
> Mco 
> > use this directory for their certificates. 
> > On the systems that already used that puppet-ca we just created a 
> > symlink. 
>
> Fairly unavoidable I am afraid, if you want PKI this is what you get. 
> I've learned from the old security model mcollective had that people 
> just cant get their mind around 2 models.  They already grok PKI from 
> Puppet, so we need to behave as close as possible to what Puppet does to 
> keep things understandable for users. 
>
> However conceptually if you thought about your MCollective in terms of 
> Federation then each small collective is the domain of its own CA and 
> the Choria Federation system can do the work of bridging from CA to CA 
> for you. 
>
> This is possible because: 
>
> 1: Your federation broker could have a 'super user' cert.  This means it 
> can make mco requests on behalf of other users. 
>
> 2: federation brokers sits between collective boundaries 
>
> So a federation broker can be made that takes a incoming request from 
> bob.mcollective signed by ca.ldn and repackage the request using its 
> super user cert into PKI structures signed by ca.nyc with the caller set 
> in a way to identify the foreign CA. 
>
> The federation broker then has the role of validating the incoming 
> requests are valid and signed by the federation side CA and it then is 
> the only entity capable of making requests from the Federation into 
> member collectives using the member collectives own CA. 
>
> Already we have federation, already the security module supports this 
> super user concept. 
>
> What we dont have is the feature to bridge the CA realms, I anticipated 
> this will be needed at some point but have not had reason to implement 
> it yet. 
>
> Federation will be rewritten soon into the new Choria broker which will 
> incorporate transport, federation and discovery proxy all into a single 
> network service.  More on that below. 
>
> I'll try to keep this in mind then, but not sure how high this is on my 
> list of things tbh, it's something I'd develop on commission though for 
> sure. 
>
> If the concept of Mcollective Federation is new to you read 
> http://choria.io/docs/federation/ 
>  
>
>
Federation was one of the first solutions we attempted, but it did not work: 
as you said, federation could not bridge CA realms. 
We don't need it anymore, but here are some things to keep in mind when you 
work on this:
- In our setup a collective spans CA realms, so the test collective 
consists of nodes in both CA realms. 
  This is probably fairly uncommon and we could have worked around it 
easily.
- More common (I think) is the use of multiple CA realms. 
  I know of a few companies that have both a Puppet 3 and a Puppet 4 setup, 
with separate CA realms.
   
The documentation could be a little clearer on this.
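
To make the bridging idea quoted above concrete, here is a toy model in 
Python. It is only a sketch of the concept, not Choria code: the feature 
does not exist yet, the `Request` type and `bridge` function are invented 
for illustration, "signed_by" is a plain string standing in for real PKI 
verification, and the `ca/caller` format used to mark the foreign CA is an 
assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    caller: str     # identity making the request, e.g. "bob.mcollective"
    signed_by: str  # name of the signing CA (stand-in for real PKI checks)
    payload: str


class BridgeError(Exception):
    """Raised when an incoming request fails validation at the broker."""


def bridge(request, federation_ca, member_ca):
    """Validate a federation-side request and repackage it for a member collective."""
    # The broker only accepts requests signed by the federation-side CA.
    if request.signed_by != federation_ca:
        raise BridgeError(f"request is not signed by {federation_ca}")
    # Keep the foreign CA visible in the caller so the member collective can
    # still tell who asked. This "ca/caller" format is invented here.
    foreign_caller = f"{request.signed_by}/{request.caller}"
    # The repackaged request is signed on the member side, modelling the
    # broker's super user cert issued by the member collective's CA.
    return Request(caller=foreign_caller, signed_by=member_ca, payload=request.payload)
```

With this model, a request from bob.mcollective signed by ca.ldn comes out 
of `bridge(..., "ca.ldn", "ca.nyc")` signed by ca.nyc with the caller 
`ca.ldn/bob.mcollective`, and a request signed by any other CA is rejected.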



 

> > *Mcollective error handling* 
> > The rundeck system, exposes only a clean environment when executing a 
> > command. 
> > This means: 
> > that the HOME environment variable was not set 
> > the certificate was always <hostname>.mcollective while we wanted to use 
> > something else. (see 
> > 
> https://github.com/choria-io/mcollective-choria/blob/master/lib/mcollective/util/choria.rb
>  
> > line 623) 
> > The error handling was minimal such that we did not see any errors while 
> > executing the command through rundeck. It took a few hours to figure out 
> > what the problem was. 
> > These are also the kind of situations that are hard to test until you 
> > actually start migrating, as such they have a huge impact on the 
> > maintenance window required. 
> > 
> > *Our solution: * 
> > We have created a wrapper script and placed that first and foremost in 
> > the 
> > PATH. This script makes sure that MCOLLECTIVE_CERTNAME and HOME are set 
> > to 
> > the right values. 
> > This script also forces our human users to use the collectives to limit 
> > the 
> > load on the system. 
> > 
>
> yes as you say its expand_path, unavoidable, those vars have to be set. 
>
> Yeah not an issue, but I'll look into making it more obvious when they 
> are missing. 
 

> > *A single client configuration* 
> > During the migration we planned to migrate to choria on a per 
> environment 
> > basis. The module only allows a single configuration, so from that 
> > perspective it was all environments or nothing. 
> > 
> > *Our solution:* 
> > We are maintaining a configuration file per environment: client_tst.cfg, 
> > client_acc.cfg, etc. 
> > The (above mentioned) wrapper selects the configuration file based on 
> the 
> > specified collective, unless explicitly specified. 
>
> you can still have ~/.mcollective and you can still pass --config, I'd 
> be reluctant to add too much magic to that where it would somehow pick a 
> per environment config automagically tbh, so your wrapper sounds like a 
> fine idea.  The config handling is a absolute train wreck already, more 
> magic is bad :) 
>

I agree, let's not make this magic.  A wrapper is the solution for this, or 
explicit calls with --config. 
I am just arguing that it would be nice to have support in the module for 
maintaining multiple client configs. 
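
For reference, the wrapper logic described in this thread could look 
roughly like the sketch below. It is written in Python for illustration 
and is not our actual script. The paths, the default HOME and the 
certificate name are made-up placeholders, and it assumes the collective 
is passed with -T/--target and an explicit config with -c/--config.

```python
import os

# All four values are assumptions for illustration; adjust them to your site.
REAL_MCO = "/opt/puppetlabs/bin/mco"        # assumed path of the real client
CONFIG_DIR = "/etc/puppetlabs/mcollective"  # assumed home of client_<env>.cfg
DEFAULT_HOME = "/var/lib/rundeck"           # hypothetical HOME for the job user
DEFAULT_CERTNAME = "rundeck.mcollective"    # hypothetical certificate name


def option_value(argv, short, long):
    """Return the value passed for an option, or None if it is absent."""
    for i, arg in enumerate(argv):
        if arg in (short, long) and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith(long + "="):
            return arg.split("=", 1)[1]
    return None


def build_invocation(argv, environ):
    """Return (args, env) to use for the real mco call."""
    env = dict(environ)
    # Rundeck exposes a clean environment, so fill in what is missing.
    env.setdefault("HOME", DEFAULT_HOME)
    env.setdefault("MCOLLECTIVE_CERTNAME", DEFAULT_CERTNAME)

    # Force callers to name a collective, to limit the load on the system.
    collective = option_value(argv, "-T", "--target")
    if collective is None:
        raise ValueError("please name a collective with -T/--target")

    args = list(argv)
    # Pick the per-environment config unless one was given explicitly.
    if option_value(argv, "-c", "--config") is None:
        args += ["--config", f"{CONFIG_DIR}/client_{collective}.cfg"]
    return args, env


def main(argv, environ):
    """Replace this process with the real mco, using the prepared arguments."""
    args, env = build_invocation(argv, environ)
    os.execve(REAL_MCO, [REAL_MCO] + args, env)
```

A real wrapper would end by calling `main(sys.argv[1:], os.environ)`. 
Calling the real client by absolute path matters here: the wrapper sits 
first in the PATH, so looking up `mco` by name would find the wrapper again.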

 

> But the federation solution outlined above will make this not needed. 
>
I don't know if I agree with this.  Would federation have helped with 
our split ActiveMQ / NATS setup? 
It was not clear from the documentation that the federation setup could 
cross messaging backends.

And there is still the situation where you might be migrating to the go 
implementation you mentioned below. 
Although you could make those compatible, a situation where one does not 
need to depend on that kind of integration between implementations is ideal.
Just switching client config files to start using the new setup is simple, 
robust and quick.

> > *A single server per node* 
> > 
> > The code only allows for the setup of a single mcollective server 
> process 
> > per node. This means that it is rather difficult to do rollouts without 
> > downtime. 
> > The required downtime of the stack would have been greatly reduced if it 
> > had been easy to run two mcollective agents, one pointing to active-mq 
> > the 
> > other to nats. 
> > Our wrapper could use the mcollective config, while we prepared and 
> > thoroughly tested the switch to nats. 
> > 
> > *Our solution:* 
> > We just went per environment and took the downtime. 
>
> It can make any config file you want, so you're welcome to make multiple 
> server.cfg files, you'll have to take care of the init system though as 
> that is done by the Puppet packaging. (but see below about a new daemon 
> where I will have to) 
>
> See for example the mcollective_choria::federation_broker defined type 
>
> Your roles/profiles could use the utilities to solve your problem. 
>

Great, I obviously missed that. We will be looking into it.  I don't mind if 
we have to manage the extra daemons from the roles and profiles. 

If people in the community are having similar trouble setting up a good 
testing environment, we should consider a page on the topic in the docs.
 

> > *Our intended follow-up actions* 
> > Now that we have almost completed migrating our systems, we plan to 
> spend 
> > a 
> > little time to improve our situation in the future. We are considering 
> > the 
> > following changes: 
> > 
> > 
> >    - Improve the failure handling when determining the certificate, 
> >    especially the dependency on the HOME environment variable. (We 
> >    suspect this is due to needing to expand '', e.g. 
> >    File.expand_path("/.puppetlabs/etc/puppet/ssl") in 
> >    mcollective/util/choria.rb) 
>
> yes exactly 
>
> >    - Allow the specification of the user certificate in the client 
> config 
> >    for a non-root account. (no need to depend on MCOLLECTIVE_CERTNAME) 
>
> conceptually I want choria to not need per client config files - ie. 
> ~/.mcollective but I can see the need for custom config files so that 
> sounds fine to me. 
>

Ideally I would agree, but the environment variables are just a different 
type of obscure config. 
Better to make things explicit; in our opinion there was too much "magic" 
with the environment variables. 
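
A minimal sketch of what an explicit check could look like, in Python 
for illustration only. This is not code from Choria; the function names 
are made up, and the list of required variables is simply the two this 
thread says have to be set.

```python
# The two variables this thread says must be set for the client to work.
REQUIRED = ("HOME", "MCOLLECTIVE_CERTNAME")


def missing_settings(environ, required=REQUIRED):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


def check_environment(environ):
    """Fail with a readable message instead of failing silently."""
    missing = missing_settings(environ)
    if missing:
        raise RuntimeError(
            "cannot determine the client certificate, unset: " + ", ".join(missing)
        )
```

Running such a check before anything else would have turned our few hours 
of debugging under rundeck into a single error line naming the variable.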

> >    - Support for multiple server configurations/instances on a single 
> >    node 
>
> As above, there already.  The module doesnt manage the OS init files and 
> I am not keen to do that either so the situation as is should stay for 
> now. 
>

Agreed, we will do the init files from the profile. 
 

> > Do you guys think that some of these changes can be accepted upstream or 
> > do you have better idea's for addressing some of our concerns? 
> > Some of these issues might have been caused by mis-understandings or 
> > missing documentation, then we would like to help with solving that. 
>
> As above, I'd definitely be keen on some of these improvements and also 
> the other stuff we spoke about like managing the entire config file. 
>

Yeah forgot about that, the entire config file would be great. 
We have been bitten by that a few times already. 
 

> Medium term the following will happen given that Puppet Inc is refusing 
> to cooperate on letting me maintain mcollective: 
>
> A `choria` command will be written with the following features: 
>
> It will replace your mcollectived, it will be backward compatible and 
> agents/clients will continue to work.  It will be written in Go which 
> means it will be much lighter and faster.  RPC stuff will be forked so 
> you wont get the huge memory ballooning due to puppet libs. 
>

Yeah. GO!
Wouldn't have minded helping with that. But time seems to be in short 
supply.

> It will incorporate in it the NATS broker so setting up the entire NATS 
> cluster will just be 3 lines in the mcollective config files. 
>

Is that wise?  Separation of concerns and such. 
I would much rather have those separated; that leaves us free to follow 
upstream NATS without requiring new builds from you. 
Or use a fork of NATS if you want to. 
The development of this would also not have been easy if ActiveMQ had been 
bundled with mco. 
 

> It will incorporate the Federation Broker to work in process with the 
> NATS brokers so its a much closer, faster and more reliable cooperation 
> between these parts.  They are one thing in effect. 
>

But it also makes the cross-CA-realm feature harder to implement, 
as well as any feature where you might want to cross different messaging 
backends, NATS and ActiveMQ for example.
 

> Thus you have 1 binary to deploy which will effectively bring you 
> mcollective 3 - called choria - with a compatibility layer to older 
> mcollective 2.  It'll live along side puppet-agent and use that to 
> deliver the mcollective 2 layer as well as the clients which will remain 
> in ruby. 
>
On the other hand, that is nice and simple... 

> Writing new agents will become a lot easier, I have some pretty sweet 
> prototypes around this. 
>

Care to share?
 

> On the choice of Go, unavoidably this will mean we can run on fewer 
> platforms
>

How so? Go can compile to anything decent. I think everything that runs 
puppet is covered.



I was wondering about those user certificates.
What is the advice for managing them across nodes?


with regards,

Jos

