Mike, could you try again on the image, please, making sure it is a simple 
format (gif, png, or jpeg)?  It got munched, at least in my viewer.  Thanks.

Casey, responding to some of the questions you raised:

I’m going to make a rather strong statement:  We already have a service “to 
intermediate and handle config update/retrieval”.  
Furthermore, it:
- Correctly handles the problems of distributed services running on multi-node 
clusters.  (That’s a HARD problem, people, and we shouldn’t try to reinvent the 
wheel.)
- Correctly handles Kerberos security. (That’s kinda hard too, or at least a 
lot of work.)
- Does automatic versioning of configurations, and allows viewing, comparing, 
and reverting historical configs.
- Has a capable REST API for all of those things.
It doesn’t natively integrate Zookeeper storage of configs, but there is a 
natural place to specify copying to/from Zookeeper for the files desired.

It is Ambari.  And we should commit to it, rather than try to re-create such 
features.
Because it has a good REST API, it is perfectly feasible to implement Stellar 
functions that call it.
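
As a rough illustration (the hostname, credentials, cluster name, and the 
“metron-env” config type below are placeholders), fetching the current value of 
a config type takes just two REST calls:

    import requests

    AMBARI = "http://ambari-host:8080/api/v1"      # placeholder host
    CLUSTER = "metron_cluster"                      # placeholder cluster name
    AUTH = ("admin", "admin")                       # placeholder credentials

    # Ask the cluster which tag is the current ("desired") one for the config
    # type, then fetch that version of the config.
    desired = requests.get("%s/clusters/%s" % (AMBARI, CLUSTER),
                           params={"fields": "Clusters/desired_configs"},
                           auth=AUTH).json()
    tag = desired["Clusters"]["desired_configs"]["metron-env"]["tag"]

    cfg = requests.get("%s/clusters/%s/configurations" % (AMBARI, CLUSTER),
                       params={"type": "metron-env", "tag": tag},
                       auth=AUTH).json()
    print(cfg["items"][0]["properties"])

A Stellar function that wraps Ambari would essentially be these two calls plus 
credential handling.
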
GUI configuration tools can also use the Ambari APIs, or better yet be 
integrated as an “Ambari View”. (E.g., see the “Yarn Capacity Scheduler 
Configuration Tool” example in the Ambari documentation, under “Using Ambari 
Views”.)

Arguments are: Parsimony, Sufficiency, Not reinventing the wheel, and Not 
spending weeks and weeks of developer time over the next year reinventing the 
wheel while getting details wrong multiple times…

Okay, off soapbox.  

Casey asked what the config update behavior of Ambari is, and how it will 
interact with changes made from outside Ambari.
The following is from my experience working with the Ambari Mpack for Metron.  
I am not otherwise an Ambari expert, so tomorrow I’ll get it reviewed by an 
Ambari development engineer.

Ambari-server runs on one node, and Ambari-agent runs on each node in the cluster.
Ambari-server has a private set of .py, .xml, and template files, which together 
are used both to generate the Ambari configuration GUI (with defaults) and to 
generate configuration files (of any needed filetype) for the various Stack 
components.
Ambari-server also has a database where it stores the schema related to these 
files, so even if you reach in and edit Ambari’s files, it will error out if 
the set of parameters or parameter names changes.  The historical information 
about configuration changes is also stored in the db.
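
For illustration, that history is also reachable through the same REST API.  A 
rough sketch (host, cluster, and service names are placeholders) that lists the 
stored config versions for a service:

    import requests

    AMBARI = "http://ambari-host:8080/api/v1"      # placeholder host
    CLUSTER = "metron_cluster"                      # placeholder cluster name

    hist = requests.get(
        "%s/clusters/%s/configurations/service_config_versions" % (AMBARI, CLUSTER),
        params={"service_name": "METRON"},
        auth=("admin", "admin")).json()

    # Each item carries the version number, creation time, and the note entered
    # when the config was saved in Ambari.
    for v in hist["items"]:
        print(v["service_config_version"], v["createtime"],
              v.get("service_config_version_note", ""))
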
For each component (and in the case of Metron, for each topology), there is a 
Python file which controls the logic for these actions, among others:
- Install
- Start / stop / restart / status
- Configure

What happens in each of these API calls is actually up to this Python code 
(which we wrote for the Metron Mpack).  But the current code, and I believe 
this is typical of Ambari-managed components, performs a “Configure” action 
whenever you press the “Save” button after changing a component config in 
Ambari, and also on each Install and Start or Restart.
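
For concreteness, here is a rough sketch of the shape of one of those Python 
files (the class name and method bodies are illustrative, not the actual Metron 
Mpack code):

    from resource_management.libraries.script import Script

    class EnrichmentCommands(Script):      # illustrative name
        def install(self, env):
            self.install_packages(env)     # install the component's packages
            self.configure(env)

        def configure(self, env):
            # regenerate config files from templates + current Ambari parameters;
            # see the "Configure" sequence described below
            pass

        def start(self, env):
            self.configure(env)            # typical pattern: (re)configure, then start
            # launch the topology / daemon here

        def stop(self, env):
            pass

        def status(self, env):
            pass

    if __name__ == "__main__":
        EnrichmentCommands().execute()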

The Configure action consists of approximately the following sequence (see 
disclaimer above :-)
- Recreate the generated config files, using the template files and the actual 
configuration most recently set in Ambari.
  - Note this is also under the control of Python code that we wrote, and this 
  is the appropriate place to push to ZK if desired (sketched just after this 
  list).
- Propagate those config files to each Ambari-agent, with a command to set them 
locally.
- The Ambari-agents on each node receive the files and write them to the 
specified locations on local storage.
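
A rough sketch of what that first “recreate the config files” step can look 
like inside configure(), continuing the illustrative class above (the template 
name, paths, parameter names, and the ZK push command line are placeholders, 
not the exact Metron Mpack contents):

    from resource_management.core.resources.system import Directory, Execute, File
    from resource_management.core.source import Template
    from resource_management.libraries.script import Script

    class EnrichmentCommands(Script):                  # continuing the sketch above
        def configure(self, env):
            import params                              # the Mpack's generated parameter module
            env.set_params(params)

            # Render the template against the parameters Ambari currently holds
            # and write the result where the component expects it (placeholder paths).
            Directory(params.metron_config_path, create_parents=True)
            File(params.metron_config_path + "/global.json",
                 content=Template("global.json.j2"),
                 owner=params.metron_user)

            # The natural hook for Zookeeper: push the freshly generated files up,
            # here by shelling out to Metron's own load script.
            Execute(params.metron_home + "/bin/zk_load_configs.sh"
                    " -m PUSH -i " + params.metron_config_path +
                    " -z " + params.zookeeper_quorum,
                    user=params.metron_user)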

Ambari-server then whines that the updated services should be restarted, but 
does not initiate that action itself (unless of course the initiating action 
was a Start command from the administrator).

Make sense?  It’s all quite straightforward in concept; there’s just an awful 
lot of stuff wrapped around that to make it all go smoothly and to handle the 
problems when it doesn’t.

There’s additional complexity in that the Ambari-agent also caches (on each 
node) both the template files and COMPILED forms of the Python files (.pyc) 
involved in transforming them.  The .pyc files incorporate some amount of 
additional info regarding parameter values, but I’m not sure of the form.  I 
don’t think that changes the above in any practical way, unless you’re trying 
to cheat Ambari by reaching in and editing its files directly.  In that case, 
you also need to whack the .pyc files (on each node) to force the data to be 
reloaded from Ambari-server.  The best solution: don’t cheat.

Also, there may be circumstances under which the Ambari-agent will detect 
changes and re-write the config files with the latest version it knows of, even 
without a Save or Start action at the Ambari-server.  I’m not sure of this and 
need to check with the Ambari developers.  It may no longer happen, although 
I’m pretty sure change detection/reversion was a feature of early versions of 
Ambari.

Hope this helps,
--Matt

================================================
From: Michael Miklavcic <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, January 12, 2017 at 3:59 PM
To: "[email protected]" <[email protected]>
Subject: Re: [DISCUSS] Ambari Metron Configuration Management consequences and 
call to action

Hi Casey, 

Thanks for starting this thread. I believe you are correct in your assessment 
of the 4 options for updating configs in Metron. When using more than one of 
these options we can get into a split-brain scenario. A basic example is 
updating the global config on disk and pushing it with zk_load_configs.sh. Later, 
if a user decides to restart Ambari, the cached version stored by Ambari (it's in 
the MySQL or other database backing Ambari) will be written out to disk in the 
defined config directory, and subsequently loaded with zk_load_configs.sh 
under the hood. Any global configuration modified outside of Ambari will be
lost at this point. This is obviously undesirable, but I also like the purpose 
and utility exposed by the multiple config management interfaces we currently 
have available. I also agree that a service would be best.

For reference, here's my understanding of the current configuration loading 
mechanisms and their deps.

<image>

Mike


On Thu, Jan 12, 2017 at 3:08 PM, Casey Stella <[email protected]> wrote:

In the course of discussion on the PR for METRON-652
<https://github.com/apache/incubator-metron/pull/415>, something that I
should definitely have understood better came to light, and I thought it
was worth bringing to the attention of the community for
clarification/discussion: just how do we manage configs?

Currently (assuming the management UI that Ryan Merriman submitted), configs
are managed/adjusted via a couple of different mechanisms.

   - zk_load_configs.sh: configs are pushed and pulled between disk and Zookeeper
   - Stellar REPL: pushed and pulled via the CONFIG_GET/CONFIG_PUT functions
   - Ambari: initialized via the zk_load_configs script; some configs are then
   managed directly (global config) and some indirectly (sensor-specific
   configs).
      - NOTE: Upon service restart, it may or may not overwrite changes on
      disk or in Zookeeper.  *Can someone more knowledgeable than me about
      this describe precisely the semantics that we can expect on service
      restart for Ambari? What gets overwritten on disk and what gets updated
      in Ambari?*
   - The Management UI: manages some of the configs. *RYAN: Which configs
   do we support here and which don't we support here?*

As you can see, we have a mishmash of mechanisms to update and manage the
configuration for Metron in Zookeeper.  In the beginning the approach was
just to edit configs on disk and push/pull them via zk_load_configs.  Configs
could be historically managed using source control, etc.  As we got more
and more components managing the configs, we haven't taken care that they
all work with each other in an expected way (I believe these are
true; correct me if I'm wrong):

   - If configs are modified in the management UI or the Stellar REPL and
   someone forgets to pull the configs from Zookeeper to disk before they do
   a push via zk_load_configs, they will clobber the configs in Zookeeper with
   old configs.
   - If the global config is changed on disk and the Ambari service
   restarts, it'll get reset to the original global config.
   - *Ryan, in the management UI, if someone changes the Zookeeper configs
   from outside, are those configs reflected immediately in the UI?*


It seems to me that we have a couple of options here:

   - A service to intermediate and handle config update/retrieval and to
   track historical changes, so these different mechanisms can use a common
   component for config management/tracking; then refactor the existing
   mechanisms to use that service.
   - Standardize on exactly one component to manage the configs and regress
   the others (that's a verb, right?  nicer than delete.)

I happen to like the service approach, myself, but I wanted to put it up
for discussion and hopefully someone will volunteer to design such a thing.

To frame the debate, I want us to keep in mind a couple of things that may
or may not be relevant to the discussion:

   - We will eventually be moving to support Kerberos, so there should at
   least be a path to use Kerberos for any solution, IMO.
   - There is value in each of the different mechanisms in place now.  If
   there weren't, then they wouldn't have been created.  Before we try to make
   this a "there can be only one" argument, I'd like to hear very good
   arguments.

Finally, I'd appreciate it if some people might answer the questions I have in
bold there.  Hopefully this discussion, if nothing else happens, will
result in fodder for proper documentation of the ins and outs of each of
the components bulleted above.

Best,

Casey




