Re: [DISCUSS] Ambari Integration

Justin Leet Wed, 21 Sep 2016 11:18:20 -0700

We could definitely replace some of it, but we have not replaced anything in
this PR. Most changes in the PR are in metron-deployment/packaging/ambari/,
plus some light surrounding work to make available some things that weren't.
The Ansible code is essentially untouched, if not entirely untouched.
Actually provisioning the base cluster is a separate activity that will
live alongside the install. What we want as an end state is definitely
something I'd love to hear discussion on, and something I'd expect to see
more of as we build on this.


For a little more context, the mpack moves node assignment, service
management, configuration, etc. into Ambari.  The goal is to make
everything a lot easier to install and configure in the general case,
which is a pain point I've heard about repeatedly.  It's (relatively)
easy to spin up on AWS because there's Ansible for it, but spinning up
arbitrary machines and putting Metron on top is more painful.

If you've gone through the regular Ambari install of a service (e.g.
through the UI), it's basically:
1) Where do you want to install components?
2) What configs do you want on them?
3) Install and start them up?

The mpack is just setting up Ambari to make doing that easier by defining
the services and exposing some of the configuration (and I'm sure there's
more that should be exposed).

Justin

On Wed, Sep 21, 2016 at 1:53 PM, Otto Fowler <ottobackwa...@gmail.com>
wrote:

> Thanks Justin,
>
> So this should just replace what currently happens if you do the full
> deployment, but you have not tested it as such?
> I think the difference I saw in the AWS deployment was how it assigned
> roles to nodes through the script.  Sorry if I overstated it.
>
>
> On September 21, 2016 at 13:45:14, Justin Leet (justinjl...@gmail.com)
> wrote:
>
> Hi Otto,
>
> A couple of things to dig into a bit.  Let me know if I stray from your
> question, but I think this should give you the answer.
>
> For the mpack, it's just taking a cluster without Metron and turning it
> into a cluster running Metron (regardless of how the cluster itself was
> provisioned).  I wasn't clear about it in my last message, but testing on
> AWS wasn't really about making sure small_cluster configuration was
> compatible with the mpack changes.  It was about making sure that we could
> go from a cluster without Metron to one with Metron.  The deployment was on
> AWS more to ensure we had enough memory to actually run the various
> services (the Docker cluster was having issues before we'd dumped enough
> extra services).
>
> Someone with a little more experience could probably chime in here, but
> the current AWS install actually does use the small_cluster configuration
> if you look at the defaults.yml under amazon-ec2 in the metron-deployment
> package.  The mpack setup is independent of the Ansible code for right
> now. How closely those two end up living together (especially because
> there is definitely some overlap) is a more involved discussion.
>
> Thanks,
> Justin
>
>
> On Wed, Sep 21, 2016 at 9:55 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
>> Hi Justin,
>>
>> Are you testing this against the small_cluster configuration?  With the
>> full install (install Ambari, etc.) as well as the AWS install?
>> The AWS install seems like its own path, and is essentially different
>> from small_cluster.
>>
>> I myself am interested in the whole boat deployment, where I'm providing
>> CentOS nodes with only OS/SSH/host setups to be totally deployed.
>>
>> On September 21, 2016 at 09:41:05, Justin Leet (justinjl...@gmail.com)
>> wrote:
>>
>> Hi all,
>>
>> I opened up a PR at https://github.com/apache/incubator-metron/pull/266
>> for everyone to take a look at and comment on.
>> For reference, the original JIRA is
>> https://issues.apache.org/jira/browse/METRON-427
>>
>> It pretty much covers the MVP that Casey outlined and should give a
>> pretty good starting point for everyone to build on.
>>
>> There are more details on the ticket (and in the README in the code), but
>> I'll try to give the abbreviated version.
>>
>> The PR builds an mpack that sets up Kafka topics, a MySQL instance with
>> GeoIP data loaded, the Storm topologies (parsers, enrichment, and
>> indexing), and output to Elasticsearch and HDFS.  It also exposes
>> management and a lot of configuration through Ambari.  The sensors are NOT
>> managed by Ambari.  It includes some testing instructions for trying things
>> out.
>>
>> Additionally, this does not replace the current Ansible infrastructure.
>> There's definitely good discussion to be had around how these two
>> approaches should interact.
>>
>> It also includes a set of limitations / caveats that we'll want to build
>> on as we expand out of the MVP.  I'll include them here so that everyone
>> has a good idea of where the MVP ends (as of the PR as it stands right
>> now) and where people can contribute ideas or code if they have an interest.
>>
>>    - MySQL install should be optional (and allow for using an existing
>>    instance).
>>    - MySQL should not be installed on a node already running a MySQL
>>    instance (e.g. an Ambari Server using MySQL as its database).
>>    - There is currently no hosting for RPMs remotely. They will have to
>>    be built locally.
>>    - Colocation of appropriate services should be enforced by Ambari.
>>    See 'Installing Management Pack' section in the README for more details.
>>    - Storm's topology.classpath is not updated with the Metron service
>>    install and needs to be updated separately.
>>    - Several configuration parameters used when installing the Metron
>>    service could (and should) be grabbed from Ambari. Currently, the
>>    install requires them to be entered manually.
>>    - Upgrading Metron needs to be handled.
>>
>>
>> Thanks,
>> Justin
>>
>> On Fri, Sep 16, 2016 at 11:32 AM, Justin Leet <justinjl...@gmail.com>
>> wrote:
>>
>>> I went ahead and created a Jira ticket mirroring Casey's discussion of
>>> the MVP.  Feel free to add anything of interest there, too.
>>>
>>> https://issues.apache.org/jira/browse/METRON-427
>>>
>>>
>>> Justin
>>>
>>>
>>> On Fri, Sep 16, 2016 at 9:39 AM, Justin Leet <justinjl...@gmail.com>
>>> wrote:
>>>
>>>
>>>> ---------- Forwarded message ----------
>>>> From: zeo...@gmail.com <zeo...@gmail.com>
>>>> Date: Thu, Sep 15, 2016 at 9:02 PM
>>>> Subject: Re: [DISCUSS] Ambari Integration
>>>> To: u...@metron.incubator.apache.org
>>>> Cc: dev@metron.incubator.apache.org
>>>>
>>>>
>>>> Of course I would still need a full list of the repos, and to submit proxy
>>>> rules for the Ambari box, but I'm happy to hear it will alleviate the need
>>>> for making the scripts use proxies on the cluster nodes.
>>>>
>>>> Jon
>>>>
>>>> On Thu, Sep 15, 2016, 19:34 Nick Allen <n...@nickallen.org> wrote:
>>>>
>>>> > Jon - Installing Metron on an isolated network becomes much easier with
>>>> > Ambari.  You would just mirror the required RPM repositories.  You can then
>>>> > point Ambari to where your repo lives via the installation wizard.  I've
>>>> > done quite a few installs via Ambari on an isolated network and it worked
>>>> > quite well.
>>>> >
>>>> > On Thu, Sep 15, 2016 at 6:50 PM, zeo...@gmail.com <zeo...@gmail.com>
>>>> > wrote:
>>>> >
>>>> >> First of all, very much looking forward to this approach.  I'm not very
>>>> >> familiar with management packs, but I did read some of the documentation
>>>> >> in the link you sent.
>>>> >>
>>>> >> Not sure if this is already included in a "minimum viable product," but
>>>> >> at some point I think there needs to be a method of specifying proxies
>>>> >> and/or internal package repos.  I recently did a Metron 0.2.0 install
>>>> >> behind a proxy (hence METRON-409
>>>> >> <https://issues.apache.org/jira/browse/METRON-409>) and it took me a
>>>> >> semi-lengthy amount of time to (1) find all of the destinations I needed
>>>> >> to request openings for in the proxy, and (2) modify the Ambari scripts
>>>> >> to use my proxies correctly.
>>>> >>
>>>> >> I also have a bit of a concern with upgrades and customizations in
>>>> >> general (not just how they would work with mpacks).  I have not done any
>>>> >> upgrades to date, but I have rebuilt and redeployed a couple of times,
>>>> >> and I needed to modify some of the Metron code itself before
>>>> >> build/deploy (because I was concerned my changes would get overwritten
>>>> >> on upgrade if I just made them directly on the cluster).  I would like
>>>> >> to see a method of putting in install-specific files that modify or
>>>> >> overwrite parts of the core Metron stack, like changes to znodes,
>>>> >> parsers, etc.
>>>> >>
>>>> >> Regarding not managing sensors with Ambari, I agree.  I run a large Bro
>>>> >> cluster and it is maintained via Puppet and various other mechanisms;
>>>> >> no need for Ambari to bleed over in my case.
>>>> >>
>>>> >> Thanks for the great work.
>>>> >>
>>>> >> Jon
>>>> >>
>>>> >> On Thu, Sep 15, 2016 at 5:10 PM Casey Stella <ceste...@gmail.com>
>>>> >> wrote:
>>>> >>
>>>> >>> Hi Everyone,
>>>> >>>
>>>> >>> I wanted to solicit some discussion around a feature that is fast
>>>> >>> approaching.  A major pain point in using Metron is installation.  Thus
>>>> >>> far our only approach to installation has been driven by developers'
>>>> >>> need to construct a virtual environment to test out changes, which led
>>>> >>> us to either an Ansible installation or a manual installation.
>>>> >>>
>>>> >>> Because we want to make sure that the installation of Metron is as easy
>>>> >>> as possible, we have had some great contributions of an additional
>>>> >>> approach: installation via Apache Ambari directly.  Our Ansible scripts
>>>> >>> currently rely on Ambari blueprints to set up Hadoop on the cluster they
>>>> >>> are deploying on, so it is not a new dependency, but we're working toward
>>>> >>> a full Ambari management pack
>>>> >>> <https://cwiki.apache.org/confluence/display/AMBARI/Management+Packs>
>>>> >>> that will lay down the relevant topologies (parser, enrichment,
>>>> >>> indexing), configs, bits, and their infrastructural dependencies (ES and
>>>> >>> MySQL), and allow the topologies to be started and stopped, as a minimum
>>>> >>> viable product.
>>>> >>>
>>>> >>> The beginnings of this have started with:
>>>> >>>
>>>> >>>    - Ambari Service Definitions for the Parser topologies
>>>> >>>    <https://github.com/apache/incubator-metron/pull/218>
>>>> >>>    - Ambari Service Definition for the Indexing Topology
>>>> >>>    <https://github.com/apache/incubator-metron/pull/222>
>>>> >>>    - Ambari Service Definition for Elasticsearch
>>>> >>>    <https://github.com/apache/incubator-metron/pull/223>
>>>> >>>
>>>> >>> There will be more to come in the near term to realize that vision, but
>>>> >>> we wanted to get some reactions.  Past the minimum viable product, what
>>>> >>> do you guys think we should have, and how should it look?
>>>> >>>
>>>> >>> Currently we are treating the domain of the Ambari installation as
>>>> >>> spanning from Kafka to the indexes, which leaves the sensors unmanaged
>>>> >>> by Ambari.  Is that a good decision?
>>>> >>>
>>>> >>> Are there other pain points that you have had around installation that
>>>> >>> you'd like to see addressed?
>>>> >>>
>>>> >>> The purpose of this discussion thread is to let you guys know that we
>>>> >>> will soon have a new way to install Metron, but also to understand what
>>>> >>> the future requirements are so we, as a community, can address them.
>>>> >>>
>>>> >>> Best,
>>>> >>>
>>>> >>> Casey
>>>> >>>
>>>> >> --
>>>> >>
>>>> >> Jon
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Nick Allen <n...@nickallen.org>
>>>> >
>>>> --
>>>>
>>>> Jon
>>>>
>>>>
>>>
>>
>
