I posed this question to the community because I had started to recognize some of the shortcomings of doing this solely through Ambari, as you and Nick have pointed out. I think exposing the management UI as an Ambari view is a great idea, and I'd love to see us provide a more robust mechanism for loading these enrichments via the management UI. As you said, perhaps Ambari could manage the ZK config around active enrichments/locations (the "USE" part of it) while the management UI handles actually loading and managing the enrichments themselves?
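To make that split concrete, here is a minimal sketch of the versioned-pointer idea from the original proposal: data is loaded under a timestamped key (e.g. top-domains_20170221042000) and a single "active" property decides which version is used. Everything here is an assumption for illustration. The EnrichmentStore class and its method names are hypothetical, and plain dictionaries stand in for HBase and ZooKeeper; this is not Metron, Ambari, HBase, or ZooKeeper API.

```python
from datetime import datetime


class EnrichmentStore:
    """Hypothetical in-memory stand-in: dicts play the roles of HBase and ZooKeeper."""

    def __init__(self):
        self.tables = {}   # versioned key -> {indicator: value}; stands in for HBase
        self.active = {}   # enrichment name -> versioned key; stands in for the ZK property
        self.history = {}  # enrichment name -> previously active keys, oldest first

    def load(self, name, rows, when):
        """Write a full copy under a new timestamped key; the active version is untouched."""
        key = f"{name}_{when:%Y%m%d%H%M%S}"
        self.tables[key] = dict(rows)
        return key

    def activate(self, name, key):
        """Flip the pointer in one step, so readers see the old data or the new, never a mix."""
        if key not in self.tables:
            raise KeyError(f"no loaded data for {key}")
        previous = self.active.get(name)
        if previous is not None:
            self.history.setdefault(name, []).append(previous)
        self.active[name] = key

    def rollback(self, name):
        """Point back at the most recently active earlier version."""
        previous = self.history[name].pop()
        self.active[name] = previous
        return previous

    def lookup(self, name, indicator):
        """Resolve the pointer first, then read from the version it names."""
        return self.tables[self.active[name]].get(indicator)

    def cleanup(self, name):
        """Drop every version of `name` except the active one; returns the dropped keys."""
        keep = self.active.get(name)
        stale = [k for k in self.tables if k.startswith(name + "_") and k != keep]
        for k in stale:
            del self.tables[k]
        self.history[name] = []  # earlier versions are gone, so rollback is no longer possible
        return stale


store = EnrichmentStore()
first = store.load("top-domains", {"example.com": 1}, datetime(2017, 2, 21, 4, 20, 0))
store.activate("top-domains", first)
second = store.load("top-domains", {"example.com": 2}, datetime(2017, 2, 22, 4, 20, 0))
store.activate("top-domains", second)  # update
store.rollback("top-domains")          # back to the first load
```

In this sketch, load() and cleanup() are the data tasks that would sit with the management UI, while activate() and rollback() only touch the pointer, which is the part that could be routed through Ambari.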

On Fri, Feb 24, 2017 at 8:12 AM, Casey Stella <[email protected]> wrote:

> Late to chime in here, but I feel that we have discussed Ambari's role
> before and I think we should probably clarify, as a community a few things
> with regards Ambari vs a management UI built around the REST PR currently
> under review. (I promise, I will get to the topic at hand eventually ;) :
>
>   - Where functionality should live
>   - Who is responsible for what
>
> I will now make a couple (possibly controversial) statements (some of
> which) we have actually discussed prior to this on the dev list:
>
>   - I view Ambari as managing the install and the static configuration for
>     Metron. For us, this would include zookeeper configs as well as
>     topology configuration. This would be the persistent store of truth.
>   - I view Zookeeper to be our runtime configuration store for the
>     topologies.
>   - I view a management UI (and the Stellar Shell) as managing
>     functionality for interacting with the system. Where it changes
>     configuration, it must go through Ambari.
>   - I believe the management UI should be exposed as an ambari view
>
> As such, I see the importation and management of enrichments, which is a
> data task, to be squarely in the purview of the management UI, whose job is
> the care and feeding of the data. That being said, any configuration
> changes to USE the enrichment should at least be routed through ambari, but
> should be managed in the UI.
>
> Now the question becomes, should we have enrichment collateral (I'm
> including both hbase as well as geo or anything else we have) loaded at
> install-time. I would argue that we should not. Rather, we should design
> the management UI so that the enrichments can be added easily, with a
> wizard to enable the use of the enrichment via stellar for a sensor
>
> On that topic, I think we are doing too much as part of our install. I
> would argue that we shouldn't pre-load even the geo data or depend on it
> for the default parsers.
>
> Casey
>
> On Tue, Feb 21, 2017 at 6:31 PM, Michael Miklavcic <
> [email protected]> wrote:
>
> > With the work committed in
> > https://github.com/apache/incubator-metron/pull/445 and
> > https://github.com/apache/incubator-metron/pull/432, we now have a robust
> > and flexible means to import enrichment sources and transform their
> > contents as they are inserted into HBase. One of the main motivators for
> > this new functionality was to add the ability to load top domain rankings
> > from sources such as Alexa. The proposal is to make this type of
> > enrichment a top-level feature in Metron by introducing it to the Ambari
> > management UI as a configurable set of properties in the MPack install.
> > This comes with some options and challenges in how we want to manage the
> > configurations, which I will outline below.
> >
> > *Use cases:*
> >
> >   - Single load of top domains file
> >   - Re-loading top domains file - need to be able to cleanup properly
> >   - Cleaning up/deleting old enrichment data (this is a general feature
> >     that we currently lack - I think it is worth a separate Jira/PR for
> >     creating a MapReduce job that enables cleanup to occur).
> >   - Modifying default top domains file source - there are other options
> >     besides Alexa. And users may want to load a file from local URI since
> >     many data centers do not have direct access to the internet.
> >   - Ability to modify the default extractor config JSON and tune the
> >     Stellar transformations for both the value and indicator transforms.
> >     Allows more flexible handling of data based on other sources.
> >   - Loading multiple top domains source enrichments. (Maybe a separate PR
> >     for this if we even think it would be useful)
> >   - Updating the top domain enrichment - This needs to be an atomic
> >     operation in order to prevent incorrect data.
> >   - Rolling back to an older version of the top domains enrichment. Also
> >     needs to be atomic.
> >   - Ability to schedule an enrichment load on schedule - we would like to
> >     defer this to an external scheduling mechanism, e.g. cron or Control M.
> >     The enrichment loading system should have the necessary features to
> >     enable this type of automation without data integrity issues.
> >
> > *Considerations:*
> >
> >   - As mentioned above, we want to add this feature to the Ambari MPack.
> >     This requires at least 2 parameters to work. We need the ability to
> >     specify a URI as well as an extractor config.
> >   - How do we want to manage the extractor config? The most obvious
> >     solution is to provide a text field in Ambari with a default JSON
> >     config. When a load is initiated, Ambari would place a fresh copy of
> >     the extractor config in the /tmp/ directory. This is an ephemeral file
> >     that isn't needed other than during a load.
> >   - It seems easy enough to have the load occur during the initial
> >     install, however subsequent loads would require a different workflow.
> >     How do folks feel about adding a set of dropdown options in the Ambari
> >     UI for loading, updating, and deleting the top domains enrichment? I
> >     believe we are doing something similar for the ElasticSearch templates
> >     currently.
> >   - In the case of atomic operations for updates and rollbacks, I propose
> >     we add a property to Zookeeper that is reference-able in the
> >     enrichment itself. The idea would be to create a "top-domains"
> >     property in ZK that points to an enrichment key with a load timestamp
> >     associated with it, e.g. top-domains_20170221042000. This would also
> >     allow a mapreduce job to be written that cleans up old enrichments.
> >     Another option is to create a new table in HBase if/when you update
> >     the enrichment and change the enrichment config manually. Deleting an
> >     old enrichment would simply be a matter of dropping the table in
> >     HBase. A relevant discussion of the tradeoffs of having many small
> >     tables versus 1 large table can be found here -
> >     http://grokbase.com/t/hbase/user/11bjbdw94q/multiple-tables-vs-big-fat-table
> >   - In order to update or rollback an enrichment as mentioned above, we
> >     would also ideally provide a mechanism for changing the rowkey pointed
> >     to by the enrichment.
> >
> > In summary of the use cases and considerations above, this boils down to
> > how we'd like to leverage Ambari here. Do we want Ambari to handle only
> > the initial install/load and have end users be responsible on an ongoing
> > basis for updates (users would be responsible for copying or distributing
> > the extractor_config.json for instance), or do we want to enable Ambari
> > to manage the configuration ongoing and enable functionality for
> > reloading, updating, and rollback?
> >
> > Best,
> > Mike
