+1 I think going with HBase is a good approach for now. Thanks for laying out the pros and cons.
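For what it's worth, here is a rough sketch of what "moving the schema evolution complexity to the application layer" could look like with a key/value store keyed by username. It is only an illustration under assumptions: a plain Python dict stands in for the HBase table, and every name in it (`save_settings`, `load_settings`, the `version` field, the v1-to-v2 migration, the example field values) is made up for the example and not taken from the PR.

```python
import json

# Assumption: a plain dict stands in for an HBase table keyed by username.
# A real implementation would read and write rows through an HBase client.
table = {}

CURRENT_VERSION = 2


def _v1_to_v2(settings):
    # Hypothetical schema change: v1 kept facet fields as one comma-separated
    # string, v2 keeps them as a list.
    upgraded = dict(settings)
    raw_fields = settings.get("facetFields", "")
    upgraded["facetFields"] = [f for f in raw_fields.split(",") if f]
    upgraded["version"] = 2
    return upgraded


# Maps a stored version to the function that upgrades it by one step.
MIGRATIONS = {1: _v1_to_v2}


def save_settings(username, settings):
    # Always write the current schema version alongside the settings.
    record = dict(settings, version=CURRENT_VERSION)
    table[username] = json.dumps(record).encode("utf-8")


def load_settings(username):
    # Look up by key, then upgrade old records on read so callers only
    # ever see the current schema. Returns None for an unknown user.
    raw = table.get(username)
    if raw is None:
        return None
    settings = json.loads(raw.decode("utf-8"))
    while settings.get("version", 1) < CURRENT_VERSION:
        settings = MIGRATIONS[settings.get("version", 1)](settings)
    return settings
```

The point of the sketch is that an upgrade needs no table alteration: old values are upgraded lazily the first time they are read, which is the kind of migration an RDBMS schema change would otherwise push onto whoever runs the upgrade.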
On Fri, Feb 9, 2018 at 3:46 PM, Ryan Merriman <merrim...@gmail.com> wrote:
> I would like to bring this discussion to a conclusion and update the PR
> accordingly. To clarify on whether we depend on an RDBMS right now, we do,
> but only for authentication, which will probably be replaced at some point.
> So the answer is not really. I personally agree with Simon and think we
> should use HBase because this use case fits the data model and it's already
> in our stack. I would add that with HBase we can move the schema evolution
> complexity to the application layer and hide it from the user. This will
> make upgrades easier, which is my main point of contention. I also agree
> with Nick in that I do think there may be a place for an RDBMS in the future
> but we can always add it back.
>
> The 2 choices seem to be either an RDBMS or HBase. Here is a summary
> based on comments in this discussion:
>
> RDBMS
> - some are not too worried about schema evolution as the data model will
>   likely be simple
> - avoiding having to alter tables when upgrading would be ideal
> - works with ORM tools
> - is flexible and could be useful for future use cases
>
> HBase
> - might involve boilerplate code if not covered elsewhere in Metron
> - key/value is good enough for user profile settings
> - data replication for free
>
> Reading over this thread again I get the impression there is a slight
> preference for HBase. Want to give people one more chance to chime in or
> argue the other solution. Let me know if I missed anything or didn't
> include someone's argument.
>
>
> On Fri, Feb 2, 2018 at 8:24 AM, Nick Allen <n...@nickallen.org> wrote:
>
> > Glad you agree with me that this isn’t HBase scale… it’s clearly not. I
> > would never suggest introducing HBase for something like this, but since
> > it’s there.
> >
> > Ah, gotcha. Misunderstood your statement.
> >
> >
> > On Fri, Feb 2, 2018 at 9:01 AM Simon Elliston Ball <si...@simonellistonball.com> wrote:
> >
> > > Glad you agree with me that this isn’t HBase scale… it’s clearly not. I
> > > would never suggest introducing HBase for something like this, but since
> > > it’s there.
> > >
> > > On the idea of using the Ambari RDBMS on the same basis of it being
> > > there, I see your point. That said, it can be postgres, sql server, mysql,
> > > maria, oracle… various. Yes we have an ORM, but those are not nearly as
> > > magic as they claim, and upgrade / schema evolution of an RDBMS often
> > > involves some sort of platform-dependent SQL migration in my experience. I
> > > would suggest that supporting that range of options is not a good idea for
> > > us. The Ambari project also pretty much reserves the right to blow away that
> > > infrastructure in upgrades (which is fair enough). So relying on there
> > > being an RDBMS owned by another component is not something I would
> > > necessarily say was a clean choice.
> > >
> > > Simon
> > >
> > > > On 2 Feb 2018, at 13:50, Nick Allen <n...@nickallen.org> wrote:
> > > >
> > > > I fall marginally on the side of an RDBMS. There is definitely a case to
> > > > be made on both sides, but I'll point out a few things for the RDBMS.
> > > >
> > > >
> > > > (1) Flexibility. Using an RDBMS is going to provide us with much greater
> > > > flexibility going forward. We really don't know what the specific use
> > > > cases will be, but I am willing to bet they are user-focused (preferences,
> > > > etc). The type of use cases that most web applications use an RDBMS for.
> > > >
> > > >
> > > >> If anything I would like to see the current RDBMS dependency come out...
> > > >
> > > > (2) Don't we already have an RDBMS requirement for Ambari? That's a
> > > > dependency that we do not control.
> > > >
> > > >
> > > >> ... hbase seems a good option (because we already have it there, it would
> > > >> be kinda crazy at this scale if we didn’t already have it)
> > > >
> > > > (3) In this scenario, the RDBMS would not scale proportionally with the
> > > > amount of telemetry, it would scale based on usage; primarily the number of
> > > > users. This is not "big data" scale. I don't think we can make the case
> > > > for HBase based on scale here.
> > > >
> > > >
> > > >> We would also end up with, as Mike points out, whole new disk
> > > >> deployment patterns and a bunch of additional DBA ops process requirements
> > > >> for every install.
> > > >
> > > > (4) Most users that need HA/DR (and other 'advanced stuff') are
> > > > enterprises and organizations that are already very familiar with RDBMS
> > > > solutions and have the infrastructure in place to manage those. For users
> > > > that don't need HA/DR, just use the DB that gets spun up with Ambari.
> > > >
> > > >
> > > > On Fri, Feb 2, 2018 at 7:17 AM Simon Elliston Ball <si...@simonellistonball.com> wrote:
> > > >
> > > >> Introducing an RDBMS to the stack seems unnecessary for this.
> > > >>
> > > >> If we consider the data access patterns for user profiles, we are unlikely
> > > >> to query into them, or indeed do anything other than look them up, or write
> > > >> them out by a username key. To that end, using an ORM to translate a
> > > >> nested config object into a load of tables seems to introduce complexity
> > > >> and brittleness we then have to take away through relying on relational
> > > >> consistency models. We would also end up with, as Mike points out, whole
> > > >> new disk deployment patterns and a bunch of additional DBA ops process
> > > >> requirements for every install.
> > > >>
> > > >> Since the access pattern is almost entirely key => value, hbase seems a
> > > >> good option (because we already have it there, it would be kinda crazy at
> > > >> this scale if we didn’t already have it) or arguably zookeeper, but that
> > > >> might be at the other end of the scale argument. I’d even go as far as to
> > > >> suggest files on HDFS to keep it simple.
> > > >>
> > > >> Simon
> > > >>
> > > >>> On 1 Feb 2018, at 23:24, Michael Miklavcic <michael.miklav...@gmail.com> wrote:
> > > >>>
> > > >>> Personally, I'd be in favor of something like MariaDB as an open source
> > > >>> repo. Or any other ANSI SQL store. On the positive side, it should mesh
> > > >>> seamlessly with ORM tools. And the schema for this should be pretty
> > > >>> vanilla, I'd imagine. I might even consider skipping ORM for straight JDBC
> > > >>> and simple command scripts in Java for something this small. I'm not
> > > >>> worried so much about migrations of this sort. Large scale DBs can get
> > > >>> involved with major schema changes, but that's usually when the datastore is
> > > >>> a massive set of tables with complex relationships, at least in my
> > > >>> experience.
> > > >>>
> > > >>> We could also use hbase, which probably wouldn't be that hard either, but
> > > >>> there may be more boilerplate to write for the client as compared to
> > > >>> standard SQL. But I'm assuming we could reuse a fair amount of existing
> > > >>> code from our enrichments. One additional reason in favor of hbase might be
> > > >>> data replication. For a SQL instance we'd probably recommend a RAID store
> > > >>> or backup procedure, but we get that pretty easily with hbase too.
> > > >>>
> > > >>> On Feb 1, 2018 2:45 PM, "Casey Stella" <ceste...@gmail.com> wrote:
> > > >>>
> > > >>>> So, I'll answer your question with some questions:
> > > >>>>
> > > >>>> - No matter the data store we use, upgrading will take some care,
> > > >>>>   right?
> > > >>>> - Do we currently depend on an RDBMS anywhere? I want to say that we do
> > > >>>>   in the REST layer already, right?
> > > >>>> - If we don't use an RDBMS, what's the other option? What are the pros
> > > >>>>   and cons?
> > > >>>> - Have we considered non-server offline persistent solutions (e.g.
> > > >>>>   https://www.html5rocks.com/en/features/storage)?
> > > >>>>
> > > >>>>
> > > >>>> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <merrim...@gmail.com> wrote:
> > > >>>>
> > > >>>>> There is currently a PR up for review that allows a user to configure and
> > > >>>>> save the list of facet fields that appear in the left column of the Alerts
> > > >>>>> UI: https://github.com/apache/metron/pull/853. The REST layer has ORM
> > > >>>>> support, which means we can store those in a relational database.
> > > >>>>>
> > > >>>>> However, I'm not 100% sure this is the best place to keep this. As we add
> > > >>>>> more use cases like this, the backing tables in the RDBMS will need to be
> > > >>>>> managed. This could make upgrading more tedious and error-prone. Is there
> > > >>>>> a better way to store this, assuming we can leverage a component that's
> > > >>>>> already included in our stack?
> > > >>>>>
> > > >>>>> Ryan