> Glad you agree with me that this isn’t HBase scale… it’s clearly not. I would never suggest introducing HBase for something like this, but since it’s there.
Ah, gotcha. Misunderstood your statement.

On Fri, Feb 2, 2018 at 9:01 AM Simon Elliston Ball <si...@simonellistonball.com> wrote:

> Glad you agree with me that this isn’t HBase scale… it’s clearly not. I
> would never suggest introducing HBase for something like this, but since
> it’s there.
>
> On the idea of using the Ambari RDBMS for the same basis of it being
> there, I see your point. That said, it can be postgres, sql server, mysql,
> maria, oracle… various. Yes we have an ORM, but those are not nearly as
> magic as they claim, and upgrade / schema evolution of an RDBMS often
> involves some sort of platform-dependent SQL migration in my experience. I
> would suggest that supporting that range of options is not a good idea for
> us. The Ambari project also pretty much reserve the right to blow away that
> infrastructure in upgrades (which is fair enough). So relying on there
> being an RDBMS owned by another component is not something I would
> necessarily say was a clean choice.
>
> Simon
>
> > On 2 Feb 2018, at 13:50, Nick Allen <n...@nickallen.org> wrote:
> >
> > I fall marginally on the side of an RDBMS. There is definitely a case to
> > be made on both sides, but I'll point out a few things for the RDBMS.
> >
> > (1) Flexibility. Using an RDBMS is going to provide us with much greater
> > flexibility going forward. We really don't know what the specific use
> > cases will be, but I am willing to bet they are user-focused (preferences,
> > etc). The type of use cases that most web applications use an RDBMS for.
> >
> > >> If anything I would like to see the current RDBMS dependency come out...
> >
> > (2) Don't we already have an RDBMS requirement for Ambari? That's a
> > dependency that we do not control.
> >
> > >> ... hbase seems a good option (because we already have it there, it would
> > >> be kinda crazy at this scale if we didn’t already have it)
> >
> > (3) In this scenario, the RDBMS would not scale proportionally with the
> > amount of telemetry; it would scale based on usage, primarily the number of
> > users. This is not "big data" scale. I don't think we can make the case
> > for HBase based on scale here.
> >
> > >> We would also end up with, as Mike points out, a whole new set of disk
> > >> deployment patterns and a bunch of additional DBA ops process requirements
> > >> for every install.
> >
> > (4) Most users that need HA/DR (and other 'advanced stuff') are
> > enterprises and organizations that are already very familiar with RDBMS
> > solutions and have the infrastructure in place to manage those. For users
> > that don't need HA/DR, just use the DB that gets spun up with Ambari.
> >
> > On Fri, Feb 2, 2018 at 7:17 AM Simon Elliston Ball <si...@simonellistonball.com> wrote:
> >
> >> Introducing a RDBMS to the stack seems unnecessary for this.
> >>
> >> If we consider the data access patterns for user profiles, we are unlikely
> >> to query into them, or indeed do anything other than look them up, or write
> >> them out by a username key. To that end, using an ORM to translate a
> >> nested config object into a load of tables seems to introduce complexity
> >> and brittleness we then have to take away through relying on relational
> >> consistency models. We would also end up with, as Mike points out, a whole
> >> new set of disk deployment patterns and a bunch of additional DBA ops process
> >> requirements for every install.
> >>
> >> Since the access pattern is almost entirely key => value, hbase seems a
> >> good option (because we already have it there, it would be kinda crazy at
> >> this scale if we didn’t already have it) or arguably zookeeper, but that
> >> might be at the other end of the scale argument. I’d even go as far as to
> >> suggest files on HDFS to keep it simple.
> >>
> >> Simon
> >>
> >>> On 1 Feb 2018, at 23:24, Michael Miklavcic <michael.miklav...@gmail.com> wrote:
> >>>
> >>> Personally, I'd be in favor of something like MariaDB as an open source
> >>> repo. Or any other ANSI SQL store. On the positive side, it should mesh
> >>> seamlessly with ORM tools. And the schema for this should be pretty
> >>> vanilla, I'd imagine. I might even consider skipping ORM for straight JDBC
> >>> and simple command scripts in Java for something this small. I'm not
> >>> worried so much about migrations of this sort. Large scale DBs can get
> >>> involved with major schema changes, but that's usually when the datastore is
> >>> a massive set of tables with complex relationships, at least in my
> >>> experience.
> >>>
> >>> We could also use hbase, which probably wouldn't be that hard either, but
> >>> there may be more boilerplate to write for the client as compared to
> >>> standard SQL. But I'm assuming we could reuse a fair amount of existing
> >>> code from our enrichments. One additional reason in favor of hbase might be
> >>> data replication. For a SQL instance we'd probably recommend a RAID store
> >>> or backup procedure, but we get that pretty easy with hbase too.
> >>>
> >>> On Feb 1, 2018 2:45 PM, "Casey Stella" <ceste...@gmail.com> wrote:
> >>>
> >>>> So, I'll answer your question with some questions:
> >>>>
> >>>> - No matter the data store we use, upgrading will take some care, right?
> >>>> - Do we currently depend on a RDBMS anywhere? I want to say that we do
> >>>>   in the REST layer already, right?
> >>>> - If we don't use a RDBMS, what's the other option? What are the pros
> >>>>   and cons?
> >>>> - Have we considered non-server offline persistent solutions (e.g.
> >>>>   https://www.html5rocks.com/en/features/storage)?
> >>>>
> >>>> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <merrim...@gmail.com> wrote:
> >>>>
> >>>>> There is currently a PR up for review that allows a user to configure and
> >>>>> save the list of facet fields that appear in the left column of the Alerts
> >>>>> UI: https://github.com/apache/metron/pull/853. The REST layer has ORM
> >>>>> support, which means we can store those in a relational database.
> >>>>>
> >>>>> However, I'm not 100% sure this is the best place to keep this. As we add
> >>>>> more use cases like this, the backing tables in the RDBMS will need to be
> >>>>> managed. This could make upgrading more tedious and error-prone. Is there
> >>>>> a better way to store this, assuming we can leverage a component that's
> >>>>> already included in our stack?
> >>>>>
> >>>>> Ryan
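Simon's point that user settings are only ever looked up or written out whole by a username key can be made concrete with a small Java sketch. Everything in it is hypothetical: UserSettingsStore, InMemoryUserSettingsStore and FacetFields.toJson are illustrative names, not Metron APIs, and the in-memory map merely stands in for whichever backend is eventually chosen (an HBase row, a ZooKeeper znode, a file on HDFS, or a single RDBMS table keyed by username). The value is assumed to be the facet-field list from Ryan's PR, serialized as a JSON array string.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction (not a Metron API): settings are only ever read or
// written whole, keyed by username, i.e. the key => value access pattern.
interface UserSettingsStore {
    // Returns the serialized settings for a user, or empty if none were saved.
    Optional<String> get(String username);

    // Overwrites whatever was previously stored for this user.
    void put(String username, String serializedSettings);
}

// In-memory stand-in; a real deployment would back this with an HBase row,
// a ZooKeeper znode, a file on HDFS, or a single keyed RDBMS table.
final class InMemoryUserSettingsStore implements UserSettingsStore {
    private final Map<String, String> settingsByUser = new ConcurrentHashMap<>();

    @Override
    public Optional<String> get(String username) {
        return Optional.ofNullable(settingsByUser.get(username));
    }

    @Override
    public void put(String username, String serializedSettings) {
        settingsByUser.put(username, serializedSettings);
    }
}

// Hypothetical helper: serializes a facet-field list as a JSON array string
// without a JSON library. Only backslashes and double quotes are escaped, which
// assumes field names contain no control characters.
final class FacetFields {
    private FacetFields() {}

    static String toJson(List<String> fields) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            String escaped = fields.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            json.append('"').append(escaped).append('"');
        }
        return json.append(']').toString();
    }
}
```

For example, store.put("alice", FacetFields.toJson(List.of("ip_src_addr", "ip_dst_addr"))) followed by store.get("alice") returns the same JSON string (the user and field names here are example values). Because callers only ever see get and put by username, the trade-offs debated in the thread (schema migration, DBA ops burden, replication) would live entirely inside the backing implementation rather than in the callers.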