Glad you agree with me that this isn’t HBase scale… it’s clearly not. I would never suggest introducing HBase for something like this, but since it’s there.
On the idea of using the Ambari RDBMS on the same basis of it being there, I see your point. That said, it can be Postgres, SQL Server, MySQL, MariaDB, Oracle… various. Yes, we have an ORM, but those are not nearly as magic as they claim, and upgrade / schema evolution of an RDBMS often involves some sort of platform-dependent SQL migration in my experience. I would suggest that supporting that range of options is not a good idea for us. The Ambari project also pretty much reserves the right to blow away that infrastructure in upgrades (which is fair enough). So relying on there being an RDBMS owned by another component is not something I would necessarily say was a clean choice.

Simon

> On 2 Feb 2018, at 13:50, Nick Allen <n...@nickallen.org> wrote:
>
> I fall marginally on the side of an RDBMS. There is definitely a case to
> be made on both sides, but I'll point out a few things for the RDBMS.
>
> (1) Flexibility. Using an RDBMS is going to provide us with much greater
> flexibility going forward. We really don't know what the specific use
> cases will be, but I am willing to bet they are user-focused (preferences,
> etc). The type of use cases that most web applications use an RDBMS for.
>
>> If anything I would like to see the current RDBMS dependency come out...
>
> (2) Don't we already have an RDBMS requirement for Ambari? That's a
> dependency that we do not control.
>
>> ... hbase seems a good option (because we already have it there, it would
>> be kinda crazy at this scale if we didn’t already have it)
>
> (3) In this scenario, the RDBMS would not scale proportionally with the
> amount of telemetry, it would scale based on usage; primarily the number of
> users. This is not "big data" scale. I don't think we can make the case
> for HBase based on scale here.
>
>> We would also end up with, as Mike points out, a whole new disk
>> deployment patterns and a bunch of additional DBA ops process requirements
>> for every install.
>
> (4) Most users that need HA/DR (and other 'advanced stuff'), are
> enterprises and organizations that are already very familiar with RDBMS
> solutions and have the infrastructure in place to manage those. For users
> that don't need HA/DR, just use the DB that gets spun-up with Ambari.
>
> On Fri, Feb 2, 2018 at 7:17 AM Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
>
>> Introducing a RDBMS to the stack seems unnecessary for this.
>>
>> If we consider the data access patterns for user profiles, we are unlikely
>> to query into them, or indeed do anything other than look them up, or write
>> them out by a username key. To that end, using an ORM to translate a
>> nested config object into a load of tables seems to introduce complexity
>> and brittleness we then have to take away through relying on relational
>> consistency models. We would also end up with, as Mike points out, a whole
>> new disk deployment patterns and a bunch of additional DBA ops process
>> requirements for every install.
>>
>> Since the access pattern is almost entirely key => value, hbase seems a
>> good option (because we already have it there, it would be kinda crazy at
>> this scale if we didn’t already have it) or arguably zookeeper, but that
>> might be at the other end of the scale argument. I’d even go as far as to
>> suggest files on HDFS to keep it simple.
>>
>> Simon
>>
>>> On 1 Feb 2018, at 23:24, Michael Miklavcic <michael.miklav...@gmail.com> wrote:
>>>
>>> Personally, I'd be in favor of something like Maria DB as an open source
>>> repo. Or any other ansi sql store. On the positive side, it should mesh
>>> seamlessly with ORM tools. And the schema for this should be pretty
>>> vanilla, I'd imagine. I might even consider skipping ORM for straight JDBC
>>> and simple command scripts in Java for something this small. I'm not
>>> worried so much about migrations of this sort.
>>> Large scale DBs can get involved with major schema changes, but that's
>>> usually when the datastore is a massive set of tables with complex
>>> relationships, at least in my experience.
>>>
>>> We could also use hbase, which probably wouldn't be that hard either, but
>>> there may be more boilerplate to write for the client as compared to
>>> standard SQL. But I'm assuming we could reuse a fair amount of existing
>>> code from our enrichments. One additional reason in favor of hbase might be
>>> data replication. For a SQL instance we'd probably recommend a RAID store
>>> or backup procedure, but we get that pretty easy with hbase too.
>>>
>>> On Feb 1, 2018 2:45 PM, "Casey Stella" <ceste...@gmail.com> wrote:
>>>
>>>> So, I'll answer your question with some questions:
>>>>
>>>>   - No matter the data store we use upgrading will take some care, right?
>>>>   - Do we currently depend on a RDBMS anywhere? I want to say that we do
>>>>     in the REST layer already, right?
>>>>   - If we don't use a RDBMS, what's the other option? What are the pros
>>>>     and cons?
>>>>   - Have we considered non-server offline persistent solutions (e.g.
>>>>     https://www.html5rocks.com/en/features/storage)?
>>>>
>>>> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <merrim...@gmail.com> wrote:
>>>>
>>>>> There is currently a PR up for review that allows a user to configure and
>>>>> save the list of facet fields that appear in the left column of the Alerts
>>>>> UI: https://github.com/apache/metron/pull/853. The REST layer has ORM
>>>>> support which means we can store those in a relational database.
>>>>>
>>>>> However I'm not 100% sure this is the best place to keep this. As we add
>>>>> more use cases like this the backing tables in the RDBMS will need to be
>>>>> managed. This could make upgrading more tedious and error-prone.
>>>>> Is there a better way to store this, assuming we can leverage a component
>>>>> that's already included in our stack?
>>>>>
>>>>> Ryan
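
For anyone skimming the thread, the key => value access pattern Simon describes (look a profile up, or write it out, by a username key) can be sketched roughly as below. This is only an illustration under stated assumptions, not code from Metron or the PR: the `ProfileStore` class, the in-memory dict standing in for the backend (HBase, ZooKeeper, or files on HDFS), and the example profile fields are all hypothetical.

```python
import json
from typing import Dict, Optional


class ProfileStore:
    """Hypothetical key => value store for user profiles, keyed by username."""

    def __init__(self) -> None:
        # A plain dict stands in for the real backend (HBase, ZooKeeper,
        # files on HDFS); this is an assumption made for illustration only.
        self._rows: Dict[str, bytes] = {}

    def save(self, username: str, profile: dict) -> None:
        # Write the whole nested config object as one serialized value,
        # rather than mapping it onto several relational tables.
        self._rows[username] = json.dumps(profile, sort_keys=True).encode("utf-8")

    def load(self, username: str) -> Optional[dict]:
        # Look up by key only; nothing ever queries inside the profile.
        raw = self._rows.get(username)
        return None if raw is None else json.loads(raw.decode("utf-8"))


# Hypothetical usage: field names are made up for the example.
store = ProfileStore()
store.save("alice", {"alerts": {"facetFields": ["source:type", "ip_src_addr"]}})
print(store.load("alice"))
```

The point of the sketch is that the whole profile travels as a single value, so there is no schema to migrate; the trade-off raised on the RDBMS side of the thread is that anything beyond lookup by username would need extra work.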