+1 I think going with HBase is a good approach for now.  Thanks for laying
out the pros and cons.
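
For what it's worth, here is a rough sketch of the key => value pattern
discussed below, with schema evolution handled in the application layer.
It is only an illustration under assumptions, not Metron code: a plain
dict stands in for the HBase table, and the names (table, migrate,
save_settings, load_settings), the version numbers, and the field layout
are all hypothetical.

```python
import json

CURRENT_VERSION = 2

# Hypothetical stand-in for an HBase table: row key (username) -> serialized value.
table = {}


def migrate(doc):
    """Upgrade an older settings document to CURRENT_VERSION on read."""
    version = doc.get("version", 1)
    if version == 1:
        # Hypothetical v1 -> v2 change: a flat "facets" list becomes a per-view mapping.
        doc = {"version": 2, "facet_fields": {"alerts": doc.get("facets", [])}}
    return doc


def save_settings(username, settings):
    """Write the whole settings document under the username key."""
    table[username] = json.dumps({**settings, "version": CURRENT_VERSION})


def load_settings(username):
    """Look up settings by username; old documents are migrated transparently."""
    raw = table.get(username)
    if raw is None:
        return None
    return migrate(json.loads(raw))
```

Because every read goes through migrate, an upgrade would not need to
alter anything in the store; old rows would be converted the first time
they are loaded.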

On Fri, Feb 9, 2018 at 3:46 PM, Ryan Merriman <merrim...@gmail.com> wrote:

> I would like to bring this discussion to a conclusion and update the PR
> accordingly.  To clarify on whether we depend on an RDBMS right now, we do
> but only for authentication which will probably be replaced at some point.
> So the answer is not really.  I personally agree with Simon and think we
> should use HBase because this use case fits the data model and it's already
> in our stack.  I would add that with HBase we can move the schema evolution
> complexity to the application layer and hide it from the user.  This will
> make upgrades easier which is my main point of contention.  I also agree
> with Nick in that I do think there may be a place for a RDBMS in the future
> but we can always add it back.
>
> The 2 choices seem to be either an RDBMS or HBase.  Here is a summary
> based on comments in this discussion:
>
> RDBMS
> - some are not too worried about schema evolution as the data model will
> likely be simple
> - avoiding having to alter tables when upgrading would be ideal
> - works with ORM tools
> - is flexible and could be useful for future use cases
>
> HBase
> - might involve boilerplate code if not covered elsewhere in Metron
> - key/value is good enough for user profile settings
> - data replication for free
>
> Reading over this thread again I get the impression there is a slight
> preference for HBase.  Want to give people one more chance to chime in or
> argue the other solution.  Let me know if I missed anything or didn't
> include someone's argument.
>
>
>
> On Fri, Feb 2, 2018 at 8:24 AM, Nick Allen <n...@nickallen.org> wrote:
>
> > > Glad you agree with me that this isn’t HBase scale… it’s clearly not. I
> > would never suggest introducing HBase for something like this, but since
> > it’s there.
> >
> > Ah, gotcha.  Misunderstood your statement.
> >
> >
> >
> > On Fri, Feb 2, 2018 at 9:01 AM Simon Elliston Ball <
> > si...@simonellistonball.com> wrote:
> >
> > > Glad you agree with me that this isn’t HBase scale… it’s clearly not. I
> > > would never suggest introducing HBase for something like this, but
> since
> > > it’s there.
> > >
> > > On the idea of using the Ambari RDBMS for the same basis of it being
> > > there, I see your point. That said, it can be postgres, sql server,
> > mysql,
> > > maria, oracle… various. Yes we have an ORM, but those are not nearly as
> > > magic as they claim, and upgrade / schema evolution of an RDBMS often
> > > involves some sort of platform dependent SQL migration in my
> experience.
> > I
> > > would suggest that supporting that range of options is not a good idea
> > for
> > > us. The Ambari project also pretty much reserve the right to blow away
> > that
> > > infrastructure in upgrades (which is fair enough). So relying on there
> > > being an RDBMS owned by another component is not something I would
> > > necessarily say was a clean choice.
> > >
> > > Simon
> > >
> > > > On 2 Feb 2018, at 13:50, Nick Allen <n...@nickallen.org> wrote:
> > > >
> > > > I fall marginally on the side of an RDBMS.  There is definitely a
> case
> > to
> > > > be made on both sides, but I'll point out a few things for the RDBMS.
> > > >
> > > >
> > > > (1) Flexibility.  Using an RDBMS is going to provide us with much
> > greater
> > > > flexibility going forward.  We really don't know what the specific
> use
> > > > cases will be, but I am willing to bet they are user-focused
> > > (preferences,
> > > > etc).  The type of use cases that most web applications use an RDBMS
> > for.
> > > >
> > > >
> > > >> If anything I would like to see the current RDBMS dependency come
> > out...
> > > >
> > > > (2) Don't we already have an RDBMS requirement for Ambari?  That's a
> > > > dependency that we do not control.
> > > >
> > > >
> > > >> ... hbase seems a good option (because we already have it there, it
> > > would
> > > > be kinda crazy at this scale if we didn’t already have it)
> > > >
> > > > (3) In this scenario, the RDBMS would not scale proportionally with
> the
> > > > amount of telemetry, it would scale based on usage; primarily the
> > number
> > > of
> > > > users.  This is not "big data" scale.  I don't think we can make the
> > case
> > > > for HBase based on scale here.
> > > >
> > > >
> > > >> We would also end up with, as Mike points out, a whole new disk
> > > > deployment patterns and a bunch of additional DBA ops process
> > > requirements
> > > > for every install.
> > > >
> > > > (4) Most users that need HA/DR (and other 'advanced stuff'), are
> > > > enterprises and organizations that are already very familiar with
> RDBMS
> > > > solutions and have the infrastructure in place to manage those.  For
> > > users
> > > > that don't need HA/DR, just use the DB that gets spun-up with Ambari.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Feb 2, 2018 at 7:17 AM Simon Elliston Ball <
> > > > si...@simonellistonball.com> wrote:
> > > >
> > > >> Introducing a RDBMS to the stack seems unnecessary for this.
> > > >>
> > > >> If we consider the data access patterns for user profiles, we are
> > > unlikely
> > > >> to query into them, or indeed do anything other than look them up,
> or
> > > write
> > > >> them out by a username key. To that end, using an ORM to translate
> a
> > > >> nested config object into a load of tables seems to introduce
> > complexity
> > > >> and brittleness we then have to take away through relying on
> > relational
> > > >> consistency models. We would also end up with, as Mike points out, a
> > > whole
> > > >> new disk deployment patterns and a bunch of additional DBA ops
> process
> > > >> requirements for every install.
> > > >>
> > > >> Since the access pattern is almost entirely key => value, hbase
> seems
> > a
> > > >> good option (because we already have it there, it would be kinda
> crazy
> > > at
> > > >> this scale if we didn’t already have it) or arguably zookeeper, but
> > that
> > > >> might be at the other end of the scale argument. I’d even go as far
> as
> > > to
> > > >> suggest files on HDFS to keep it simple.
> > > >>
> > > >> Simon
> > > >>
> > > >>> On 1 Feb 2018, at 23:24, Michael Miklavcic <
> > > michael.miklav...@gmail.com>
> > > >> wrote:
> > > >>>
> > > > >>> Personally, I'd be in favor of something like MariaDB as an open
> > > > source
> > > > >>> repo. Or any other ANSI SQL store. On the positive side, it should
> > mesh
> > > >>> seamlessly with ORM tools. And the schema for this should be pretty
> > > >>> vanilla, I'd imagine. I might even consider skipping ORM for
> straight
> > > >> JDBC
> > > >>> and simple command scripts in Java for something this small. I'm
> not
> > > >>> worried so much about migrations of this sort. Large scale DBs can
> > get
> > > > >>> involved with major schema changes, but that's usually when the
> > > datastore
> > > >> is
> > > >>> a massive set of tables with complex relationships, at least in my
> > > >>> experience.
> > > >>>
> > > >>> We could also use hbase, which probably wouldn't be that hard
> either,
> > > but
> > > >>> there may be more boilerplate to write for the client as compared
> to
> > > >>> standard SQL. But I'm assuming we could reuse a fair amount of
> > existing
> > > >>> code from our enrichments. One additional reason in favor of hbase
> > > might
> > > >> be
> > > >>> data replication. For a SQL instance we'd probably recommend a RAID
> > > store
> > > >>> or backup procedure, but we get that pretty easy with hbase too.
> > > >>>
> > > >>> On Feb 1, 2018 2:45 PM, "Casey Stella" <ceste...@gmail.com> wrote:
> > > >>>
> > > >>>> So, I'll answer your question with some questions:
> > > >>>>
> > > >>>>  - No matter the data store we use upgrading will take some care,
> > > >> right?
> > > >>>>  - Do we currently depend on a RDBMS anywhere?  I want to say that
> > we
> > > >> do
> > > >>>>  in the REST layer already, right?
> > > > >>>>  - If we don't use a RDBMS, what's the other option?  What are the
> > > pros
> > > >>>>  and cons?
> > > >>>>  - Have we considered non-server offline persistent solutions
> (e.g.
> > > >>>>  https://www.html5rocks.com/en/features/storage)?
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <
> merrim...@gmail.com>
> > > >> wrote:
> > > >>>>
> > > >>>>> There is currently a PR up for review that allows a user to
> > configure
> > > >> and
> > > >>>>> save the list of facet fields that appear in the left column of
> the
> > > >>>> Alerts
> > > >>>>> UI:  https://github.com/apache/metron/pull/853.  The REST layer
> > has
> > > >> ORM
> > > >>>>> support which means we can store those in a relational database.
> > > >>>>>
> > > >>>>> However I'm not 100% sure this is the best place to keep this.
> As
> > we
> > > >> add
> > > >>>>> more use cases like this the backing tables in the RDBMS will
> need
> > to
> > > >> be
> > > >>>>> managed.  This could make upgrading more tedious and error-prone.
> > Is
> > > >>>> there
> > > > >>>>> a better way to store this, assuming we can leverage a
> > component
> > > >>>> that's
> > > >>>>> already included in our stack?
> > > >>>>>
> > > >>>>> Ryan
> > > >>>>>
> > > >>>>
> > > >>
> > > >>
> > >
> > >
> >
>
