I'm also good with HBase.

On Fri, Feb 9, 2018 at 2:14 PM, Nick Allen <n...@nickallen.org> wrote:

> +1 I think going with HBase is a good approach for now.  Thanks for laying
> out the pros and cons.
>
> On Fri, Feb 9, 2018 at 3:46 PM, Ryan Merriman <merrim...@gmail.com> wrote:
>
> > I would like to bring this discussion to a conclusion and update the PR
> > accordingly.  To clarify whether we depend on an RDBMS right now: we do,
> > but only for authentication, which will probably be replaced at some
> > point.  So the answer is not really.  I personally agree with Simon and
> > think we should use HBase because this use case fits the data model and
> > it's already in our stack.  I would add that with HBase we can move the
> > schema evolution complexity to the application layer and hide it from the
> > user.  This will make upgrades easier, which is my main point of
> > contention.  I also agree with Nick in that I do think there may be a
> > place for an RDBMS in the future, but we can always add it back.
> >
> > The 2 choices seem to be either an RDBMS or HBase.  Here is a summary
> > based on comments in this discussion:
> >
> > RDBMS
> > - some are not too worried about schema evolution as the data model will
> > likely be simple
> > - avoiding having to alter tables when upgrading would be ideal
> > - works with ORM tools
> > - is flexible and could be useful for future use cases
> >
> > HBase
> > - might involve boilerplate code if not covered elsewhere in Metron
> > - key/value is good enough for user profile settings
> > - data replication for free
> >
> > Reading over this thread again I get the impression there is a slight
> > preference for HBase.  Want to give people one more chance to chime in or
> > argue the other solution.  Let me know if I missed anything or didn't
> > include someone's argument.
> >
> >
> >
> > On Fri, Feb 2, 2018 at 8:24 AM, Nick Allen <n...@nickallen.org> wrote:
> >
> > > > Glad you agree with me that this isn’t HBase scale… it’s clearly not.
> > > > I would never suggest introducing HBase for something like this, but
> > > > since it’s there.
> > >
> > > Ah, gotcha.  Misunderstood your statement.
> > >
> > >
> > >
> > > On Fri, Feb 2, 2018 at 9:01 AM Simon Elliston Ball <
> > > si...@simonellistonball.com> wrote:
> > >
> > > > Glad you agree with me that this isn’t HBase scale… it’s clearly not.
> > > > I would never suggest introducing HBase for something like this, but
> > > > since it’s there.
> > > >
> > > > On the idea of using the Ambari RDBMS on the same basis of it being
> > > > there, I see your point. That said, it can be Postgres, SQL Server,
> > > > MySQL, MariaDB, Oracle… various. Yes we have an ORM, but those are not
> > > > nearly as magic as they claim, and upgrade / schema evolution of an
> > > > RDBMS often involves some sort of platform-dependent SQL migration in
> > > > my experience. I would suggest that supporting that range of options
> > > > is not a good idea for us. The Ambari project also pretty much
> > > > reserves the right to blow away that infrastructure in upgrades (which
> > > > is fair enough). So relying on there being an RDBMS owned by another
> > > > component is not something I would necessarily say was a clean choice.
> > > >
> > > > Simon
> > > >
> > > > > On 2 Feb 2018, at 13:50, Nick Allen <n...@nickallen.org> wrote:
> > > > >
> > > > > > I fall marginally on the side of an RDBMS.  There is definitely a
> > > > > > case to be made on both sides, but I'll point out a few things for
> > > > > > the RDBMS.
> > > > >
> > > > >
> > > > > > (1) Flexibility.  Using an RDBMS is going to provide us with much
> > > > > > greater flexibility going forward.  We really don't know what the
> > > > > > specific use cases will be, but I am willing to bet they are
> > > > > > user-focused (preferences, etc.): the type of use cases that most
> > > > > > web applications use an RDBMS for.
> > > > >
> > > > >
> > > > > >> If anything I would like to see the current RDBMS dependency
> > > > > >> come out...
> > > > >
> > > > > (2) Don't we already have an RDBMS requirement for Ambari?  That's
> a
> > > > > dependency that we do not control.
> > > > >
> > > > >
> > > > > >> ... HBase seems a good option (because we already have it there, it
> > > > > >> would be kinda crazy at this scale if we didn’t already have it)
> > > > >
> > > > > (3) In this scenario, the RDBMS would not scale proportionally with
> > the
> > > > > amount of telemetry, it would scale based on usage; primarily the
> > > number
> > > > of
> > > > > users.  This is not "big data" scale.  I don't think we can make
> the
> > > case
> > > > > for HBase based on scale here.
> > > > >
> > > > >
> > > > > >> We would also end up with, as Mike points out, a whole new set of
> > > > > >> disk deployment patterns and a bunch of additional DBA ops process
> > > > > >> requirements for every install.
> > > > >
> > > > > (4) Most users that need HA/DR (and other 'advanced stuff'), are
> > > > > enterprises and organizations that are already very familiar with
> > RDBMS
> > > > > solutions and have the infrastructure in place to manage those.
> For
> > > > users
> > > > > that don't need HA/DR, just use the DB that gets spun-up with
> Ambari.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Feb 2, 2018 at 7:17 AM Simon Elliston Ball <
> > > > > si...@simonellistonball.com> wrote:
> > > > >
> > > > > >> Introducing an RDBMS to the stack seems unnecessary for this.
> > > > >>
> > > > > >> If we consider the data access patterns for user profiles, we are
> > > > > >> unlikely to query into them, or indeed do anything other than look
> > > > > >> them up, or write them out, by a username key. To that end, using
> > > > > >> an ORM to translate a nested config object into a load of tables
> > > > > >> seems to introduce complexity and brittleness we then have to take
> > > > > >> away through relying on relational consistency models. We would
> > > > > >> also end up with, as Mike points out, a whole new set of disk
> > > > > >> deployment patterns and a bunch of additional DBA ops process
> > > > > >> requirements for every install.
> > > > >>
> > > > > >> Since the access pattern is almost entirely key => value, HBase
> > > > > >> seems a good option (because we already have it there, it would be
> > > > > >> kinda crazy at this scale if we didn’t already have it) or arguably
> > > > > >> ZooKeeper, but that might be at the other end of the scale
> > > > > >> argument. I’d even go as far as to suggest files on HDFS to keep it
> > > > > >> simple.
> > > > >>
> > > > >> Simon
> > > > >>
> > > > >>> On 1 Feb 2018, at 23:24, Michael Miklavcic <
> > > > michael.miklav...@gmail.com>
> > > > >> wrote:
> > > > >>>
> > > > > >>> Personally, I'd be in favor of something like MariaDB as an open
> > > > > >>> source repo. Or any other ANSI SQL store. On the positive side, it
> > > > > >>> should mesh seamlessly with ORM tools. And the schema for this
> > > > > >>> should be pretty vanilla, I'd imagine. I might even consider
> > > > > >>> skipping ORM for straight JDBC and simple command scripts in Java
> > > > > >>> for something this small. I'm not worried so much about migrations
> > > > > >>> of this sort. Large scale DBs can get involved with major schema
> > > > > >>> changes, but that's usually when the datastore is a massive set of
> > > > > >>> tables with complex relationships, at least in my experience.
> > > > >>>
> > > > > >>> We could also use HBase, which probably wouldn't be that hard
> > > > > >>> either, but there may be more boilerplate to write for the client
> > > > > >>> as compared to standard SQL. But I'm assuming we could reuse a fair
> > > > > >>> amount of existing code from our enrichments. One additional reason
> > > > > >>> in favor of HBase might be data replication. For a SQL instance
> > > > > >>> we'd probably recommend a RAID store or backup procedure, but we
> > > > > >>> get that pretty easily with HBase too.
> > > > >>>
> > > > >>> On Feb 1, 2018 2:45 PM, "Casey Stella" <ceste...@gmail.com>
> wrote:
> > > > >>>
> > > > >>>> So, I'll answer your question with some questions:
> > > > >>>>
> > > > > >>>>  - No matter the data store we use, upgrading will take some care,
> > > > > >>>>  right?
> > > > > >>>>  - Do we currently depend on an RDBMS anywhere?  I want to say that
> > > > > >>>>  we do in the REST layer already, right?
> > > > > >>>>  - If we don't use an RDBMS, what's the other option?  What are the
> > > > > >>>>  pros and cons?
> > > > > >>>>  - Have we considered non-server offline persistent solutions (e.g.
> > > > > >>>>  https://www.html5rocks.com/en/features/storage)?
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> On Thu, Feb 1, 2018 at 9:11 AM, Ryan Merriman <
> > merrim...@gmail.com>
> > > > >> wrote:
> > > > >>>>
> > > > > >>>>> There is currently a PR up for review that allows a user to
> > > > > >>>>> configure and save the list of facet fields that appear in the
> > > > > >>>>> left column of the Alerts UI:
> > > > > >>>>> https://github.com/apache/metron/pull/853.  The REST layer has ORM
> > > > > >>>>> support, which means we can store those in a relational database.
> > > > >>>>>
> > > > > >>>>> However I'm not 100% sure this is the best place to keep this.  As
> > > > > >>>>> we add more use cases like this, the backing tables in the RDBMS
> > > > > >>>>> will need to be managed.  This could make upgrading more tedious
> > > > > >>>>> and error-prone.  Is there a better way to store this, assuming we
> > > > > >>>>> can leverage a component that's already included in our stack?
> > > > >>>>>
> > > > >>>>> Ryan
> > > > >>>>>
> > > > >>>>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >
>
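
For anyone skimming the thread, here is a rough sketch of the two ideas that
came up for the key/value option: profiles are only ever read or written by a
username key, and schema evolution can be handled in the application layer
when a record is read. It is not the code from the PR. Everything in it is an
assumption made for illustration: a plain dict stands in for an HBase table
(row key -> serialized value), and the names ProfileStore, upgrade,
facetFields, facets and version are hypothetical.

```python
import json

CURRENT_VERSION = 2  # hypothetical version of the stored profile layout


def upgrade(profile):
    """Bring a stored profile up to the current layout, one version at a time."""
    profile = dict(profile)
    version = profile.get("version", 1)
    if version == 1:
        # Hypothetical change: v1 kept facet fields as a comma-separated
        # string under "facets"; v2 keeps a list under "facetFields".
        raw = profile.pop("facets", "")
        profile["facetFields"] = [f for f in raw.split(",") if f]
        version = 2
    profile["version"] = version
    return profile


class ProfileStore:
    """Key/value access to user profiles, keyed by username."""

    def __init__(self, table=None):
        # Stand-in for a key/value table: row key bytes -> value bytes.
        self.table = {} if table is None else table

    def save(self, username, profile):
        # Always write the current layout, stamped with its version.
        record = dict(profile, version=CURRENT_VERSION)
        self.table[username.encode("utf-8")] = json.dumps(record).encode("utf-8")

    def load(self, username):
        raw = self.table.get(username.encode("utf-8"))
        if raw is None:
            return None
        # Old records are upgraded on read, so no table has to be altered
        # during a product upgrade.
        return upgrade(json.loads(raw.decode("utf-8")))
```

With this shape, a record written by an older release keeps working after an
upgrade: load() returns it in the current layout, and the next save() writes
it back in that layout. A real version would swap the dict for the HBase
client and would need to decide how to handle concurrent writes.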
