Re: [DISCUSS] Persisting user data

Simon Elliston Ball Wed, 02 Aug 2017 13:33:14 -0700

Agreed on Postgres. It's a lot easier to work with license-wise in Apache 
projects, and has a lot of the capability we need here, especially if we can 
find a sensible ORM. Anyone got any thoughts on what would work there?
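
To make the question concrete, here is a minimal sketch of the kind of table a
saved search POC might need, covering both shared and per-user entries. It is
an illustration under stated assumptions, not a proposed design: the table,
column, and function names are hypothetical, and Python's built-in sqlite3
stands in for Postgres so the sketch runs without a server. Whatever ORM is
chosen would replace the hand-written SQL.

```python
import json
import sqlite3

# Hypothetical schema: a NULL owner marks a search shared by all users.
SCHEMA = """
CREATE TABLE saved_search (
    id     INTEGER PRIMARY KEY,
    owner  TEXT,              -- NULL = shared, otherwise a user name
    name   TEXT NOT NULL,
    query  TEXT NOT NULL      -- search request serialized as JSON
)
"""


def open_store():
    """Create an in-memory database (stand-in for an external Postgres)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def save_search(conn, name, query, owner=None):
    """Persist a search; omit owner to share it with every user."""
    cur = conn.execute(
        "INSERT INTO saved_search (owner, name, query) VALUES (?, ?, ?)",
        (owner, name, json.dumps(query)),
    )
    conn.commit()
    return cur.lastrowid


def searches_for(conn, user):
    """Return shared searches plus the given user's personal ones."""
    rows = conn.execute(
        "SELECT name, query FROM saved_search "
        "WHERE owner IS NULL OR owner = ? ORDER BY id",
        (user,),
    )
    return [(name, json.loads(query)) for name, query in rows]
```

A single nullable owner column is the simplest way to express "a combination
of shared and personal"; authorization would still be enforced in the REST
layer, as discussed further down the thread.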


Simon 

> On 2 Aug 2017, at 21:21, Matt Foley <[email protected]> wrote:
> 
> Hi Ryan,
> Zookeeper has a default (and seldom changed) max znode size of 1MB, but it is 
> “designed to store data on the order of kilobytes in size.”[1]  And it’s not 
> really intended for frequently-changing data, which is okay here.  I just 
> included it for completeness; I’m not advocating for its use here.
> 
> I agree with you that the problem, especially because it includes shared 
> config, would fit well in a db.  I’d suggest you consider PostgreSQL rather 
> than MySQL, as Postgres is built into Red Hat 6 and 7, and Ambari now uses it 
> by default, so an available server might be conveniently at hand in most 
> deployments.  Definitely assume the user will want to use an external db 
> instance, rather than one dedicated to this use.  Conveniently, Postgres can 
> also be fronted by a REST interface (for example PostgREST, a separate 
> project rather than a built-in feature), with the usual authorization options.
> 
> Never mind about Ambari Views for now.  It’s just a way to get GUI dashboards 
> without writing all the infrastructure for it, which as you say is somewhat 
> water under the bridge.
> Cheers,
> --Matt
> 
> [1] https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html 
> 
> 
> 
> On 8/2/17, 12:34 PM, "Ryan Merriman" <[email protected]> wrote:
> 
>    Matt,
> 
>    Thank you for the suggestions.  I forgot to include Zookeeper.  Are there
>    any tradeoffs we should be aware of if we decide to use Zookeeper?  Are
>    there guidelines for how much data can be stored in Zookeeper?
> 
>    To answer your questions:
> 
>    1.  I think both use cases make sense so a combination of shared and
>    personal.
>    2.  I was planning on managing authorization in the REST layer.  For now
>    viewer login auth (which is really REST auth) will suffice but we might
>    consider other methods since authentication is pluggable here.
>    3.  I had not considered Ambari Views since this will support an existing
>    UI.  How would Ambari Views help us here?
> 
>    I will proceed initially with a saved search POC using a relational
>    database unless you think that is a bad idea or there are other better
>    options.  Hopefully an example will further the discussion.
> 
>    Ryan
> 
>>    On Wed, Jul 26, 2017 at 6:31 PM, Matt Foley <[email protected]> wrote:
>> 
>> There’s a couple other places you could put config info (but maybe not
>> saved searches):
>> -  Zookeeper
>> -  metron-alerts-ui/config.xml or config.json  file
>> -  the Ambari database, whichever it happens to be
>> 
>> Questions that influence the decision include:
>> 1. Should there be one configuration shared among users, or strictly
>> per-user config?  Or a combination of shared and personal?
>> 2. What security do you wish to maintain on changing those settings, both
>> shared and personal?  What authentication/authorization scheme will you
>> use?  Is viewer login auth sufficient for this?
>> 3. Will you assume Ambari exists?  Did you consider using Ambari Views as
>> the basis? (https://cwiki.apache.org/confluence/display/AMBARI/Views )
>> 
>> On 7/26/17, 2:54 PM, "Ryan Merriman" <[email protected]> wrote:
>> 
>>    In anticipation of METRON-988 being merged into master, there will be a
>>    need to persist user preferences such as UI layout, saved searches,
>>    search history, etc.  I think where and how we persist this data should
>>    be discussed in order to facilitate a design.  This data won't be large
>>    in scale and may or may not be relational.  The initial features I am
>>    aware of don't require a relational model but I'm sure there will be
>>    some that do in the future.  I'm also assuming this code will live in
>>    the REST application but someone correct me if there is a reason to keep
>>    it somewhere else.
>> 
>>    I think it would be preferable to leverage something that is already in
>>    our stack and available as a dependency.  However I would not be against
>>    adding something if it really were the right tool for the job.  Assuming
>>    others agree we should stick with our current stack, I see these options:
>> 
>>       - MySQL (or other relational database)
>>          - good fit for the size of data
>>          - relational capabilities
>>          - an ORM framework will be necessary which will increase our
>>          dependencies and complexity
>>       - HBase
>>          - client setup and code will likely be simpler and less complex
>>          - limited data model
>>       - Elasticsearch
>>          - JSON is a convenient data model
>>          - we already store user preferences here (Kibana dashboards)
>>          - we have abstracted our search engine interactions in several
>>          places and would have to here too
>> 
>>    Elasticsearch is out for me because we view search engines as
>>    pluggable.  I think HBase would be the easiest to implement and get
>>    working but I'm worried we'll have similar use cases that won't be a
>>    good fit for HBase.  In that case we would need to come up with an
>>    alternative persistence solution anyways.  I think MySQL is a good fit
>>    long term but I'm concerned about adding a heavy ORM framework.  Also,
>>    we can't use Hibernate because it is not license friendly.
>> 
>>    Does anyone have any thoughts on these options or other ideas?
>> 
>>    This requirement also brings up another topic that is outside of this
>>    discussion.  Should we reevaluate our authentication strategy?
>>    Currently the REST application uses JDBC for this but if we decide a
>>    different mechanism is better then we no longer need a relational
>>    database.  This might affect our decision to use MySQL for this kind of
>>    data persistence.
>> 
>>    Ryan
>> 
>> 
>> 
>> 
> 
> 
>
