I'm a little late to the party but thought I would go ahead and throw my
two cents into the mix.

I share the concern around an upgrade / migration path. While I would love
to see the BETA dropped sooner rather than later, to me this is a game
changer for people implementing Metron. I think there is a silent
expectation of no data loss after dropping the BETA tag.

Even if there is not a direct upgrade path for a few releases, is there
documentation that we could provide to ensure a data migration path for
users? I'm not thinking of anything automated, just some instructions on
what to do.

-Kyle

On Fri, Nov 4, 2016 at 9:16 AM, Casey Stella <[email protected]> wrote:

> Jon,
>
> Thank you for your thoughts; they are appreciated and you should keep them
> coming.  This kind of discussion is exactly why I sent out this thread.  I
> think it's safe to say that the entire community shares your desire for
> Metron to be as easy to use as possible and a "data analysis platform for
> the masses."  We should hold ourselves to a high standard, no doubt.
>
> Casey
>
> On Fri, Nov 4, 2016 at 6:30 AM, [email protected] <[email protected]> wrote:
>
> > Please understand that my points mostly relate to perception and ease of
> > use, not what's technically possible or available.  I'm coming at this as
> > Metron should be a data analysis platform for the masses.
> >
> > METRON-517/542 - While I'm willing to let this one go, it depends on your
> > definition of non-issue.  I personally believe that data (in every
> > location that it exists) needs to be obvious and have ultra high
> > integrity.  I'm not concerned that the correct data won't exist somewhere
> > in the cluster; I'm focusing on it being easily accessible by an
> > operations team that may consist of entry level analysts.  Once 517 is
> > done and merged I would consider that a short term mitigation is in place.
> >
> > I feel like the project should stick to certain principles and a
> > suggestion is that data access is easy, accurate, and obvious. Do we have
> > anything like this that was agreed upon, discussed, or documented?
> > Probably a discussion for a different thread.
> >
> > METRON-485/470/etc. were mostly to illustrate a consistency issue, and
> > resolving them would give a better first impression (assuming that people
> > monitoring the project will start using it more once it's non-BETA
> > software).  First impressions are big in my book and could affect initial
> > adoption.
> >
> > Regarding 485 - Otto may be able to clarify but I thought somebody else
> > saw this issue as well.  I think the finger is currently being pointed at
> > Monit timeouts and not Storm.  It also doesn't happen every single time; I
> > only run into it while the cluster is under load and after dozens of
> > topology restarts that I do when tuning parallelism in Storm.  I'm going
> > to be updating to Storm 1.0.x in order to see if this still exists.
> > Again, this relates to ease of use/load testing/tuning.
> >
> > Agree with the upgrade comments - as long as it's supported at some
> > defined point (IMHO this is when a project leaves BETA but others are
> > welcome to disagree).
> >
> > Finally, I know this doesn't come across well in email but I'm just
> > mentioning items which I think are important, not attempting to demand
> > that they be fixed or that this doesn't leave beta.  Thanks,
> >
> > Jon
> >
> > On Thu, Nov 3, 2016, 16:44 James Sirota <[email protected]> wrote:
> >
> >
> > Hi Jon,
> >
> > Here are my thoughts around your objections.
> >
> > METRON-517/METRON-542
> >
> > I think the mechanism currently exists within Metron to make this a
> > non-issue.  I believe you can solve it with a combination of a Stellar
> > statement and ES templates.  As you mentioned, we can truncate the string
> > and then include the relevant metadata in the message (original length,
> > hash, etc).  Cramming really long strings into ES is generally a bad
> > thing, which is why this limitation exists.  The metadata in the indexed
> > message along with the timestamp allows you to pull data from HDFS should
> > you need to recover the full string.
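
The truncate-and-annotate idea in the paragraph above could be sketched
roughly as follows. This is only an illustration in Python, not Metron code:
the function name truncate_field and the metadata key suffixes (_truncated,
_original_length, _sha256) are made up for the example, and it assumes the
32766 limit discussed in this thread is counted in UTF-8 bytes.

```python
import hashlib

# Limit discussed in this thread; assumed here to be measured in UTF-8 bytes.
MAX_TERM_BYTES = 32766


def truncate_field(message, field, limit=MAX_TERM_BYTES):
    """Return a copy of `message` with `field` cut to at most `limit` bytes.

    When truncation happens, hypothetical metadata keys are added so the
    full value can later be matched against the copy stored elsewhere.
    """
    out = dict(message)
    value = out.get(field)
    if not isinstance(value, str):
        return out  # nothing to truncate

    raw = value.encode("utf-8")
    if len(raw) <= limit:
        return out  # already within the limit

    # Cut on the byte limit; errors="ignore" drops a split trailing character.
    out[field] = raw[:limit].decode("utf-8", errors="ignore")
    out[field + "_truncated"] = True
    out[field + "_original_length"] = len(raw)  # length in bytes, pre-truncation
    out[field + "_sha256"] = hashlib.sha256(raw).hexdigest()  # hash of full value
    return out


if __name__ == "__main__":
    msg = {"timestamp": 1478200000000, "uri": "/search?q=" + "a" * 40000}
    indexed = truncate_field(msg, "uri")
    print(len(indexed["uri"]), indexed["uri_original_length"])
```

The original length and hash are computed before truncation, so together with
the timestamp they can be compared against the full record kept in HDFS.
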
> >
> > METRON-485
> >
> > We cannot replicate this issue in our environment, but if this is indeed
> > an issue, it is an issue with Storm.  A Jira should be filed against Storm
> > and not against Metron.  My hunch, though, is that it's probably
> > something in your environment.  I just tried stopping all topologies on my
> > AWS cluster and then went to all Storm nodes and didn't see any workers
> > left behind.
> >
> > METRON-470
> >
> > I think this is mainly a consistency issue.  I don't think this impacts
> > the stability or function of the software.  I think this is a nice to
> > have, maybe in the next few releases, but I don't think we absolutely have
> > to have this to drop BETA.
> >
> > With respect to upgrades, here are my thoughts.  There is really no way
> > to upgrade Metron 0.2.1 to Metron 0.2.2 in place because it requires a
> > change of HDP.  The new build will only be compatible with HDP 2.5 and not
> > 2.4.  So you have to lay down a new cluster regardless.  We can document
> > how to get the configs off of your old Metron and plug them into your new
> > Metron so that it works the same.  That shouldn't be a problem.
> >
> > Our upgrade path for future releases will revolve around the Ambari
> > Metron management pack that is available with the upcoming build.  Right
> > now the install capability is available and the upgrade capability will
> > come in incrementally within the next few releases.  We will additionally
> > deprecate Monit and switch that functionality to Ambari as well.  Finally,
> > we will also use Ambari for metrics monitoring.  There is lots to do so we
> > will triage and prioritize Jiras as a community to see which parts we want
> > to tackle first.  This is why your participation in the community is so
> > valuable.
> >
> > Thanks,
> > James
> >
> >
> >
> > 03.11.2016, 11:07, "[email protected]" <[email protected]>:
> > > I agree that we can split METRON-517 into a short term and long term
> > > fix. I have attempted to organize my thoughts regarding the long term
> > > fix into METRON-542 and can get a PR out for METRON-517 soon to close
> > > that out.
> > >
> > > This leaves cluster tuning and a valid upgrade path for users, the
> > > latter of which is my predominant concern. If the team is willing to say
> > > that starting with 0.2.2 there will be a valid upgrade path to future
> > > releases I think that removing the BETA tag at 0.2.2 is reasonable. That
> > > said, this is just following my perception of what the BETA tag
> > > represents.
> > >
> > > Jon
> > >
> > > On Thu, Nov 3, 2016 at 11:50 AM Casey Stella <[email protected]> wrote:
> > >
> > >>  Ok, regarding METRON-517, I've thought about this a bit having read
> > >>  your really great and detailed JIRA as well as the discussion around
> > >>  this on the dev list between you and Matt Foley. I want to separate the
> > >>  discussion between what is the correct long-term solution for this
> > >>  issue versus what is an acceptable solution.
> > >>
> > >>  In terms of an acceptable work-around, my opinion is that because we
> > >>  allow the user to modify the ES template they can
> > >>
> > >>     - Adjust the template to specify ignore_above
> > >>     <https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html>
> > >>     on fields which they feel are likely to be large (maybe every string
> > >>     field)
> > >>     - The combination of timestamp and ip_src_addr should be sufficient
> > >>     for picking out the raw data in question from the HDFS store
> > >>     - A stellar enrichment can be used to tag the messages with large
> > >>     URIs and that can factor into the threat triage even or be used to
> > >>     filter in kibana
> > >>     - As you say, you can use the profiler to track counts of such
> > >>     messages if you so desire and factor that into threat alerting or
> > >>     filtering in kibana.
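
One way the first bullet might look as an index template, sketched in Python
so the JSON can be generated and checked before it is loaded. Everything
specific here is an assumption made for illustration: the index pattern
bro_index*, the document type bro_doc, the field name uri, and the ES
2.x-style "string" / "not_analyzed" mapping. ignore_above counts characters
while the underlying limit is 32766 bytes, so the sketch uses
32766 // 4 = 8191 to stay safe even for 4-byte UTF-8 characters. Values
longer than ignore_above are still kept in _source; they are just not
indexed for that field.

```python
import json

# Byte limit discussed in this thread.
LUCENE_MAX_TERM_BYTES = 32766
# ignore_above is a character count; a UTF-8 character is at most 4 bytes.
IGNORE_ABOVE = LUCENE_MAX_TERM_BYTES // 4


def build_template(index_pattern, doc_type, large_fields):
    """Build a template dict that sets ignore_above on the given fields.

    The index pattern, doc type, and field names are hypothetical inputs;
    the mapping layout assumes an ES 2.x-style template.
    """
    properties = {
        name: {
            "type": "string",
            "index": "not_analyzed",
            "ignore_above": IGNORE_ABOVE,
        }
        for name in large_fields
    }
    return {
        "template": index_pattern,
        "mappings": {doc_type: {"properties": properties}},
    }


# Hypothetical names, chosen only for the example.
template = build_template("bro_index*", "bro_doc", ["uri"])

if __name__ == "__main__":
    print(json.dumps(template, indent=2))
```
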
> > >>
> > >>  Ultimately, I believe we have exposed the appropriate set of tooling
> > >>  to provide an acceptable solution for the moment. Now, as for the best
> > >>  long-term solution, I will let the good discussion on the mailing list
> > >>  and JIRA continue and contribute my thoughts on the JIRA
> > >>  <https://issues.apache.org/jira/browse/METRON-517>.
> > >>
> > >>  Of course, this is just $0.02 :)
> > >>
> > >>  Apologies to Dave, I wanted to mark this aspect of the discussion on
> > >>  this thread as it is relevant to sufficient criteria to remove the BETA
> > >>  tag.
> > >>
> > >>  Best,
> > >>
> > >>  Casey
> > >>
> > >>  On Thu, Nov 3, 2016 at 7:26 AM, [email protected] <[email protected]> wrote:
> > >>
> > >>  > To clarify, it only needs to truncate fields > 32766 which need a
> > >>  > full/exact string match search to be run on them (analyzed fields
> > >>  > generally would not hit this limitation but I guess in theory they
> > >>  > could).  However, that's probably every field which can get > 32766
> > >>  > because I'm assuming those will all be strings.
> > >>  >
> > >>  > I also think using the profiler to monitor the truncation action
> > >>  > could be a useful default.
> > >>  >
> > >>  > Jon
> > >>  >
> > >>  > On Wed, Nov 2, 2016, 21:08 [email protected] <[email protected]> wrote:
> > >>  >
> > >>  > > That would break searching on uri entirely unless you queried and
> > >>  > > knew to truncate at 32766 because it's not analyzed. I don't like
> > >>  > > pushing that complication to the end user.
> > >>  > >
> > >>  > > I would suggest truncation in the indexingBolt (not using stellar
> > >>  > > because you'd want this across the board) for all fields > 32766
> > >>  > > (how do we make sure this gets updated if the limitation changes in
> > >>  > > Lucene?) and adding metadata key-value pairs (pre-trunc length,
> > >>  > > hash, truncated bool, etc.).  In the URI scenario I would also
> > >>  > > suggest doing a multifield mapping by default because of the way
> > >>  > > that data is useful (not sure which analyser to use though - maybe
> > >>  > > write or find a good URI analyzer?). Since timestamp is a required
> > >>  > > field for all messages (I'm pretty sure?) I'm ok with timestamp and
> > >>  > > field value used as the UID, but would prefer something better.
> > >>  > >
> > >>  > > Jon
> > >>  > >
> > >>  > > On Wed, Nov 2, 2016, 20:33 James Sirota <[email protected]> wrote:
> > >>  > >
> > >>  > > Jon,
> > >>  > >
> > >>  > > For METRON-517 would it suffice to have a stellar statement to
> > >>  > > take a URI string and truncate it to length of 32766 in the ES
> > >>  > > writer? But still write the actual string to HDFS? You can then
> > >>  > > search against ES on the truncated portion, but retrieve the actual
> > >>  > > timestamp from HDFS.  It's easy to do because you know the
> > >>  > > timestamp from the original message.  So you know which logs in
> > >>  > > HDFS to search through to find the data.
> > >>  > >
> > >>  > > 02.11.2016, 14:12, "[email protected]" <[email protected]>:
> > >>  > > > I personally would like to see the following things done before
> > >>  > > > things leave BETA:
> > >>  > > > (1) Address data integrity concerns (Specifically thinking of
> > >>  > > > METRON-370, METRON-517)
> > >>  > > > (2) Make cluster tuning easier and more consistent (METRON-485,
> > >>  > > > METRON-470, and the "[DISCUSS] moving parsers back to flux" which
> > >>  > > > I can't find a JIRA for).
> > >>  > > >
> > >>  > > > I would also want to see the upgrade path (as opposed to rebuild)
> > >>  > > > be more thoroughly and regularly tested once things leave BETA.
> > >>  > > > From my perspective I think the project is very close but not yet
> > >>  > > > ready.
> > >>  > > >
> > >>  > > > Jon
> > >>  > > >
> > >>  > > > On Wed, Nov 2, 2016 at 4:44 PM Casey Stella <[email protected]> wrote:
> > >>  > > >
> > >>  > > > Hello Everyone,
> > >>  > > >
> > >>  > > > Now that the discussion around the next release has started, it
> > >>  > > > has been proposed and I think it's a good time to discuss what to
> > >>  > > > name this next release. Before, we have adopted the BETA suffix.
> > >>  > > > I think it might be time to drop it and call the next release
> > >>  > > > 0.2.2
> > >>  > > >
> > >>  > > > Thoughts?
> > >>  > > >
> > >>  > > > Best,
> > >>  > > >
> > >>  > > > Casey
> > >>  > > >
> > >>  > > > --
> > >>  > > >
> > >>  > > > Jon
> > >>  > >
> > >>  > > -------------------
> > >>  > > Thank you,
> > >>  > >
> > >>  > > James Sirota
> > >>  > > PPMC- Apache Metron (Incubating)
> > >>  > > jsirota AT apache DOT org
> > >>  > >
> > >>  > > --
> > >>  > >
> > >>  > > Jon
> > >>  > >
> > >>  > --
> > >>  >
> > >>  > Jon
> > >>  >
> > > --
> > >
> > > Jon
> >
> > -------------------
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
> > --
> >
> > Jon
> >
>
