Jon,

Thank you for your thoughts; they are appreciated and you should keep them
coming.  This kind of discussion is exactly why I sent out this thread.  I
think it's safe to say that the entire community shares your desire for
Metron to be as easy to use as possible and a "data analysis platform for
the masses."  We should hold ourselves to a high standard, no doubt.

Casey

On Fri, Nov 4, 2016 at 6:30 AM, [email protected] <[email protected]> wrote:

> Please understand that my points mostly relate to perception and ease of
> use, not what's technically possible or available.  I'm coming at this as
> Metron should be a data analysis platform for the masses.
>
> METRON-517/542 - While I'm willing to let this one go it depends on your
> definition of non-issue.  I personally believe that data (in every location
> that it exists) needs to be obvious and have ultra high integrity.  I'm not
> concerned that the correct data won't exist somewhere in the cluster, I'm
> focusing on it being easily accessible by an operations team that may
> consist of entry level analysts.  Once 517 is done and merged I would
> consider that a short term mitigation is in place.
>
> I feel like the project should stick to certain principles and a suggestion
> is that data access is easy, accurate, and obvious. Do we have anything
> like this that was agreed upon, discussed, or documented? Probably a
> discussion for a different thread.
>
> METRON-485/470/etc. were mostly to illustrate a consistency issue, and
> resolving them would give a better first impression (assuming that people
> monitoring the project will start using it more once it's non-BETA
> software).  First impressions are big in my book and could affect initial
> adoption.
>
> Regarding 485 - Otto may be able to clarify but I thought somebody else saw
> this issue as well.  I think the finger is currently being pointed at Monit
> timeouts and not Storm.  It also doesn't happen every single time; I only
> run into it while the cluster is under load and after dozens of topology
> restarts that I do when tuning parallelism in Storm.  I'm going to be
> updating to Storm 1.0.x in order to see if this still exists.  Again, this
> relates to ease of use/load testing/tuning.
>
> Agree with the upgrade comments - as long as it's supported at some defined
> point (IMHO this is when a project leaves BETA but others are welcome to
> disagree).
>
> Finally, I know this doesn't come across well in email but I'm just
> mentioning items which I think are important, not attempting to demand that
> they be fixed or that this doesn't leave beta.  Thanks,
>
> Jon
>
> On Thu, Nov 3, 2016, 16:44 James Sirota <[email protected]> wrote:
>
>
> Hi Jon,
>
> Here are my thoughts around your objections.
>
> METRON-517/METRON-542
>
> I think the mechanism currently exists within Metron to make this a
> non-issue.  I believe you can solve it with a combination of a Stellar
> statement and ES templates.  As you mentioned, we can truncate the string
> and then include the relevant metadata in the message (original length,
> hash, etc).  Cramming really long strings into ES is generally a bad thing,
> which is why this limitation exists.   The metadata in the indexed message
> along with the timestamp allows you to pull data from HDFS should you need
> to recover the full string.
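
A minimal sketch of the truncate-plus-metadata idea described above, in plain
Python rather than Stellar. This is an illustration only, not Metron code: the
function name and the metadata key names (".original_length", ".sha256",
".truncated") are hypothetical, and the 32766 limit is simply the figure cited
in this thread.

```python
import hashlib

# Maximum indexed term length in bytes, as cited in this thread (assumption).
MAX_TERM_BYTES = 32766


def truncate_field(message, field, limit=MAX_TERM_BYTES):
    """Return a copy of `message` with `field` cut to at most `limit` UTF-8 bytes.

    When truncation happens, hypothetical metadata keys are added so the full
    value can later be located in the raw store (for example by timestamp).
    """
    out = dict(message)
    value = message.get(field)
    if not isinstance(value, str):
        return out
    raw = value.encode("utf-8")
    if len(raw) <= limit:
        return out
    # Cut at the byte limit; errors="ignore" drops a partial trailing character.
    out[field] = raw[:limit].decode("utf-8", errors="ignore")
    out[field + ".original_length"] = len(raw)  # length in bytes before truncation
    out[field + ".sha256"] = hashlib.sha256(raw).hexdigest()  # hash of the full value
    out[field + ".truncated"] = True
    return out


if __name__ == "__main__":
    msg = {"timestamp": 1478200000000, "uri": "a" * 40000}
    indexed = truncate_field(msg, "uri")
    print(len(indexed["uri"]), indexed["uri.original_length"], indexed["uri.truncated"])
```

The untruncated message would still go to HDFS unchanged; only the copy sent
to the index is shortened.
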
>
> METRON-485
>
> We cannot replicate this issue in our environment, but if this is indeed an
> issue, it is an issue with Storm.  A Jira should be filed against Storm
> and not against Metron.  My hunch, though, is that it's probably something
> in your environment.  I just tried stopping all topologies on my AWS
> cluster and then went to all Storm nodes and didn't see any workers left
> behind.
>
> METRON-470
>
> I think this is mainly a consistency issue.  I don't think this impacts the
> stability or function of the software.  I think this is a nice to have,
> maybe in the next few releases, but I don't think we absolutely have to
> have this to drop BETA.
>
> With respect to upgrades, here are my thoughts.  There is really no way to
> upgrade Metron 0.2.1 to Metron 0.2.2 in place because it requires a change
> of HDP.  The new build will only be compatible with HDP 2.5 and not 2.4.
> So you have to lay down a new cluster regardless.  We can document how to
> get the configs off of your old Metron and plug them into your new Metron
> so that it works the same.  That shouldn't be a problem.
>
> Our upgrade path for future releases will revolve around the Ambari Metron
> management pack that is available with the upcoming build.  Right now the
> install capability is available and the upgrade capability will come in
> incrementally over the next few releases.  We will additionally deprecate
> Monit and switch that functionality to Ambari as well.  Finally, we will
> also use Ambari for metrics monitoring.  There is lots to do so we will
> triage and prioritize Jiras as a community to see which parts we want to
> tackle first.  This is why your participation in the community is so
> valuable.
>
> Thanks,
> James
>
>
>
> 03.11.2016, 11:07, "[email protected]" <[email protected]>:
> > I agree that we can split METRON-517 into a short term and long term fix.
> > I have attempted to organize my thoughts regarding the long term fix into
> > METRON-542 and can get a PR out for METRON-517 soon to close that out.
> >
> > This leaves cluster tuning and a valid upgrade path for users, the latter
> > of which is my predominant concern. If the team is willing to say that
> > starting with 0.2.2 there will be a valid upgrade path to future releases I
> > think that removing the BETA tag at 0.2.2 is reasonable. That said, this
> > is just following my perception of what the BETA tag represents.
> >
> > Jon
> >
> > On Thu, Nov 3, 2016 at 11:50 AM Casey Stella <[email protected]> wrote:
> >
> >>  Ok, regarding METRON-517, I've thought about this a bit having read your
> >>  really great and detailed JIRA as well as the discussion around this on the
> >>  dev list between you and Matt Foley. I want to separate the discussion
> >>  between what is the correct long-term solution for this issue versus what
> >>  is an acceptable solution.
> >>
> >>  In terms of an acceptable work-around, my opinion is that because we allow
> >>  the user to modify the ES template they can
> >>
> >>     - Adjust the template to specify ignore_above
> >>     <https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html>
> >>     on fields which they feel are likely to be large (maybe every string
> >>     field)
> >>     - The combination of timestamp and ip_src_addr should be sufficient for
> >>     picking out the raw data in question from the HDFS store
> >>     - A stellar enrichment can be used to tag the messages with large URIs
> >>     and that can factor into the threat triage even or be used to filter in
> >>     kibana
> >>     - As you say, you can use the profiler to track counts of such messages
> >>     if you so desire and factor that into threat alerting or filtering in
> >>     kibana.
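
For the first bullet above, a rough sketch of what a template with
ignore_above might look like, built as a Python dict so it can be printed as
JSON. Everything specific here is an assumption for illustration: the index
pattern "bro_index*", the doc type "bro_doc", the field list, and the
Elasticsearch 2.x-style "string"/"not_analyzed" mapping. As I understand the
Elasticsearch documentation, ignore_above counts characters while the Lucene
limit is in bytes, so the default below uses 32766 / 4 = 8191 to leave room
for multi-byte UTF-8 text.

```python
import json


def build_template(index_pattern, doc_type, string_fields, ignore_above=8191):
    """Build an index-template body that sets ignore_above on the given fields.

    Values longer than `ignore_above` characters are left out of the index for
    that field; the mapping syntax is an assumption (ES 2.x style).
    """
    properties = {
        name: {
            "type": "string",
            "index": "not_analyzed",
            "ignore_above": ignore_above,
        }
        for name in string_fields
    }
    return {
        "template": index_pattern,
        "mappings": {doc_type: {"properties": properties}},
    }


if __name__ == "__main__":
    # Hypothetical index pattern, doc type, and field names.
    body = build_template("bro_index*", "bro_doc", ["uri", "user_agent"])
    print(json.dumps(body, indent=2, sort_keys=True))
```

The printed JSON would then be adapted to whatever template the cluster
already uses rather than applied as-is.
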
> >>
> >>  Ultimately, I believe we have exposed the appropriate set of tooling to
> >>  provide an acceptable solution for the moment. Now, as for the best
> >>  long-term solution, I will let the good discussion on the mailing list and
> >>  JIRA continue and contribute my thoughts on the JIRA
> >>  <https://issues.apache.org/jira/browse/METRON-517>.
> >>
> >>  Of course, this is just $0.02 :)
> >>
> >>  Apologies to Dave, I wanted to mark this aspect of the discussion on this
> >>  thread as it is relevant to sufficient criteria to remove the BETA tag.
> >>
> >>  Best,
> >>
> >>  Casey
> >>
> >>  On Thu, Nov 3, 2016 at 7:26 AM, [email protected] <[email protected]> wrote:
> >>
> >>  > To clarify, it only needs to truncate fields > 32766 which need a
> >>  > full/exact string match search to be run on them (analyzed fields generally
> >>  > would not hit this limitation but I guess in theory they could).  However,
> >>  > that's probably every field which can get > 32766 because I'm assuming
> >>  > those will all be strings.
> >>  >
> >>  > I also think using the profiler to monitor the truncation action could be a
> >>  > useful default.
> >>  >
> >>  > Jon
> >>  >
> >>  > On Wed, Nov 2, 2016, 21:08 [email protected] <[email protected]> wrote:
> >>  >
> >>  > > That would break searching on uri entirely unless you queried and knew to
> >>  > > truncate at 32766 because it's not analyzed. I don't like pushing that
> >>  > > complication to the end user.
> >>  > >
> >>  > > I would suggest truncation in the indexingBolt (not using stellar because
> >>  > > you'd want this across the board) for all fields > 32766 (how do we make
> >>  > > sure this gets updated if the limitation changes in Lucene?) and adding
> >>  > > metadata key-value pairs (pre-trunc length, hash, truncated bool, etc.).
> >>  > > In the URI scenario I would also suggest doing a multifield mapping by
> >>  > > default because of the way that data is useful (not sure which analyser to
> >>  > > use though - maybe write or find a good URI analyzer?). Since timestamp is
> >>  > > a required field for all messages (I'm pretty sure?) I'm ok with timestamp
> >>  > > and field value used as the UID, but would prefer something better.
> >>  > > Jon
> >>  > >
> >>  > > On Wed, Nov 2, 2016, 20:33 James Sirota <[email protected]> wrote:
> >>  > >
> >>  > > Jon,
> >>  > >
> >>  > > For METRON-517 would it suffice to have a stellar statement to take a URI
> >>  > > string and truncate it to length of 32766 in the ES writer? But still
> >>  > > write the actual string to HDFS? You can then search against ES on the
> >>  > > truncated portion, but retrieve the actual string from HDFS. It's easy
> >>  > > to do because you know the timestamp from the original message. So you
> >>  > > know which logs in HDFS to search through to find the data.
> >>  > >
> >>  > > 02.11.2016, 14:12, "[email protected]" <[email protected]>:
> >>  > > > I personally would like to see the following things done before things
> >>  > > > leave BETA:
> >>  > > > (1) Address data integrity concerns (Specifically thinking of METRON-370,
> >>  > > > METRON-517)
> >>  > > > (2) Make cluster tuning easier and more consistent (METRON-485, METRON-470,
> >>  > > > and the "[DISCUSS] moving parsers back to flux" which I can't find a JIRA
> >>  > > > for).
> >>  > > >
> >>  > > > I would also want to see the upgrade path (as opposed to rebuild) be more
> >>  > > > thoroughly and regularly tested once things leave BETA. From my
> >>  > > > perspective I think the project is very close but not yet ready.
> >>  > > >
> >>  > > > Jon
> >>  > > >
> >>  > > > On Wed, Nov 2, 2016 at 4:44 PM Casey Stella <[email protected]> wrote:
> >>  > > >
> >>  > > > Hello Everyone,
> >>  > > >
> >>  > > > Now that the discussion around the next release has started, it has been
> >>  > > > proposed and I think it's a good time to discuss what to name this next
> >>  > > > release. Before, we have adopted the BETA suffix. I think it might be
> >>  > > > time to drop it and call the next release 0.2.2
> >>  > > >
> >>  > > > Thoughts?
> >>  > > >
> >>  > > > Best,
> >>  > > >
> >>  > > > Casey
> >>  > > >
> >>  > > > --
> >>  > > >
> >>  > > > Jon
> >>  > >
> >>  > > -------------------
> >>  > > Thank you,
> >>  > >
> >>  > > James Sirota
> >>  > > PPMC- Apache Metron (Incubating)
> >>  > > jsirota AT apache DOT org
> >>  > >
> >>  > > --
> >>  > >
> >>  > > Jon
> >>  > >
> >>  > --
> >>  >
> >>  > Jon
> >>  >
> > --
> >
> > Jon
>
> -------------------
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>
> --
>
> Jon
>
