That would break searching on the URI entirely unless whoever is querying
knew to truncate the search term at 32766 as well, because the field is not
analyzed.  I don't like pushing that complication to the end user.

I would suggest truncation in the indexingBolt (not using Stellar, because
you'd want this across the board) for all fields > 32766 bytes (how do we
make sure this gets updated if the limitation changes in Lucene?) and adding
metadata key-value pairs (pre-truncation length, hash, truncated boolean,
etc.).  In the URI scenario I would also suggest doing a multi-field mapping
by default because of the way that data is useful (not sure which analyzer
to use though; maybe write or find a good URI analyzer?).  Since timestamp
is a required field for all messages (I'm pretty sure?) I'm OK with
timestamp and field value being used as the UID, but would prefer something
better.
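
A rough sketch of the truncation step described above, written in Python
for brevity rather than in the Java the indexing bolt would actually use.
The function name and the "_truncated", "_original_length" and "_sha256"
key suffixes are made up for illustration, and the limit is assumed to be
Lucene's maximum term size counted in UTF-8 bytes:

```python
import hashlib

# Assumption: Lucene's maximum term size (IndexWriter.MAX_TERM_LENGTH),
# counted in UTF-8 bytes. Kept in one place so it is easy to update.
MAX_TERM_BYTES = 32766


def truncate_message(message, limit=MAX_TERM_BYTES):
    """Return a copy of message with oversized string fields truncated.

    For every truncated field, hypothetical companion keys record that the
    truncation happened, the original byte length, and a SHA-256 hash of
    the original value.
    """
    out = dict(message)
    for field, value in message.items():
        if not isinstance(value, str):
            continue
        raw = value.encode("utf-8")
        if len(raw) <= limit:
            continue
        # Cut on a byte boundary; errors="ignore" drops a multi-byte
        # character that the cut split in half.
        out[field] = raw[:limit].decode("utf-8", errors="ignore")
        out[field + "_truncated"] = True
        out[field + "_original_length"] = len(raw)
        out[field + "_sha256"] = hashlib.sha256(raw).hexdigest()
    return out
```

Calling truncate_message on a message whose "uri" value is 40000 bytes long
would leave "timestamp" untouched, cut "uri" down to at most 32766 bytes,
and add the three metadata keys next to it.  The hash gives a way to match
the indexed record to the full value stored in HDFS.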

Jon

On Wed, Nov 2, 2016, 20:33 James Sirota <[email protected]> wrote:

> Jon,
>
> For METRON-517 would it suffice to have a stellar statement to take a URI
> string and truncate it to length of 32766 in the ES writer?  But still
> write the actual string to HDFS? You can then search against ES on the
> truncated portion, but retrieve the actual string from HDFS.  It's easy
> to do because you know the timestamp from the original message.  So you
> know which logs in HDFS to search through to find the data.
>
> 02.11.2016, 14:12, "[email protected]" <[email protected]>:
> > I personally would like to see the following things done before things
> > leave BETA:
> > (1) Address data integrity concerns (Specifically thinking of METRON-370,
> > METRON-517)
> > (2) Make cluster tuning easier and more consistent (METRON-485,
> > METRON-470, and the "[DISCUSS] moving parsers back to flux" which I
> > can't find a JIRA for).
> >
> > I would also want to see the upgrade path (as opposed to rebuild) be more
> > thoroughly and regularly tested once things leave BETA. From my
> > perspective I think the project is very close but not yet ready.
> >
> > Jon
> >
> > On Wed, Nov 2, 2016 at 4:44 PM Casey Stella <[email protected]> wrote:
> >
> > Hello Everyone,
> >
> > Now that the discussion around the next release has started, it has been
> > proposed and I think it's a good time to discuss what to name this next
> > release. Before, we have adopted the BETA suffix. I think it might be
> > time to drop it and call the next release 0.2.2
> >
> > Thoughts?
> >
> > Best,
> >
> > Casey
> >
> > --
> >
> > Jon
>
> -------------------
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>
-- 

Jon
