Hi Simon,

I think it is a hard trade-off. Even now, without any ability to customise
the separator or Metron's internal field names, Metron users need to put a
mapping in place at the integration layer (at least, that is what we are
doing :) ). Every organisation or user may need to follow different policies
for different reasons, not to mention specific technology limitations
(e.g. Hive). The question is whether we consider Elasticsearch/Solr and HDFS
(as data storage) to be coupled to Metron or not. Metron components can
freely use a Metron-specific data model, but the data model at rest would be
better decoupled from it, to make integration with other tools more
flexible. That means whenever the data model is at rest, a mapping layer
would be required.

This doesn't mean every Metron user must provide a mapping: they can, but
they don't have to. It simply makes integration more flexible by allowing a
consistent data model across integration endpoints (Elasticsearch/Solr and
HDFS). The problem we are facing is that, in addition to a separate mapping
for Elasticsearch, we have to maintain a different mapping for ORC as well.
If the names were at least consistent across Elasticsearch and HDFS, we
would need only a single mapping for an application that consumes from both.

Therefore, if we leave the data model in transit unchanged, a mapping at
Metron-rest (to serve the Alert UI) and a mapping at Metron-indexing
(ES/Solr and HDFS) would be sufficient. Even now, by changing the separator
at index time, we are doing the same thing: we are not changing the data
model in transit.

Cheers,
Ali



On Tue, Aug 14, 2018 at 9:11 PM Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> The challenge with making it configurable is that every query, every
> profile, every analytic, template, pre-installed dashboard and use case
> built by any third party who wanted to extend Metron would have to honour
> the configuration and parameterize every query they run. My worry is that
> this would render some engines totally incompatible with many installs (as
> opposed to just needing an escape character, as you would with Hive now) and
> would prevent a lot of tools from participating in the Metron ecosystem.
>
> I think this is something where we need to make a good decision and stick
> to it to allow the ecosystem to build on a known foundation.
>
> Dots are not great because Hive uses them as a separator, underscore
> collides with our existing convention, and hyphen collides with a number of
> other common log formats, so it’s not an easy one to have an opinion on, but
> I do think we should have an opinion rather than forcing every user to make
> the hard choice to exclude others from sharing.
>
> Perhaps the flat key-value structure is the real question here; given
> progress in the underlying index engines, it may not be the panacea it once
> was.
>
> Simon
>
> > On 14 Aug 2018, at 11:42, deepak kumar <kdq...@gmail.com> wrote:
> >
> > I agree, Ali.
> > Maybe it could be a configuration parameter.
> >
> >> On Tue, Aug 14, 2018 at 3:24 PM Ali Nazemian <alinazem...@gmail.com> wrote:
> >>
> >> Hi Simon,
> >>
> >> We have temporarily decided to just replace it with "_" for HDFS, to avoid
> >> the bugs and issues that unsupported separators can cause for ORC/Hive
> >> and Spark. However, I am not confident that "_" is a good option for the
> >> community, as it is similar to the normal Metron separator. It would be
> >> nice to have the ability to change the separator to any other character
> >> and let users decide what they want to use.
> >>
> >> Cheers,
> >> Ali
> >>
> >> On Tue, Aug 14, 2018 at 12:14 AM Simon Elliston Ball <
> >> si...@simonellistonball.com> wrote:
> >>
> >>> Do you have any suggestions for what would make sense as a delimiter?
> >>>
> >>>> On 9 August 2018 at 05:57, Ali Nazemian <alinazem...@gmail.com> wrote:
> >>>>
> >>>> Hi All,
> >>>>
> >>>> I was wondering if we can change the field separators in Metron to make
> >>>> them Hive/ORC friendly. I found the following PR, but neither dot nor
> >>>> colon is very Hive/ORC friendly, and both will cause some issues. Hence,
> >>>> I wanted to see if it is possible to change the field separator to
> >>>> something else, or even give users the ability to define which separator
> >>>> to use, to make the data model consistent across Elasticsearch and HDFS.
> >>>>
> >>>> https://github.com/apache/metron/pull/1022
> >>>>
> >>>> Cheers,
> >>>> Ali
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> simon elliston ball
> >>> @sireb
> >>>
> >>
> >>
> >> --
> >> A.Nazemian
> >>
>


-- 
A.Nazemian
