Hi Casey,

It is actually a pre-parse process, not a post-parse one. This type of
noise affects, for example, the position of an attribute and gives us a
parsing exception. The timestamp example was not a good one because that is
actually a post-parse exception.

On Wed, Apr 26, 2017 at 11:28 PM, Casey Stella <ceste...@gmail.com> wrote:

> So, further transformation post-parse was one of the motivating reasons for
> Stellar (to do that transformation post-parse).  Is there a capability that
> it's lacking that we can add to fit your use case?
>
> On Wed, Apr 26, 2017 at 9:24 AM, Ali Nazemian <alinazem...@gmail.com>
> wrote:
>
> > I've created a Jira ticket regarding this feature.
> >
> > https://issues.apache.org/jira/browse/METRON-893
> >
> >
> > On Wed, Apr 26, 2017 at 11:11 PM, Ali Nazemian <alinazem...@gmail.com>
> > wrote:
> >
> > > > Currently, we are using plain regexes in the Java source code to
> > > > handle those situations. However, it would be nice to have a separate
> > > > bolt and deal with them separately. Yeah, I can create a Jira issue for
> > > > that. The main reason I am asking for such a feature is that the lack of
> > > > it makes creating parsers for the community a little painful for us: we
> > > > need to maintain two different versions, one for the community and
> > > > another for our internal use case. Clearly, noise is an inevitable part
> > > > of real-world use cases.
> > >
> > > Cheers,
> > > Ali
> > >
> > > > On Wed, Apr 26, 2017 at 11:04 PM, Otto Fowler <ottobackwa...@gmail.com>
> > > > wrote:
> > >
> > >> Hi,
> > >>
> > >> Are you doing this cleansing all in the parser or are you using any
> > >> Stellar to do it?
> > >> Can you create a jira?
> > >>
> > >>
> > >>
> > >> On April 26, 2017 at 08:59:16, Ali Nazemian (alinazem...@gmail.com)
> > >> wrote:
> > >>
> > >> Hi all,
> > >>
> > >>
> > > >> We are facing certain use cases in Metron production that involve
> > > >> noisy streams: for example, a wrong timestamp, a duplicate
> > > >> hostname/IP address, etc. To deal with the normalization, we have
> > > >> added an additional step to the corresponding parsers to do the data
> > > >> cleaning. Clearly, parsing is a standard step that mostly depends on
> > > >> the device generating the data and can be reused for the same type of
> > > >> device everywhere, but normalization is very production dependent, and
> > > >> there is no point in mixing normalization with parsing. It would be
> > > >> nice to have a separate bolt in the parsing topologies dedicated to
> > > >> production-related cleaning. In that case, everybody can easily
> > > >> contribute additional parsers to the Metron community without worrying
> > > >> about mixing parsers and data cleaning.
> > >>
> > >>
> > >> Regards,
> > >>
> > >> Ali
> > >>
> > >>
> > >
> > >
> > > --
> > > A.Nazemian
> > >
> >
> >
> >
> > --
> > A.Nazemian
> >
>



-- 
A.Nazemian
