Having a Stellar function for normalization is actually very cool.

Casey, how are you going to deal with normalization after parsing if
the noise breaks the parsing itself? For some reason, the incoming data
does not always arrive in the format it is supposed to have.
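
To make the concern concrete, here is a minimal sketch in Python. It is not Metron code: the three-field log format and the names `pre_parse_clean` and `parse` are assumptions made up for illustration. It shows a duplicated hostname shifting the field positions so that a positional parser fails before any post-parse normalization could run, and a cleaning step on the raw line that avoids the failure.

```python
import re

# Hypothetical cleaning rule: collapse a whitespace-separated token that is
# immediately repeated (e.g. "web01 web01") into a single occurrence.
DUPLICATE_TOKEN = re.compile(r"(?<!\S)(\S+)(?:\s+\1)+(?!\S)")


def pre_parse_clean(raw: str) -> str:
    """Remove immediately repeated tokens from the raw line."""
    return DUPLICATE_TOKEN.sub(r"\1", raw)


def parse(line: str) -> dict:
    """Toy positional parser: expects exactly 'timestamp host status'."""
    fields = line.split(" ")
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    timestamp, host, status = fields
    return {"timestamp": int(timestamp), "host": host, "status": int(status)}


noisy = "1493213820 web01 web01 200"

try:
    parse(noisy)  # the extra token shifts the positions, so parsing fails
except ValueError as err:
    print("parse failed:", err)

# Cleaning the raw line first lets the same parser succeed.
print(parse(pre_parse_clean(noisy)))
```

In this sketch the parser itself stays generic, and only the cleaning step would differ between deployments.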

On Wed, Apr 26, 2017 at 11:37 PM, Casey Stella <ceste...@gmail.com> wrote:

> Ok, that's another story.  hmmmm, we don't generally pre-parse because we
> try to not assume any particular format there (i.e. it could be strings,
> could be byte arrays).  Maybe the right answer is to pass the raw,
> non-normalized data (best effort type of thing) through the parser and do
> the normalization post-parse... or is there a problem with that?
>
> On Wed, Apr 26, 2017 at 9:33 AM, Ali Nazemian <alinazem...@gmail.com>
> wrote:
>
> > Hi Casey,
> >
> > It is actually a pre-parse process, not a post-parse one. This type of
> > noise affects the position of an attribute, for example, and gives us a
> > parsing exception. The timestamp example was not a good one because that
> > is actually a post-parse exception.
> >
> > On Wed, Apr 26, 2017 at 11:28 PM, Casey Stella <ceste...@gmail.com>
> > wrote:
> >
> > > So, further transformation post-parse was one of the motivating reasons
> > > for Stellar (to do that transformation post-parse).  Is there a
> > > capability that it's lacking that we can add to fit your use case?
> > >
> > > On Wed, Apr 26, 2017 at 9:24 AM, Ali Nazemian <alinazem...@gmail.com>
> > > wrote:
> > >
> > > > I've created a Jira ticket regarding this feature.
> > > >
> > > > https://issues.apache.org/jira/browse/METRON-893
> > > >
> > > >
> > > > > On Wed, Apr 26, 2017 at 11:11 PM, Ali Nazemian <alinazem...@gmail.com>
> > > > > wrote:
> > > >
> > > > > > Currently, we are using plain regexes in the Java source code to
> > > > > > handle those situations. However, it would be nice to have a separate
> > > > > > bolt and deal with them separately. Yeah, I can create a Jira issue
> > > > > > regarding that. The main reason I am asking for such a feature is
> > > > > > that the lack of it makes creating parsers for the community a little
> > > > > > painful for us. We need to maintain two different versions, one for
> > > > > > the community and another for the internal use case. Clearly, noise
> > > > > > is an inevitable part of real-world use cases.
> > > > >
> > > > > Cheers,
> > > > > Ali
> > > > >
> > > > > > On Wed, Apr 26, 2017 at 11:04 PM, Otto Fowler
> > > > > > <ottobackwa...@gmail.com> wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > > >> Are you doing this cleansing all in the parser or are you using any
> > > > > >> Stellar to do it?
> > > > > >> Can you create a Jira?
> > > > >>
> > > > >>
> > > > >>
> > > > > >> On April 26, 2017 at 08:59:16, Ali Nazemian (alinazem...@gmail.com)
> > > > > >> wrote:
> > > > >>
> > > > >> Hi all,
> > > > >>
> > > > >>
> > > > > >> We are facing certain use cases in Metron production that are
> > > > > >> related to noisy streams. For example, a wrong timestamp, a duplicate
> > > > > >> hostname/IP address, etc. To deal with the normalization we have
> > > > > >> added an additional step to the corresponding parsers to do the data
> > > > > >> cleaning. Clearly, parsing is a standard step which mostly depends on
> > > > > >> the device that is generating the data and can be reused for the same
> > > > > >> type of device everywhere, but normalization is very production
> > > > > >> dependent and there is no point in mixing normalization with parsing.
> > > > > >> It would be nice to have a separate bolt in the parsing topologies
> > > > > >> dedicated to the production-related cleaning process. In that case,
> > > > > >> everybody can easily contribute additional parsers to the Metron
> > > > > >> community without worrying about mixing parsers and the data cleaning
> > > > > >> process.
> > > > >>
> > > > >>
> > > > >> Regards,
> > > > >>
> > > > >> Ali
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > A.Nazemian
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > A.Nazemian
> > > >
> > >
> >
> >
> >
> > --
> > A.Nazemian
> >
>



-- 
A.Nazemian
