Hi Casey,

It is actually a pre-parse process, not a post-parse one. This type of noise can, for example, shift the position of an attribute and cause a parsing exception. The timestamp example was not a good one, because that is actually a post-parse exception.
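
To illustrate the kind of pre-parse cleaning meant here, a minimal sketch follows. The class name PreParseCleaner, both regex rules, and the sample log line are assumptions made for illustration; this is not Metron code and not the rules used in production. It only shows how removing noise from the raw message keeps attribute positions stable before a positional parser sees it.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-parse cleaner; an illustration only, not part of Metron. */
public class PreParseCleaner {
    // Assumed rule 1: control characters in the raw message are noise.
    private static final Pattern CONTROL_CHARS = Pattern.compile("\\p{Cntrl}");

    // Assumed rule 2: a whitespace-delimited token emitted twice in a row
    // (for example a duplicated hostname) is collapsed to a single copy.
    private static final Pattern REPEATED_TOKEN =
            Pattern.compile("(^|\\s)(\\S+)\\s+\\2(?=\\s|$)");

    /** Cleans the raw message before it is handed to the real parser. */
    public static String clean(String raw) {
        String noControl = CONTROL_CHARS.matcher(raw).replaceAll("");
        // Keep the leading separator ($1) and one copy of the token ($2).
        return REPEATED_TOKEN.matcher(noControl).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        // The duplicated hostname would otherwise shift every later field by one.
        System.out.println(clean("Apr 26 08:59:16 fw01 fw01 action=deny"));
    }
}
```

Keeping rules like these outside the parser is the point of the request: the parser stays device-specific and reusable, while the cleaning rules stay site-specific.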

On Wed, Apr 26, 2017 at 11:28 PM, Casey Stella <ceste...@gmail.com> wrote:

> So, further transformation post-parse was one of the motivating reasons for
> Stellar (to do that transformation post-parse). Is there a capability that
> it's lacking that we can add to fit your use case?
>
> On Wed, Apr 26, 2017 at 9:24 AM, Ali Nazemian <alinazem...@gmail.com>
> wrote:
>
> > I've created a Jira ticket regarding this feature.
> >
> > https://issues.apache.org/jira/browse/METRON-893
> >
> > On Wed, Apr 26, 2017 at 11:11 PM, Ali Nazemian <alinazem...@gmail.com>
> > wrote:
> >
> > > Currently, we are using normal regex in the Java source code to handle
> > > those situations. However, it would be nice to have a separate bolt and
> > > deal with them separately. Yeah, I can create a Jira issue regarding
> > > that. The main reason I am asking for such a feature is the fact that
> > > the lack of such a feature makes the process of creating some parsers
> > > for the community a little painful for us. We need to maintain two
> > > different versions, one for the community and another for the internal
> > > use case. Clearly, noise is an inevitable part of real-world use cases.
> > >
> > > Cheers,
> > > Ali
> > >
> > > On Wed, Apr 26, 2017 at 11:04 PM, Otto Fowler <ottobackwa...@gmail.com>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> Are you doing this cleansing all in the parser or are you using any
> > >> Stellar to do it?
> > >> Can you create a jira?
> > >>
> > >> On April 26, 2017 at 08:59:16, Ali Nazemian (alinazem...@gmail.com)
> > >> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> We are facing certain use cases in Metron production that happen to be
> > >> related to noisy streams. For example, a wrong timestamp, duplicate
> > >> hostname/IP address, etc. To deal with the normalization we have added
> > >> an additional step for the corresponding parsers to do the data
> > >> cleaning. Clearly, parsing is a standard factor which is mostly related
> > >> to the device that is generating the data and can be used for the same
> > >> type of device everywhere, but normalization is very production
> > >> dependent and there is no point in mixing normalization with parsing.
> > >> It would be nice to have a separate bolt in a parsing topology
> > >> dedicated to the production-related cleaning process. In that case,
> > >> everybody can easily contribute to the Metron community with additional
> > >> parsers without being worried about mixing parsers and the data
> > >> cleaning process.
> > >>
> > >> Regards,
> > >>
> > >> Ali
> > >
> > > --
> > > A.Nazemian
> >
> > --
> > A.Nazemian

--
A.Nazemian