I've created a Jira ticket for this feature:
https://issues.apache.org/jira/browse/METRON-893

On Wed, Apr 26, 2017 at 11:11 PM, Ali Nazemian <alinazem...@gmail.com> wrote:

> Currently, we are using plain regexes in the Java source code to handle
> those situations. However, it would be nice to have a separate bolt and
> deal with them separately. Yeah, I can create a Jira issue regarding that.
> The main reason I am asking for such a feature is that the lack of it
> makes creating parsers for the community a little painful for us. We
> need to maintain two different versions, one for the community and
> another for the internal use case. Clearly, noise is an inevitable part
> of real-world use cases.
>
> Cheers,
> Ali
>
> On Wed, Apr 26, 2017 at 11:04 PM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Are you doing this cleansing all in the parser or are you using any
>> Stellar to do it?
>> Can you create a jira?
>>
>> On April 26, 2017 at 08:59:16, Ali Nazemian (alinazem...@gmail.com)
>> wrote:
>>
>> Hi all,
>>
>> We are facing certain use cases in Metron production that are related
>> to noisy streams. For example, a wrong timestamp, a duplicate
>> hostname/IP address, etc. To deal with the normalization we have added
>> an additional step to the corresponding parsers to do the data cleaning.
>> Clearly, parsing is a standard factor which is mostly related to the
>> device that is generating the data and can be used for the same type of
>> device everywhere, but normalization is very production dependent and
>> there is no point in mixing normalization with parsing. It would be nice
>> to have a separate bolt in the parsing topologies dedicated to the
>> production-related cleaning process. In that case, everybody can easily
>> contribute additional parsers to the Metron community without worrying
>> about mixing parsers and the data cleaning process.
>>
>> Regards,
>>
>> Ali
>
> --
> A.Nazemian

--
A.Nazemian