Currently, we handle those situations with plain regular expressions in
the Java source code. However, it would be nice to have a separate bolt
that deals with them separately. Sure, I can create a Jira issue for that.
The main reason I am asking for this feature is that, without it,
contributing a parser to the community is a little painful for us: we need
to maintain two different versions, one for the community and another for
our internal use case. Clearly, noise is an inevitable part of real-world
use cases.

Cheers,
Ali

On Wed, Apr 26, 2017 at 11:04 PM, Otto Fowler <ottobackwa...@gmail.com>
wrote:

> Hi,
>
> Are you doing all of this cleansing in the parser, or are you using any
> Stellar to do it?
> Can you create a Jira?
>
>
>
> On April 26, 2017 at 08:59:16, Ali Nazemian (alinazem...@gmail.com) wrote:
>
> Hi all,
>
>
> We are facing certain use cases in Metron production that are related to
> noisy streams: for example, a wrong timestamp, a duplicate hostname/IP
> address, etc. To deal with the normalization, we have added an additional
> step to the corresponding parsers to do the data cleaning. Clearly, parsing
> is a standard step that mostly depends on the device generating the data
> and can be reused for the same type of device everywhere, but normalization
> is very production dependent, and there is no point in mixing normalization
> with parsing. It would be nice to have a separate bolt in the parsing
> topologies dedicated to the production-related cleaning process. In that
> case, everybody can easily contribute additional parsers to the Metron
> community without worrying about mixing parsing with the data cleaning
> process.
>
>
> Regards,
>
> Ali
>
>


-- 
A.Nazemian
