[ 
https://issues.apache.org/jira/browse/METRON-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408331#comment-16408331
 ] 

ASF GitHub Bot commented on METRON-1496:
----------------------------------------

Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/969
  
    @kevin91nl Thanks for the submission! I really do appreciate the clarity of 
your code and explanations.
    
    > Performance A ChainParser does not need to interpret a language the way 
Stellar does. We found that interpreting a language was a performance 
bottleneck in the parsers. In fact, we implemented a lightweight template 
engine in one of our links, but that was breaking the throughput rate of the 
parsers. Therefore, we removed this mechanism.
    
    Can you elaborate on the performance bottleneck that you experienced?  
    
    I have tested this myself and found little to no performance difference 
between the execution of something written in raw Java versus Stellar.  I 
compared doing Geo enrichments in the legacy Java adapter versus doing them 
with `GEO_GET` in Stellar.  I also compared doing enrichments with the legacy 
HBase Java adapter versus `ENRICHMENT_GET`.
    
    Maybe the root cause of the performance bottleneck that you experienced was 
something else, a tuning issue or an unexpected problem with your template 
engine?  There also certainly could be a performance bug somewhere in Stellar 
that I am not aware of, but I think it is reasonable to expect similar 
performance between raw Java and Stellar.   If we do have a performance issue 
with Stellar, then I'd like to isolate and address it.


> ChainLink Parser to reuse parser code at parserConfig level
> -----------------------------------------------------------
>
>                 Key: METRON-1496
>                 URL: https://issues.apache.org/jira/browse/METRON-1496
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Bas van de Lustgraaf
>            Priority: Minor
>
> During the development of some custom parsers we wrote a couple of classes / 
> functions to make it possible to reuse code and assemble parser quicker at 
> java coding level.
> We took this idea one step further and created the so called ChainLinkParser.
> This parser gives user without any java knowledge the opportunity to assemble 
> parsers at parser configuration level.
> We would like to discuss the code and see if it can be submitted to the 
> project. We will create a PR during this week to submit the code for review 
> and discussion.
> Below you'll find an example of our Parser configuration for Suricata, which 
> is using our ChainParser. 
>  
> {noformat}
> {
>    "parserClassName":"nl.qsight.chainparser.ChainParser",
>    "sensorTopic":"suricata",
>    "readMetadata":true,
>    "mergeMetadata":true,
>    "numWorkers":3,
>    "numAckers":3,
>    "spoutParallelism":6,
>    "spoutNumTasks":6,
>    "parserParallelism":20,
>    "parserNumTasks":20,
>    "errorWriterParallelism":1,
>    "errorWriterNumTasks":1,
>    "spoutConfig":{
>       "spout.firstPollOffsetStrategy":"LATEST"
>    },
>    "stormConfig":{
>       "topology.max.spout.pending":2000
>    },
>    "parserConfig":{
>       "chain":[
>          "parse_json",
>          "parse_username",
>          "rename_fields",
>          "parse_datetime"
>       ],
>       "parsers":{
>          "parse_json":{
>             "class":"nl.qsight.links.io.JSONDecoderLink"
>          },
>          "parse_username":{
>             "class":"nl.qsight.links.io.RegexLink",
>             "pattern":"(?i)(user|username|log)[=:](\\w+)",
>             "selector":{
>                "username":"2"
>             },
>             "input":"{{payload_printable}}"
>          },
>          "rename_fields":{
>             "class":"nl.qsight.links.fields.RenameLink",
>             "rename":{
>                "proto":"protocol",
>                "dest_ip":"ip_dst_addr",
>                "src_ip":"ip_src_addr",
>                "dest_port":"ip_dst_port",
>                "src_port":"ip_src_port"
>             }
>          },
>          "parse_datetime":{
>             "class":"nl.qsight.links.io.TimestampLink",
>             "patterns":[
>                [
>                   
> "([0-9]{4})-([0-9]+)-([0-9]+)T([0-9]+):([0-9]+):([0-9]+).([0-9]+)([+-]{1}[0-9]{1,2}[:]?[0-9]{2})",
>                   "yyyy MM dd HH mm ss SSSSSS Z",
>                   "newest"
>                ]
>             ],
>             "input":"{{timestamp}}"
>          }
>       }
>    }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to