[jira] [Commented] (METRON-1657) Parser aggregation in storm

ASF GitHub Bot (JIRA) Mon, 16 Jul 2018 12:56:07 -0700


    [ 
https://issues.apache.org/jira/browse/METRON-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545679#comment-16545679
 ]


ASF GitHub Bot commented on METRON-1657:
----------------------------------------

Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1099#discussion_r202803869
  
    --- Diff: 
metron-platform/metron-parsers/src/main/java/org/apache/metron/parsers/bolt/ParserBolt.java
 ---
    @@ -182,40 +185,61 @@ public void prepare(Map stormConf, TopologyContext 
context, OutputCollector coll
         super.prepare(stormConf, context, collector);
         messageGetStrategy = MessageGetters.DEFAULT_BYTES_FROM_POSITION.get();
         this.collector = collector;
    -    if(getSensorParserConfig() != null) {
    -      cache = 
CachingStellarProcessor.createCache(getSensorParserConfig().getCacheConfig());
    -    }
    -    initializeStellar();
    -    if(getSensorParserConfig() != null && filter == null) {
    -      
getSensorParserConfig().getParserConfig().putIfAbsent("stellarContext", 
stellarContext);
    -      if 
(!StringUtils.isEmpty(getSensorParserConfig().getFilterClassName())) {
    -        filter = Filters.get(getSensorParserConfig().getFilterClassName()
    -                , getSensorParserConfig().getParserConfig()
    -        );
    +
    +    // Build the Stellar cache
    +    Map<String, Object> cacheConfig = new HashMap<>();
    +    for (Map.Entry<String, ParserComponents> entry: 
sensorToComponentMap.entrySet()) {
    +      String sensor = entry.getKey();
    +      SensorParserConfig config = getSensorParserConfig(sensor);
    +
    +      if (config != null) {
    +        cacheConfig.putAll(config.getCacheConfig());
           }
         }
    +    cache = CachingStellarProcessor.createCache(cacheConfig);
     
    -    parser.init();
    +    // Need to prep all sensors
    +    for (Map.Entry<String, ParserComponents> entry: 
sensorToComponentMap.entrySet()) {
    +      String sensor = entry.getKey();
    +      MessageParser<JSONObject> parser = 
entry.getValue().getMessageParser();
     
    --- End diff --
    
    So, the consequences of this decision are as follows:
    * You share an expression cache (i.e. the statement -> abstract syntax tree 
cache; distinct from the expression -> evaluated return cache)
    * You share an stellar value cache (expression -> evaluated return)
    * You share the state in the Context (e.g. hbase connections, zookeeper 
connections).
    
    On the whole, anything shared in the context is intended to be shared 
across users and sensors by virtue of Stellar being used in the enrichment 
topology (where it's not sensor-by-sensor), so we shoudl be ok there.  The real 
question is whether users would prefer to have one knob per topology for 
stellar cache sizing or whether they would prefer to have one knob per sensor.  
I'd say that I'm ok with how this PR is doing it, because it's easier to reason 
about resources, IMO, on a per-topology perspective.


> Parser aggregation in storm
> ---------------------------
>
>                 Key: METRON-1657
>                 URL: https://issues.apache.org/jira/browse/METRON-1657
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Justin Leet
>            Assignee: Justin Leet
>            Priority: Major
>
> Currently our parsing solution requires one storm topology per sensor. It has 
> been complained that this may be wasteful of resources and that, rather than 
> one storm topology per sensor, it would be advantageous to have multiple 
> sensors in the same topology. The benefit to this is that it would require 
> fewer storm slots.
> The issue with this is that whenever we've aggregated functionality like this 
> before, we've run into issues appropriately being able to scale storm (e.g. 
> batch vs random access indexing in the same topology).  The main point in 
> addressing this is to recommend that parsers with similar velocities and 
> complexity are grouped together.
> Particularly for a first cut, leave the configuration mostly as-is, while 
> allowing for comma separated lists of sensors in start_parser_topology.sh 
> (e.g. bro,yaf creates a aggregated parser consisting of those two). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (METRON-1657) Parser aggregation in storm

Reply via email to