[ https://issues.apache.org/jira/browse/METRON-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386972#comment-16386972 ]
ASF GitHub Bot commented on METRON-1460:
----------------------------------------

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/940#discussion_r172359339

    --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/Strategy.java ---
    @@ -0,0 +1,47 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.metron.enrichment.parallel;
    +
    +import org.apache.metron.common.Constants;
    +import org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
    +import org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
    +import org.json.simple.JSONObject;
    +import org.slf4j.Logger;
    +
    +import java.util.Map;
    +
    +/**
    + * Enrichment strategy. This interface provides a mechanism to interface with the enrichment config and any
    + * post processing steps that are needed to be done after-the-fact.
    + *
    + * The reasoning behind this is that the key difference between enrichments and threat intel is that they pull
    + * their configurations from different parts of the SensorEnrichmentConfig object and as a post-join step, they differ
    + * slightly.
    + *
    + */
    +public interface Strategy {
    +  Constants.ErrorType getErrorType();
    --- End diff --

    Can we javadoc each method? This seems like an important interface.


> Create a complementary non-split-join enrichment topology
> ---------------------------------------------------------
>
>                 Key: METRON-1460
>                 URL: https://issues.apache.org/jira/browse/METRON-1460
>             Project: Metron
>          Issue Type: New Feature
>            Reporter: Casey Stella
>            Priority: Major
>
> There are some deficiencies to the split/join topology.
> * It's hard to reason about
> * Understanding the latency of enriching a message requires looking at
> multiple bolts that each give summary statistics
> * The join bolt's cache is really hard to reason about when performance
> tuning
> * During spikes in traffic, you can overload the join bolt's cache and drop
> messages if you aren't careful
> * In general, it's hard to associate a cache size and a duration kept in
> cache with throughput and latency
> * There are a lot of network hops per message
> * Right now we are stuck at 2 stages of transformations being done
> (enrichment and threat intel). It's very possible that you might want
> stellar enrichments to depend on the output of other stellar enrichments. In
> order to implement this in split/join you'd have to create a cycle in the
> storm topology
>
> I propose that we move to a model where we do enrichments in a single bolt in
> parallel using a static threadpool (e.g. multiple workers in the same process
> would share the threadpool). In all other ways, this would be backwards
> compatible. A transparent drop-in for the existing enrichment topology.
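
To make the "single bolt, static threadpool" idea concrete, here is a minimal sketch of what such a unit of work might look like. It is not the code from the pull request: the class name `ParallelEnrichmentSketch`, the `enrich` method, the pool size of 4, and the modelling of an enrichment as a plain `Function` over the message map are all assumptions made for illustration, and nothing here touches Storm or the Metron APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical illustration only; names and sizes are assumptions, not Metron code. */
class ParallelEnrichmentSketch {

  // Static pool: every caller in the same JVM shares these threads.
  // The size (4) is arbitrary; daemon threads let the JVM exit cleanly.
  private static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
    Thread t = new Thread(r, "enrichment-sketch");
    t.setDaemon(true);
    return t;
  });

  /**
   * Runs every enrichment against the same message in parallel and merges
   * the returned fields into a copy of the original message.
   */
  static Map<String, Object> enrich(
      Map<String, Object> message,
      List<Function<Map<String, Object>, Map<String, Object>>> enrichments) {

    // Enrichments only get a read-only view, so they cannot race on the input.
    Map<String, Object> readOnly = Collections.unmodifiableMap(message);

    // Submit all enrichments to the shared pool before waiting on any of them.
    List<CompletableFuture<Map<String, Object>>> futures = new ArrayList<>();
    for (Function<Map<String, Object>, Map<String, Object>> e : enrichments) {
      futures.add(CompletableFuture.supplyAsync(() -> e.apply(readOnly), POOL));
    }

    // The "join" happens in-process: wait for each result and merge it.
    Map<String, Object> result = new HashMap<>(message);
    for (CompletableFuture<Map<String, Object>> f : futures) {
      result.putAll(f.join());
    }
    return result;
  }
}
```

In this shape the only tuning knob is the pool size, which is the trade the proposal describes: there is no join cache to size and no eviction timeout, because the merge happens in the same call that fanned the work out.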
> There are some pros/cons about this too:
> * Pro
>   * Easier to reason about from an individual message perspective
>   * Architecturally decoupled from Storm
>     * This sets us up if we want to consider other streaming technologies
>   * Fewer bolts
>     * spout -> enrichment bolt -> threatintel bolt -> output bolt
>   * Way fewer network hops per message
>     * currently 2n+1 where n is the number of enrichments used (if using stellar
>       subgroups, each subgroup is a hop)
>   * Easier to reason about from a performance perspective
>     * We trade cache size and eviction timeout for threadpool size
>   * We set ourselves up to have stellar subgroups with dependencies
>     * i.e. stellar subgroups that depend on the output of other subgroups
>     * If we do this, we can shrink the topology to just spout ->
>       enrichment/threat intel -> output
> * Con
>   * We can no longer tune stellar enrichments independent from HBase
>     enrichments
>     * To be fair, with enrichments moving to stellar, this is the case in the
>       split/join approach too
>   * No idea about performance
>
> What I propose is to submit a PR that will deliver an alternative, completely
> backwards compatible topology for enrichment that you can use by adjusting
> the start_enrichment_topology.sh script to use remote-unified.yaml instead of
> remote.yaml. If we live with it for a while and have some good experiences
> with it, maybe we can consider retiring the old enrichment topology.
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)