[ https://issues.apache.org/jira/browse/METRON-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386880#comment-16386880 ]

ASF GitHub Bot commented on METRON-1460:
----------------------------------------

Github user ben-manes commented on the issue:

    https://github.com/apache/metron/pull/940
  
    Internally, Guava uses a `ConcurrentLinkedQueue` and an `AtomicInteger` to
record its size, per segment. When a read occurs, the cache records the event
in the queue and then drains the queue under the segment's lock (via tryLock)
to replay the events. This is similar to Caffeine, which uses optimized
structures instead. I intended the CLQ & counter as baseline scaffolding to be
replaced, as it is an obvious bottleneck, but I could never get it replaced
despite advocating for it. The penalty of draining the buffers is amortized,
but unfortunately this buffer isn't capped.
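
    A rough sketch of that per-segment read recording (simplified for
illustration; this is not Guava's actual code, and the drain threshold is an
assumed value):

    ```java
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.locks.ReentrantLock;

    final class Segment<K> {
      // Unbounded queue of read events; this is the uncapped buffer noted above.
      private final ConcurrentLinkedQueue<K> recencyQueue = new ConcurrentLinkedQueue<>();
      private final AtomicInteger readCount = new AtomicInteger();
      private final ReentrantLock lock = new ReentrantLock();
      private static final int DRAIN_THRESHOLD = 64; // assumed value, for illustration

      void recordRead(K key) {
        recencyQueue.add(key);
        if (readCount.incrementAndGet() > DRAIN_THRESHOLD) {
          tryDrain();
        }
      }

      private void tryDrain() {
        if (lock.tryLock()) { // skip the replay if another thread holds the lock
          try {
            K key;
            while ((key = recencyQueue.poll()) != null) {
              readCount.decrementAndGet();
              // replay: move the entry toward the MRU end of the access-order list
            }
          } finally {
            lock.unlock();
          }
        }
      }
    }
    ```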
    
    Since there would be a higher hit rate with a larger cache, reads would be
recorded more often. Perhaps the contention there and the penalty of draining
the queue are more observable than a cache miss. That's still surprising,
since a cache miss usually means more expensive I/O. Is the loader doing
expensive work in your case?
    
    Caffeine gets around this problem by using better-optimized buffers and
being lossy (on reads only) if it can't keep up. By default it delegates the
amortized maintenance work to a ForkJoinPool to avoid user-facing latencies,
since you'll want those variances to be tight. Much of that could be
back-ported onto Guava for a nice boost.
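
    For comparison, a minimal sketch of wiring up Caffeine with its default
asynchronous maintenance (the loader `lookupEnrichment` and the size bound are
hypothetical, for illustration only):

    ```java
    import com.github.benmanes.caffeine.cache.Caffeine;
    import com.github.benmanes.caffeine.cache.LoadingCache;

    public class CaffeineExample {
      // Hypothetical loader standing in for the real (possibly expensive) work.
      static String lookupEnrichment(String key) {
        return "enriched:" + key;
      }

      public static void main(String[] args) {
        LoadingCache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(10_000) // illustrative bound
            // Maintenance is delegated to ForkJoinPool.commonPool() by default;
            // .executor(...) substitutes a different executor if desired.
            .build(CaffeineExample::lookupEnrichment);

        System.out.println(cache.get("10.0.0.1")); // loads on miss, then caches
      }
    }
    ```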


> Create a complementary non-split-join enrichment topology
> ---------------------------------------------------------
>
>                 Key: METRON-1460
>                 URL: https://issues.apache.org/jira/browse/METRON-1460
>             Project: Metron
>          Issue Type: New Feature
>            Reporter: Casey Stella
>            Priority: Major
>
> There are some deficiencies to the split/join topology.
>  * It's hard to reason about
>  * Understanding the latency of enriching a message requires looking at 
> multiple bolts that each give summary statistics
>  * The join bolt's cache is really hard to reason about when performance 
> tuning
>  * During spikes in traffic, you can overload the join bolt's cache and drop 
> messages if you aren't careful
>  * In general, it's hard to associate a cache size and a duration kept in 
> cache with throughput and latency
>  * There are a lot of network hops per message
>  * Right now we are stuck at 2 stages of transformation (enrichment and 
> threat intel).  It's very possible that you might want stellar enrichments 
> to depend on the output of other stellar enrichments.  In order to implement 
> this in split/join you'd have to create a cycle in the Storm topology
>  
> I propose that we move to a model where we do enrichments in a single bolt 
> in parallel using a static threadpool (e.g. multiple workers in the same 
> process would share the threadpool).  In all other ways, this would be 
> backwards compatible: a transparent drop-in for the existing enrichment 
> topology.
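> A minimal sketch of that shared static threadpool pattern (illustrative 
> only; the class name and pool size are hypothetical, not from the PR):
> {code:java}
> import java.util.List;
> import java.util.concurrent.CompletableFuture;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
>
> public class EnrichmentPool {
>   // static: one pool shared by every bolt instance in the same JVM process
>   private static final ExecutorService POOL = Executors.newFixedThreadPool(16);
>
>   // Run all enrichments for a message in parallel and wait for completion.
>   public static void enrichAll(List<Runnable> enrichments) {
>     CompletableFuture<?>[] futures = enrichments.stream()
>         .map(task -> CompletableFuture.runAsync(task, POOL))
>         .toArray(CompletableFuture[]::new);
>     CompletableFuture.allOf(futures).join();
>   }
> }
> {code}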
> There are some pros/cons about this too:
>  * Pro
>    * Easier to reason about from an individual message perspective
>    * Architecturally decoupled from Storm
>      * This sets us up if we want to consider other streaming technologies
>    * Fewer bolts
>      * spout -> enrichment bolt -> threatintel bolt -> output bolt
>    * Way fewer network hops per message
>      * currently 2n+1 where n is the number of enrichments used (if using 
>        stellar subgroups, each subgroup is a hop)
>    * Easier to reason about from a performance perspective
>      * We trade cache size and eviction timeout for threadpool size
>    * We set ourselves up to have stellar subgroups with dependencies
>      * i.e. stellar subgroups that depend on the output of other subgroups
>      * If we do this, we can shrink the topology to just spout -> 
>        enrichment/threat intel -> output
>  * Con
>    * We can no longer tune stellar enrichments independently of HBase 
>      enrichments
>      * To be fair, with enrichments moving to stellar, this is the case in 
>        the split/join approach too
>    * No idea about performance
> What I propose is to submit a PR that will deliver an alternative, completely 
> backwards-compatible topology for enrichment that you can use by adjusting 
> the start_enrichment_topology.sh script to use remote-unified.yaml instead of 
> remote.yaml.  If we live with it for a while and have some good experiences 
> with it, maybe we can consider retiring the old enrichment topology.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
