I do want to say here that I don't mean to sound the alarm and claim that everything is broken. I would not characterize the topology as architecturally "broken"; rather, the lack of reporting when things go pear-shaped is an implementation bug. With logging and documentation of the knobs to tune, I believe this architecture works.
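To make the logging half of that concrete: below is a minimal sketch of what surfacing join-cache evictions could look like, assuming the join bolt keeps its in-flight, partially joined messages in a Guava cache. The class name, the PartialJoin placeholder, and the parameter names are hypothetical, not Metron's actual code:

    import java.util.concurrent.TimeUnit;

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.RemovalCause;
    import com.google.common.cache.RemovalListener;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class JoinCacheSketch {
      private static final Logger LOG = LoggerFactory.getLogger(JoinCacheSketch.class);

      // Hypothetical stand-in for a partially joined message.
      static class PartialJoin { }

      public static Cache<String, PartialJoin> buildJoinCache(long maxSize, long expiryMs) {
        RemovalListener<String, PartialJoin> listener = n -> {
          // A SIZE eviction means we dropped a message before all of its
          // enrichments arrived; say so loudly instead of failing silently.
          if (n.getCause() == RemovalCause.SIZE) {
            LOG.warn("Join cache evicted partially joined message {}; "
                + "consider a larger cache or more join bolt parallelism.", n.getKey());
          }
        };
        return CacheBuilder.newBuilder()
            .maximumSize(maxSize)                               // knob: tuples in flight at once
            .expireAfterWrite(expiryMs, TimeUnit.MILLISECONDS)  // knob: slowest enrichment latency
            .removalListener(listener)
            .build();
      }
    }

Passing maxSize and expiryMs in from the properties file rather than hard-coding them in flux would cover the "promote this to the properties file" item further down the thread.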
On Tue, May 16, 2017 at 12:09 PM, Casey Stella <ceste...@gmail.com> wrote:

> We could definitely parallelize within the bolt, but you're right, it does
> break the storm model. I also like making things other people's problems
> (it's called working "smart" not "hard", right? not laziness, surely. ;),
> but yeah, using windowing for this seems like it might introduce some
> artificial latency. It's also not going to eliminate the problem, but
> rather just make the knob to tweak things have a different characteristic.
> Whereas before we had knobs around how many messages, now it's a knob
> around how long an enrichment is going to take maximally (which, I think,
> is more natural, honestly).
>
> On Tue, May 16, 2017 at 12:05 PM, Simon Elliston Ball <si...@simonellistonball.com> wrote:
>
>> Would you then parallelise within Stellar to handle things like multiple
>> lookups? This feels like it would be breaking the storm model somewhat,
>> and could lead to bad things with threads, for example. Or would you
>> think of doing something like the grouping Stellar uses today to
>> parallelise across something like a pool of Stellar bolts and join?
>>
>> I like the idea of Otto's solution (making it someone else's problem,
>> storm's specifically :) ) but that also assumes we insert the artificial
>> latency of a time-windowed join. If we're going down that route, we might
>> as well just use spark and run everything on yarn. At that point, though,
>> we lose a lot of the benefits of low latency for time to detection, and
>> real-time enrichment in things like the streaming enrichment writer.
>>
>> Simon
>>
>> On 16 May 2017, at 16:59, Nick Allen <n...@nickallen.org> wrote:
>>
>>> I would like to see us just migrate wholly to Stellar enrichments and
>>> remove the separate HBase and Geo enrichment bolts from the Enrichment
>>> topology. Stellar provides a user with much greater flexibility than
>>> the existing HBase and Geo enrichment bolts.
>>>
>>> A side effect of this would be to greatly simplify the Enrichment
>>> topology. I don't think we would need the split/join pattern if we did
>>> this. No?
>>>
>>> On Tue, May 16, 2017 at 11:54 AM, Casey Stella <ceste...@gmail.com> wrote:
>>>
>>>> The problem is that an enrichment type won't necessarily have a fixed
>>>> performance characteristic. Take Stellar enrichments, for instance.
>>>> Doing an HBase call for one sensor vs. doing simple string munging
>>>> will have vastly differing performance. Both of them are functioning
>>>> within the Stellar enrichment bolt. Also, some enrichments may call
>>>> for multiple calls to HBase. Parallelizing those would make some
>>>> sense, I think.
>>>>
>>>> I do take your point, though, that it's not as though it's strictly
>>>> serial; it's just that the unit of parallelism is the message, rather
>>>> than the enrichment per message.
>>>>
>>>> On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz <tramn...@trasec.de> wrote:
>>>>
>>>>> I'm glad you bring this up. This is a huge architectural difference
>>>>> from the original OpenSOC topology, and one that we were warned about
>>>>> back then.
>>>>> To be perfectly honest, I don't see a big performance improvement
>>>>> from parallel processing. If a specific enrichment is a little more
>>>>> I/O dependent than the others, you can tweak parallelism to address
>>>>> this. Also, there can be dependencies that make parallel enrichment
>>>>> virtually impossible, or at least less efficient (i.e. first labeling
>>>>> and "completing" a message, and then, depending on label and
>>>>> completeness, doing different other enrichments).
>>>>>
>>>>> So you have a +1 from me for serial rather than parallel enrichment.
>>>>>
>>>>> BR,
>>>>> Christian
>>>>>
>>>>> On 16.05.17, 16:58, "Casey Stella" <ceste...@gmail.com> wrote:
>>>>>
>>>>>     Hi All,
>>>>>
>>>>>     Last week, I encountered some weirdness in the Enrichment
>>>>>     topology. Doing some somewhat high-latency enrichment work, I
>>>>>     noticed that at some point, data stopped flowing through the
>>>>>     enrichment topology. I tracked down the problem to the join bolt.
>>>>>     For those who aren't aware, we do a split/join pattern so that
>>>>>     enrichments can be done in parallel. It works as follows:
>>>>>
>>>>>     - A split bolt sends the appropriate subset of the message to
>>>>>       each enrichment bolt, as well as the whole message to the join
>>>>>       bolt
>>>>>     - The join bolt will receive each of the pieces of the message
>>>>>       and then, when fully joined, it will send the message on.
>>>>>
>>>>>     What is happening under load or high velocity, however, is that
>>>>>     the cache is evicting the partially joined message before it can
>>>>>     be fully joined, due to the volume of traffic. This is obviously
>>>>>     not ideal. As such, it is clear that adjusting the size of the
>>>>>     cache and the characteristics of eviction is likely a good idea
>>>>>     and a necessary part of tuning enrichments. The cache size is
>>>>>     sensitive to:
>>>>>
>>>>>     - The latency of the *slowest* enrichment
>>>>>     - The number of tuples in flight at once
>>>>>
>>>>>     As such, the knobs you have to tune are either the parallelism of
>>>>>     the join bolt or the size of the cache.
>>>>>
>>>>>     As it stands, I see a couple of things wrong here that we can
>>>>>     correct with minimal issue:
>>>>>
>>>>>     - We have no warning message indicating that this is happening
>>>>>     - Changing cache sizes means changing flux. We should promote
>>>>>       this to the properties file.
>>>>>     - We should document the knobs mentioned above clearly in the
>>>>>       enrichment topology README
>>>>>
>>>>>     Those small changes, I think, are table stakes, but what I wanted
>>>>>     to discuss more in depth are the lingering questions:
>>>>>
>>>>>     - Is this an architectural pattern that we can use as-is?
>>>>>     - Should we consider a persistent cache, a la HBase or Apache
>>>>>       Ignite, as a pluggable component to Metron?
>>>>>     - Should we consider taking the performance hit and doing the
>>>>>       enrichments serially?
>>>>>     - When an eviction happens, what should we do?
>>>>>       - Fail the tuple, thereby making congestion worse
>>>>>       - Pass through the partially enriched results, thereby making
>>>>>         enrichments "best effort"
>>>>>
>>>>>     Anyway, I wanted to talk this through and inform you of some of
>>>>>     the things I'm seeing.
>>>>>
>>>>>     Sorry for the novel. ;)
>>>>>
>>>>>     Casey
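
As a footnote to the windowing discussion above: a minimal sketch of what Otto's time-windowed join (making it storm's problem) might look like on top of Storm's windowed bolt support. The messageId and fragment field names, the fixed fragment count, and the merge step are all hypothetical:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseWindowedBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;
    import org.apache.storm.windowing.TupleWindow;

    public class WindowedEnrichmentJoin extends BaseWindowedBolt {
      // Hypothetical convention: the split bolt emits a known number of
      // fragments per message (the subsets plus the original message).
      private static final int EXPECTED_FRAGMENTS = 4;

      private OutputCollector collector;

      @Override
      public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
      }

      @Override
      public void execute(TupleWindow window) {
        // Group the window's tuples by message id; Storm owns the buffering,
        // so there is no join cache to size and no eviction to worry about.
        Map<String, List<Tuple>> byId = new HashMap<>();
        for (Tuple t : window.get()) {
          byId.computeIfAbsent(t.getStringByField("messageId"), k -> new ArrayList<>()).add(t);
        }
        for (Map.Entry<String, List<Tuple>> e : byId.entrySet()) {
          if (e.getValue().size() == EXPECTED_FRAGMENTS) {
            collector.emit(new Values(e.getKey(), merge(e.getValue())));
          }
          // Groups still incomplete when the window closes are the ones the
          // window was too short for; they could be flushed as partially
          // enriched "best effort" output instead of being silently dropped.
        }
      }

      private String merge(List<Tuple> fragments) {
        // Placeholder: a real join would fold each enrichment fragment back
        // into the original message.
        StringBuilder merged = new StringBuilder();
        for (Tuple t : fragments) {
          merged.append(t.getStringByField("fragment"));
        }
        return merged.toString();
      }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("messageId", "message"));
      }
    }

Wired up with, say, new WindowedEnrichmentJoin().withTumblingWindow(BaseWindowedBolt.Duration.seconds(10)), the window length becomes the knob: nothing is emitted until a window closes, which is exactly the artificial latency Simon and Casey weigh above, and it has to cover the latency of the slowest enrichment.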