If we are timing out things from the cache, we have that latency already

On May 16, 2017 at 12:09:32, Casey Stella (ceste...@gmail.com) wrote:

We could definitely parallelize within the bolt, but you're right, it does
break the storm model. I also like making things other people's problems
(it's called working "smart" not "hard", right? not laziness, surely. ;),
but yeah, using windowing for this seems like it might introduce some
artificial latency. It's also not going to eliminate the problem, but
rather just make the knob to tweak things have a different characteristic.
Whereas before we had knobs around how many messages, now it's a knob
around how long an enrichment is going to take maximally (which, I think,
is more natural, honestly).
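
For concreteness, here's a minimal sketch of what that per-enrichment
latency knob could look like inside a single bolt (the class name, pool
size, and fallback behavior are all hypothetical, not anything in Metron
today):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Run one enrichment with a hard upper bound on latency; on timeout,
// cancel the lookup and fall back, so the tuning knob becomes "max time
// per enrichment" rather than "messages held in the join cache".
public class BoundedEnrichment {
  private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

  public static <T> T callWithTimeout(Callable<T> enrichment, T fallback, long timeoutMs) {
    Future<T> result = POOL.submit(enrichment);
    try {
      return result.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      result.cancel(true); // interrupt the slow lookup
      return fallback;
    } catch (Exception e) {
      return fallback;     // treat any failure as "no enrichment"
    }
  }
}

Simon's caveat below still applies, of course: extra threads inside a
bolt compete with Storm's own parallelism model.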

On Tue, May 16, 2017 at 12:05 PM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> Would you then parallelise within Stellar to handle things like multiple
> lookups? This feels like it would be breaking the storm model somewhat, and
> could lead to bad things with threads for example. Or would you think of
> doing something like the grouping Stellar uses today to parallelise across
> something like a pool of Stellar bolts and join?
>
> I like the idea of Otto’s solution (making it someone else's problem,
> storm’s specifically :) ) but that also assumes we insert the artificial
> latency of a time windowed join. If we’re going down that route, we might
> as well just use spark and run everything on yarn. At that point though we
> lose a lot of the benefits of low latency for time to detection, and
> real-time enrichment in things like the streaming enrichment writer.
>
> Simon
>
> > On 16 May 2017, at 16:59, Nick Allen <n...@nickallen.org> wrote:
> >
> > I would like to see us just migrate wholly to Stellar enrichments and
> > remove the separate HBase and Geo enrichment bolts from the Enrichment
> > topology. Stellar provides a user with much greater flexibility than the
> > existing HBase and Geo enrichment bolts.
> >
> > A side effect of this would be to greatly simplify the Enrichment
> > topology. I don't think we would need the split/join pattern if we did
> > this. No?
> >
> > On Tue, May 16, 2017 at 11:54 AM, Casey Stella <ceste...@gmail.com> wrote:
> >
> >> The problem is that an enrichment type won't necessarily have a fixed
> >> performance characteristic. Take stellar enrichments, for instance. Doing
> >> an HBase call for one sensor vs doing simple string munging will have
> >> vastly differing performance. Both of them are functioning within the
> >> stellar enrichment bolt. Also, some enrichments may call for multiple
> >> calls to HBase. Parallelizing those would make some sense, I think.
> >>
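> >> (For illustration, parallelizing multiple lookups for a single message
> >> might look like the sketch below; fetchGeo/fetchThreat are hypothetical
> >> stand-ins for the real HBase calls.)
> >>
> >> import java.util.HashMap;
> >> import java.util.Map;
> >> import java.util.concurrent.CompletableFuture;
> >>
> >> public class PerMessageLookups {
> >>   // Kick off both lookups for one message concurrently, then merge.
> >>   static Map<String, Object> enrich(Map<String, Object> message) {
> >>     CompletableFuture<Map<String, Object>> geo =
> >>         CompletableFuture.supplyAsync(() -> fetchGeo(message));
> >>     CompletableFuture<Map<String, Object>> threat =
> >>         CompletableFuture.supplyAsync(() -> fetchThreat(message));
> >>     message.putAll(geo.join());    // block until both complete
> >>     message.putAll(threat.join());
> >>     return message;
> >>   }
> >>
> >>   // Stubs standing in for the real HBase/GeoIP calls.
> >>   static Map<String, Object> fetchGeo(Map<String, Object> m) { return new HashMap<>(); }
> >>   static Map<String, Object> fetchThreat(Map<String, Object> m) { return new HashMap<>(); }
> >> }
> >>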
> >> I do take your point, though, that it's not as though it's strictly
> >> serial; it's just that the unit of parallelism is the message, rather
> >> than the enrichment per message.
> >>
> >> On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz <tramn...@trasec.de> wrote:
> >>
> >>> I’m glad you bring this up. This is a huge architectural difference
> >>> from the original OpenSOC topology, and one that we were warned about
> >>> back then.
> >>> To be perfectly honest, I don’t see the big performance improvement
> >>> from parallel processing. If a specific enrichment is a little more
> >>> i/o dependent than the others, you can tweak parallelism to address
> >>> this. Also there can be dependencies that make parallel enrichment
> >>> virtually impossible, or at least less efficient (i.e. first labeling
> >>> and “completing” a message, and then, depending on label and
> >>> completeness, doing different other enrichments).
> >>>
> >>> So you have a +1 from me for serial rather than parallel enrichment.
> >>>
> >>>
> >>> BR,
> >>> Christian
> >>>
> >>> On 16.05.17, 16:58, "Casey Stella" <ceste...@gmail.com> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> Last week, I encountered some weirdness in the Enrichment topology.
> >>> Doing
> >>> some somewhat high-latency enrichment work, I noticed that at some
> >>> point,
> >>> data stopped flowing through the enrichment topology. I tracked
> down
> >>> the
> >>> problem to the join bolt. For those who aren't aware, we do a
> >>> split/join
> >>> pattern so that enrichments can be done in parallel. It works as
> >>> follows:
> >>>
> >>> - A split bolt sends the appropriate subset of the message to
> each
> >>> enrichment bolt as well as the whole message to the join bolt
> >>> - The join bolt will receive each of the pieces of the message
> and
> >>> then,
> >>> when fully joined, it will send the message on.
> >>>
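> >>> Concretely, the join side behaves roughly like this sketch (a
> >>> Guava-style cache keyed by message id; the class name, key scheme,
> >>> and sizes are illustrative, not our actual code):
> >>>
> >>> import com.google.common.cache.Cache;
> >>> import com.google.common.cache.CacheBuilder;
> >>> import java.util.HashMap;
> >>> import java.util.Map;
> >>> import java.util.Set;
> >>> import java.util.concurrent.TimeUnit;
> >>>
> >>> // Accumulate per-enrichment fragments keyed by message id and emit
> >>> // once every expected enrichment has reported in. If the cache
> >>> // evicts an entry first, the join is silently lost -- the failure
> >>> // mode described below.
> >>> public class JoinCacheSketch {
> >>>   private final Set<String> expected; // e.g. {"geo", "host", "stellar"}
> >>>   private final Cache<String, Map<String, Object>> pending =
> >>>       CacheBuilder.newBuilder()
> >>>           .maximumSize(100_000)                   // knob #1: cache size
> >>>           .expireAfterWrite(10, TimeUnit.SECONDS) // knob #2: eviction
> >>>           .build();
> >>>
> >>>   public JoinCacheSketch(Set<String> expectedEnrichments) {
> >>>     this.expected = expectedEnrichments;
> >>>   }
> >>>
> >>>   /** Returns the joined message, or null while fragments are outstanding. */
> >>>   public synchronized Map<String, Object> onFragment(String messageId,
> >>>       String enrichmentType, Map<String, Object> fragment) {
> >>>     Map<String, Object> partial = pending.getIfPresent(messageId);
> >>>     if (partial == null) {
> >>>       partial = new HashMap<>();
> >>>       pending.put(messageId, partial);
> >>>     }
> >>>     partial.putAll(fragment);
> >>>     partial.put("__seen:" + enrichmentType, Boolean.TRUE);
> >>>     final Map<String, Object> joined = partial;
> >>>     if (!expected.stream().allMatch(t -> joined.containsKey("__seen:" + t))) {
> >>>       return null; // still waiting on other enrichments
> >>>     }
> >>>     pending.invalidate(messageId);
> >>>     expected.forEach(t -> joined.remove("__seen:" + t)); // strip bookkeeping
> >>>     return joined;
> >>>   }
> >>> }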
> >>>
> >>> What is happening under load or high velocity, however, is that the
> >>> cache is evicting the partially joined message before it can be fully
> >>> joined, due to the volume of traffic. This is obviously not ideal. As
> >>> such, it is clear that adjusting the size of the cache and the
> >>> characteristics of eviction is likely a good idea and a necessary
> >>> part of tuning enrichments.
> >>> The cache size is sensitive to:
> >>>
> >>> - The latency of the *slowest* enrichment
> >>> - The number of tuples in flight at once
> >>>
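> >>> (Back-of-the-envelope with made-up numbers: at 10,000 messages/sec
> >>> and a slowest enrichment of 500ms, Little's law puts roughly
> >>> 10,000 x 0.5 = 5,000 messages in flight, so the join cache needs at
> >>> least that many entries to avoid evicting live joins.)
> >>>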
> >>> As such, the knobs you have to tune are either the parallelism of the
> >>> join bolt or the size of the cache.
> >>>
> >>> As it stands, I see a couple of things wrong here that we can correct
> >>> with minimal issue:
> >>>
> >>> - We have no warning message indicating that this is happening
> >>> - Changing cache sizes means changing flux. We should promote this to
> >>> the properties file.
> >>> - We should document the knobs mentioned above clearly in the
> >>> enrichment topology README
> >>>
> >>> Those small changes, I think, are table stakes, but what I wanted to
> >>> discuss more in depth are the lingering questions:
> >>>
> >>> - Is this an architectural pattern that we can use as-is?
> >>> - Should we consider a persistent cache a la HBase or Apache Ignite
> >>> as a pluggable component to Metron?
> >>> - Should we consider taking the performance hit and doing the
> >>> enrichments serially?
> >>> - When an eviction happens, what should we do?
> >>>   - Fail the tuple, thereby making congestion worse
> >>>   - Pass through the partially enriched results, thereby making
> >>>     enrichments "best effort"
> >>>
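> >>> One sketch of the "best effort" option, using a removal listener on
> >>> the join cache (hypothetical names; Storm emit/ack wiring elided):
> >>>
> >>> import com.google.common.cache.Cache;
> >>> import com.google.common.cache.CacheBuilder;
> >>> import com.google.common.cache.RemovalListener;
> >>> import java.util.Map;
> >>> import java.util.concurrent.TimeUnit;
> >>>
> >>> public class BestEffortJoin {
> >>>   final Cache<String, Map<String, Object>> pending = CacheBuilder.newBuilder()
> >>>       .maximumSize(100_000)
> >>>       .expireAfterWrite(10, TimeUnit.SECONDS)
> >>>       .removalListener((RemovalListener<String, Map<String, Object>>) n -> {
> >>>         // wasEvicted() is false when we invalidate() after a clean
> >>>         // join, so only genuinely evicted partial messages take this path.
> >>>         if (n.wasEvicted()) {
> >>>           emitPartial(n.getKey(), n.getValue());
> >>>         }
> >>>       })
> >>>       .build();
> >>>
> >>>   void emitPartial(String messageId, Map<String, Object> partial) {
> >>>     // In a real bolt: log a warning and emit the partially enriched
> >>>     // message. Caveat: Guava invokes removal listeners during later
> >>>     // cache operations, not eagerly at eviction time.
> >>>   }
> >>> }
> >>>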
> >>> Anyway, I wanted to talk this through and inform you of some of the
> >>> things I'm seeing.
> >>>
> >>> Sorry for the novel. ;)
> >>>
> >>> Casey
> >>>
> >>>
> >>>
> >>
>
>
