Ah, yes.  Makes sense and I can see the value in the parallelism that the
split/join provides.  Personally, I would like to see the code do the
following.

(1) Scream and shout when something in the cache expires.  We have to make
sure that it is blatantly obvious to a user what happened.  We also need to
make it blatantly obvious to the user what knobs they can turn to correct
the problem.

(2) Enrichments should be treated as best-effort.  When the cache expires,
it should pass on the message without the enrichments that have not
completed.  If I am relying on an external system for an enrichment, I
don't want an external system outage to fail all of my telemetry.
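A rough sketch of what (1) and (2) could look like in the join path (hypothetical Python, not Metron's actual Java bolt; the function and the field name `metron.join.missing_enrichments` are made up for illustration):

```python
import logging

log = logging.getLogger("join-bolt")

def join_or_pass_through(message_id, expected, results):
    """Best-effort join: if some enrichments never arrived before the cache
    entry expired, pass the message on without them instead of failing it.

    expected -- names of the enrichments the split bolt fanned out
    results  -- enrichment name -> enriched fields that did arrive in time
    """
    missing = sorted(set(expected) - set(results))
    merged = {}
    for fields in results.values():
        merged.update(fields)
    if missing:
        # (1) Scream and shout: name the failure AND the knobs to turn.
        log.warning(
            "message %s evicted before fully joined; missing enrichments %s; "
            "consider increasing the join cache size or join bolt parallelism",
            message_id, missing)
        # (2) Best effort: tag the gap (hypothetical field name) and move on.
        merged["metron.join.missing_enrichments"] = missing
    return merged
```

The key design point is that an external-system outage degrades the output rather than blocking the pipeline.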





On Tue, May 16, 2017 at 12:05 PM, Casey Stella <[email protected]> wrote:

> We still do use split/join even within Stellar enrichments.  Take, for
> instance, the following enrichment:
> {
>   "enrichment" : {
>     "fieldMap" : {
>       "stellar" : {
>          "config" : {
>              "parallel-task-1" : {
>                  "my_field" : "PROFILE_GET(....)"
>              },
>              "parallel-task-2" : {
>                  "my_field2" : "PROFILE_GET(....)"
>              }
>          }
>       }
>     }
>   }
> }
>
> Messages will get split between two tasks of the Stellar enrichment bolt
> and the stellar statements in "parallel-task-1" will be executed in
> parallel to those in "parallel-task-2".  This is to enable people to
> separate computationally intensive or otherwise high latency tasks that are
> independent across nodes in the cluster.
>
> I will agree wholeheartedly, though, that my personal preference would be
> to have just Stellar enrichments.  You can do every one of the other
> enrichments in Stellar and it would greatly simplify that config above.
>
>
>
> On Tue, May 16, 2017 at 11:59 AM, Nick Allen <[email protected]> wrote:
>
> > I would like to see us just migrate wholly to Stellar enrichments and
> > remove the separate HBase and Geo enrichment bolts from the Enrichment
> > topology.  Stellar provides a user with much greater flexibility than the
> > existing HBase and Geo enrichment bolts.
> >
> > A side effect of this would be to greatly simplify the Enrichment
> > topology.  I don't think we would need the split/join pattern if we did
> > this.  No?
> >
> > > On Tue, May 16, 2017 at 11:54 AM, Casey Stella <[email protected]> wrote:
> >
> > > The problem is that an enrichment type won't necessarily have a fixed
> > > performance characteristic.  Take Stellar enrichments, for instance.
> > > Doing an HBase call for one sensor vs. doing simple string munging
> > > will have vastly differing performance.  Both of them are functioning
> > > within the Stellar enrichment bolt.  Also, some enrichments may
> > > require multiple calls to HBase.  Parallelizing those would make some
> > > sense, I think.
> > >
> > > I do take your point, though, that it's not as though it's strictly
> > > serial; it's just that the unit of parallelism is the message, rather
> > > than the enrichment per message.
> > >
> > > On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz <[email protected]> wrote:
> > >
> > > > I’m glad you bring this up. This is a huge architectural difference
> > > > from the original OpenSOC topology, and one that we were warned
> > > > about back then.
> > > > To be perfectly honest, I don’t see the big performance improvement
> > > > from parallel processing. If a specific enrichment is a little more
> > > > i/o dependent than the others, you can tweak its parallelism to
> > > > address this. Also there can be dependencies that make parallel
> > > > enrichment virtually impossible, or at least less efficient (i.e.
> > > > first labeling and “completing” a message, and then doing different
> > > > other enrichments depending on the label and completeness).
> > > >
> > > > So you have a +1 from me for serial rather than parallel enrichment.
> > > >
> > > >
> > > > BR,
> > > >    Christian
> > > >
> > > > On 16.05.17, 16:58, "Casey Stella" <[email protected]> wrote:
> > > >
> > > >     Hi All,
> > > >
> > > >     Last week, I encountered some weirdness in the Enrichment
> > > >     topology.  Doing some somewhat high-latency enrichment work, I
> > > >     noticed that at some point, data stopped flowing through the
> > > >     enrichment topology.  I tracked down the problem to the join
> > > >     bolt.  For those who aren't aware, we do a split/join pattern so
> > > >     that enrichments can be done in parallel.  It works as follows:
> > > >
> > > >        - A split bolt sends the appropriate subset of the message to
> > > >        each enrichment bolt, as well as the whole message to the join
> > > >        bolt
> > > >        - The join bolt will receive each of the pieces of the message
> > > >        and then, when fully joined, it will send the message on.
> > > >
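The two steps above can be sketched as follows (illustrative Python, not the actual Storm bolt code; the bounded dict stands in for the join bolt's eviction cache, and all names here are hypothetical):

```python
from collections import OrderedDict

CACHE_MAX = 2  # tiny on purpose, to show eviction under load

# message id -> {"expected": set of enrichment names, "results": dict}
cache = OrderedDict()

def split(message_id, enrichments):
    # The split bolt registers the message with the join bolt's cache...
    if len(cache) >= CACHE_MAX:
        cache.popitem(last=False)  # evict the oldest partial join
    cache[message_id] = {"expected": set(enrichments), "results": {}}
    # ...and fans the message out to each enrichment bolt (elided here).

def join(message_id, enrichment, fields):
    # Each enrichment bolt reports its piece back; when all pieces have
    # arrived, the fully joined message is emitted.
    entry = cache.get(message_id)
    if entry is None:
        return None  # the partial join was evicted: the message is lost
    entry["results"][enrichment] = fields
    if set(entry["results"]) == entry["expected"]:
        del cache[message_id]
        joined = {}
        for f in entry["results"].values():
            joined.update(f)
        return joined
    return None
```

With a cache this small, a burst of new messages evicts older partial joins before their slow enrichments return, which is exactly the failure mode described below.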
> > > >
> > > >     What is happening under load or high velocity, however, is that
> > > >     the cache is evicting the partially joined message before it can
> > > >     be fully joined, due to the volume of traffic.  This is obviously
> > > >     not ideal.  As such, it is clear that adjusting the size of the
> > > >     cache and the characteristics of eviction is likely a good idea
> > > >     and a necessary part of tuning enrichments.
> > > >     The cache size is sensitive to:
> > > >
> > > >        - The latency of the *slowest* enrichment
> > > >        - The number of tuples in flight at once
> > > >
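Those two factors combine multiplicatively: by Little's law, the number of in-flight partial joins is roughly throughput times the latency of the slowest enrichment, so a lower bound on the cache size can be estimated like this (the numbers are illustrative, not measured):

```python
# Rough sizing: the cache must hold every in-flight partial join, which is
# bounded by throughput * slowest-enrichment latency (Little's law).
msgs_per_sec = 10_000          # illustrative topology throughput
slowest_enrichment_sec = 0.5   # latency of the *slowest* enrichment
safety_factor = 2              # headroom for bursts

min_cache_size = int(msgs_per_sec * slowest_enrichment_sec * safety_factor)
print(min_cache_size)  # 10000
```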
> > > >     As such, the knobs you have to tune are either the parallelism
> > > >     of the join bolt or the size of the cache.
> > > >
> > > >     As it stands, I see a couple of things wrong here that we can
> > > >     correct with minimal issue:
> > > >
> > > >        - We have no warning message indicating that this is happening
> > > >        - Changing cache sizes means changing flux.  We should promote
> > > >        this to the properties file.
> > > >        - We should document the knobs mentioned above clearly in the
> > > >        enrichment topology README
> > > >
> > > >     Those small changes, I think, are table stakes, but what I
> > > >     wanted to discuss more in depth are the lingering questions:
> > > >
> > > >        - Is this an architectural pattern that we can use as-is?
> > > >           - Should we consider a persistent cache a la HBase or
> > > >           Apache Ignite as a pluggable component to Metron?
> > > >           - Should we consider taking the performance hit and doing
> > > >           the enrichments serially?
> > > >        - When an eviction happens, what should we do?
> > > >           - Fail the tuple, thereby making congestion worse
> > > >           - Pass through the partially enriched results, thereby
> > > >           making enrichments "best effort"
> > > >
> > > >     Anyway, I wanted to talk this through and inform you of some of
> > > >     the things I'm seeing.
> > > >
> > > >     Sorry for the novel. ;)
> > > >
> > > >     Casey
> > > >
> > > >
> > > >
> > >
> >
>
