Ah, yes. That makes sense, and I can see the value in the parallelism that split/join provides. Personally, I would like to see the code do the following.
(1) Scream and shout when something in the cache expires. We have to make
sure that it is blatantly obvious to a user what happened. We also need to
make it blatantly obvious to the user what knobs they can turn to correct
the problem.

(2) Enrichments should be treated as best-effort. When the cache expires,
it should pass on the message without the enrichments that have not
completed. If I am relying on an external system for an enrichment, I don't
want an external system outage to fail all of my telemetry.

On Tue, May 16, 2017 at 12:05 PM, Casey Stella <[email protected]> wrote:

> We still do use split/join even within Stellar enrichments. Take for
> instance the following enrichment:
>
> {
>   "enrichment" : {
>     "fieldMap" : {
>       "stellar" : {
>         "config" : {
>           "parallel-task-1" : {
>             "my_field" : "PROFILE_GET(....)"
>           },
>           "parallel-task-2" : {
>             "my_field2" : "PROFILE_GET(....)"
>           }
>         }
>       }
>     }
>   }
> }
>
> Messages will get split between two tasks of the Stellar enrichment bolt,
> and the Stellar statements in "parallel-task-1" will be executed in
> parallel to those in "parallel-task-2". This is to enable people to
> separate computationally intensive or otherwise high-latency tasks that
> are independent across nodes in the cluster.
>
> I will agree wholeheartedly, though, that my personal desire would be to
> have just Stellar enrichments. You can do every one of the other
> enrichments in Stellar, and it would greatly simplify that config above.
>
> On Tue, May 16, 2017 at 11:59 AM, Nick Allen <[email protected]> wrote:
>
> > I would like to see us just migrate wholly to Stellar enrichments and
> > remove the separate HBase and Geo enrichment bolts from the Enrichment
> > topology. Stellar provides a user with much greater flexibility than
> > the existing HBase and Geo enrichment bolts.
> >
> > A side effect of this would be to greatly simplify the Enrichment
> > topology. I don't think we would need the split/join pattern if we did
> > this. No?
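For what it's worth, points (1) and (2) above can be sketched together. The following is a minimal, hypothetical illustration (plain JDK only; this is not Metron's actual join-bolt code, and the class and method names are made up): a size-bounded LRU of in-flight partial joins that, on eviction, logs a loud warning naming the tuning knobs and emits the partially enriched message instead of dropping it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a best-effort join cache (NOT Metron's JoinBolt).
// It bounds the number of in-flight partial joins; when one is evicted it
// (1) screams in the log, naming the knobs to turn, and
// (2) emits the partially enriched message rather than dropping it.
public class BestEffortJoinCache {
    public final List<String> emitted = new ArrayList<>();   // stand-in for the downstream emit
    public final List<String> warnings = new ArrayList<>();  // stand-in for the topology log
    private final Map<String, List<String>> inFlight;
    private final int expectedPieces;

    public BestEffortJoinCache(int maxInFlight, int expectedPieces) {
        this.expectedPieces = expectedPieces;
        // An access-ordered LinkedHashMap gives a tiny LRU cache;
        // removeEldestEntry fires after each insert once size exceeds maxInFlight.
        this.inFlight = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                if (size() <= maxInFlight) {
                    return false;
                }
                // (1) Blatantly obvious warning, pointing at the corrective knobs.
                warnings.add("EVICTED partial join for '" + eldest.getKey() + "' ("
                        + eldest.getValue().size() + "/" + expectedPieces + " enrichments done);"
                        + " increase the join cache size or the join bolt parallelism");
                // (2) Best effort: pass on whatever enrichments did complete.
                emitted.add(eldest.getKey() + " [partial] " + eldest.getValue());
                return true;
            }
        };
    }

    // Called once per enrichment result; emits the fully joined message when
    // all expected pieces have arrived.
    public void onPiece(String messageId, String enrichment) {
        List<String> pieces = inFlight.get(messageId);
        if (pieces == null) {
            pieces = new ArrayList<>();
            inFlight.put(messageId, pieces); // may trigger the eviction above
        }
        pieces.add(enrichment);
        if (pieces.size() == expectedPieces) {
            inFlight.remove(messageId);
            emitted.add(messageId + " [complete] " + pieces);
        }
    }
}
```

The same idea would apply to the real join bolt regardless of the cache implementation: the eviction path becomes an emit-plus-warn path rather than a silent drop.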
> > On Tue, May 16, 2017 at 11:54 AM, Casey Stella <[email protected]> wrote:
> >
> > > The problem is that an enrichment type won't necessarily have a fixed
> > > performance characteristic. Take Stellar enrichments, for instance.
> > > Doing an HBase call for one sensor vs. doing simple string munging
> > > will have vastly differing performance. Both of them are functioning
> > > within the Stellar enrichment bolt. Also, some enrichments may call
> > > for multiple calls to HBase. Parallelizing those would make some
> > > sense, I think.
> > >
> > > I do take your point, though, that it's not as though it's strictly
> > > serial; it's just that the unit of parallelism is the message, rather
> > > than the enrichment per message.
> > >
> > > On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz <[email protected]> wrote:
> > >
> > > > I'm glad you bring this up. This is a huge architectural difference
> > > > from the original OpenSOC topology, and one that we were warned
> > > > about back then.
> > > > To be perfectly honest, I don't see the big performance improvement
> > > > from parallel processing. If a specific enrichment is a little more
> > > > I/O-dependent than the others, you can tweak parallelism to address
> > > > this. Also, there can be dependencies that make parallel enrichment
> > > > virtually impossible, or at least less efficient (i.e. first
> > > > labeling and "completing" a message, and then, depending on label
> > > > and completeness, doing different other enrichments).
> > > >
> > > > So you have a +1 from me for serial rather than parallel enrichment.
> > > >
> > > > BR,
> > > > Christian
> > > >
> > > > On 16.05.17, 16:58, "Casey Stella" <[email protected]> wrote:
> > > >
> > > >     Hi All,
> > > >
> > > >     Last week, I encountered some weirdness in the Enrichment topology.
> > > >     Doing some somewhat high-latency enrichment work, I noticed that
> > > >     at some point, data stopped flowing through the enrichment
> > > >     topology. I tracked down the problem to the join bolt. For those
> > > >     who aren't aware, we do a split/join pattern so that enrichments
> > > >     can be done in parallel. It works as follows:
> > > >
> > > >     - A split bolt sends the appropriate subset of the message to
> > > >       each enrichment bolt, as well as the whole message to the join
> > > >       bolt
> > > >     - The join bolt will receive each of the pieces of the message
> > > >       and then, when fully joined, it will send the message on.
> > > >
> > > >     What is happening under load or high velocity, however, is that
> > > >     the cache is evicting the partially joined message before it can
> > > >     be fully joined, due to the volume of traffic. This is obviously
> > > >     not ideal. As such, it is clear that adjusting the size of the
> > > >     cache and the characteristics of eviction is likely a good idea
> > > >     and a necessary part of tuning enrichments. The cache size is
> > > >     sensitive to:
> > > >
> > > >     - The latency of the *slowest* enrichment
> > > >     - The number of tuples in flight at once
> > > >
> > > >     As such, the knobs you have to tune are either the parallelism
> > > >     of the join bolt or the size of the cache.
> > > >
> > > >     As it stands, I see a couple of things wrong here that we can
> > > >     correct with minimal issue:
> > > >
> > > >     - We have no warning message indicating that this is happening
> > > >     - Changing cache sizes means changing flux. We should promote
> > > >       this to the properties file.
> > > >     - We should document the knobs mentioned above clearly in the
> > > >       enrichment topology README
> > > >
> > > >     Those small changes, I think, are table stakes, but what I
> > > >     wanted to discuss more in depth are the lingering questions:
> > > >
> > > >     - Is this an architectural pattern that we can use as-is?
> > > >       - Should we consider a persistent cache a la HBase or Apache
> > > >         Ignite as a pluggable component to Metron?
> > > >       - Should we consider taking the performance hit and doing the
> > > >         enrichments serially?
> > > >     - When an eviction happens, what should we do?
> > > >       - Fail the tuple, thereby making congestion worse
> > > >       - Pass through the partially enriched results, thereby making
> > > >         enrichments "best effort"
> > > >
> > > >     Anyway, I wanted to talk this through and inform you of some of
> > > >     the things I'm seeing.
> > > >
> > > >     Sorry for the novel. ;)
> > > >
> > > >     Casey
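As a footnote on the two sizing drivers Casey lists (slowest-enrichment latency and tuples in flight): they combine via Little's law, since the number of partial joins resident in the cache is roughly the arrival rate times the residence time, and residence time is dominated by the slowest enrichment. A hypothetical back-of-the-envelope helper (not part of Metron; the name and parameters are made up for illustration):

```java
// Hypothetical helper (not part of Metron) for reasoning about join-cache
// sizing. By Little's law, partial joins resident in the cache are roughly
// (arrival rate) x (residence time), where residence time is dominated by
// the slowest enrichment's latency.
public class JoinCacheSizing {
    public static long minCacheSize(long tuplesPerSecond,
                                    double slowestEnrichmentSeconds,
                                    double safetyFactor) {
        // safetyFactor (> 1.0) absorbs bursts and latency variance.
        return (long) Math.ceil(tuplesPerSecond * slowestEnrichmentSeconds * safetyFactor);
    }
}
```

For example, 10,000 tuples/s with a 250 ms slowest enrichment and a 2x safety factor suggests at least 5,000 cache entries. If that number outgrows the heap, the alternatives raised in the thread apply: higher join-bolt parallelism (which shards the in-flight set across executors) or a persistent cache such as HBase or Ignite.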
