Yeah, I'm leaning toward STATS_ADD or STATS_INIT taking a list of numbers. Making STATS_MERGE accept raw numbers as well seems confusing.
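To make the list-handling question concrete, here is a rough Python model of the behavior being discussed. It is not Metron code: `Stats`, `stats_add`, and `stats_merge` are hypothetical stand-ins for the Stellar functions, the sample values are made up, and skipping non-stats items in the merge is an assumption chosen only to reproduce the empty result reported in the thread below.

```python
import math
from functools import reduce


class Stats:
    """Hypothetical stand-in for a Stellar statistics object; it only stores raw values."""

    def __init__(self, values=()):
        self.values = list(values)

    def count(self):
        return len(self.values)

    def mean(self):
        # NaN is used for the empty case, like the NaN percentile seen in the thread.
        return sum(self.values) / len(self.values) if self.values else math.nan


def stats_add(stats, *args):
    """Model of the proposed STATS_ADD: list arguments are flattened, not truncated."""
    stats = Stats() if stats is None else stats
    for arg in args:
        if isinstance(arg, (list, tuple)):
            stats.values.extend(arg)
        else:
            stats.values.append(arg)
    return stats


def stats_merge(items):
    """Model of STATS_MERGE: combines Stats objects and (by assumption) skips anything else."""
    merged = Stats()
    for item in items:
        if isinstance(item, Stats):
            merged.values.extend(item.values)
    return merged


cardinalities = [1, 6, 6, 3]  # made-up profile results: plain integers, not Stats objects

merged = stats_merge(cardinalities)                 # nothing usable to merge
folded = reduce(stats_add, cardinalities, Stats())  # the REDUCE workaround
direct = stats_add(None, cardinalities)             # the proposed shortcut
```

In this model the fold and the list-aware add build the same summary, which is the point of the proposed shortcut, while the merge over plain integers ends up empty.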

On Wed, Aug 9, 2017 at 4:37 PM, Nick Allen <n...@nickallen.org> wrote:

> Or even change the behavior of STATS_MERGE, too? If STATS_MERGE gets raw
> numbers, it wraps those in a Stats object, then returns it. Then Dima's
> example would just work as-is.
>
> I'm not sure I like that though. Maybe so flexible as to be confusing?
> Thought I would throw it out as an alternative to consider.
>
> On Wed, Aug 9, 2017 at 4:31 PM Nick Allen <n...@nickallen.org> wrote:
>
> > Oh yeah, duh. Now I'm with you. That would be a good quick hit.
> >
> > The current behavior is a little nutty. If there is a list, it only
> > consumes the first element in the list. I'd expect that it should either
> > do what you describe or complain that it doesn't know how to handle a
> > list. Easy fix though.
> >
> > [Stellar]>>> STATS_MEAN(STATS_ADD(null, 1, 2, 3))
> > 2.0
> >
> > [Stellar]>>> STATS_MEAN(STATS_ADD(null, [1,2,3]))
> > 1.0
> >
> > [Stellar]>>> STATS_COUNT(STATS_ADD(null, [1,2,3]))
> > 1.0
> >
> > On Wed, Aug 9, 2017 at 4:17 PM Casey Stella <ceste...@gmail.com> wrote:
> >
> >> outcoming is still a HLLP object, not a statistics object, so doing a
> >> STATS_MERGE on a bunch of them wouldn't work either.
> >>
> >> On Wed, Aug 9, 2017 at 4:15 PM, Nick Allen <n...@nickallen.org> wrote:
> >>
> >> > That is another problem. Isn't the simplest answer, to just change
> >> > this...
> >> >
> >> >     "result": "HLLP_CARDINALITY(outcoming)"
> >> >
> >> > to this...
> >> >
> >> >     "result": "outcoming"
> >> >
> >> > ?
> >> >
> >> > On Wed, Aug 9, 2017 at 3:48 PM Casey Stella <ceste...@gmail.com> wrote:
> >> >
> >> > > Ok, so the problem here is that your profile is returning integers
> >> > > (specifically HLLP cardinalities) rather than stats objects.
> >> > > When you're doing:
> >> > >   STATS_PERCENTILE(STATS_MERGE( PROFILE_GET('host-talks-to',
> >> > >   '99.191.183.156', PROFILE_FIXED(10, 'HOURS'))), 90)
> >> > > you are calling STATS_MERGE on a list of integers, and it takes a list
> >> > > of statistics objects.
> >> > >
> >> > > What you can do instead is:
> >> > >   STATS_PERCENTILE( REDUCE( PROFILE_GET('host-talks-to',
> >> > >   '99.191.183.156', PROFILE_FIXED(10, 'HOURS')), (s, x) -> STATS_ADD(s, x),
> >> > >   STATS_INIT()), 90)
> >> > >
> >> > > Ok, that looks horrible, doesn't it? Well, thankfully we added temporary
> >> > > variables for stellar enrichments in 0.4.1. Let's take that "numeric"
> >> > > stellar enrichment group and reimagine it. With temporary variables, you
> >> > > would turn:
> >> > >
> >> > >   "numeric" : {
> >> > >     "value_red_level_out": "STATS_PERCENTILE( REDUCE( PROFILE_GET('host-being-talked-to', ip_src_addr, PROFILE_FIXED(1, 'HOURS')), (s, x) -> STATS_ADD(s, x), STATS_INIT()), 95)",
> >> > >     "value_red_level_in": "STATS_PERCENTILE( REDUCE( PROFILE_GET('host-talks-to', ip_src_addr, PROFILE_FIXED(1, 'HOURS')), (s, x) -> STATS_ADD(s, x), STATS_INIT()), 95)"
> >> > >   },
> >> > >
> >> > > into:
> >> > >
> >> > >   "numeric" : [
> >> > >     "profile_duration := PROFILE_FIXED(1, 'HOURS')",
> >> > >     "host_being_talked_to := PROFILE_GET('host-being-talked-to', ip_src_addr, profile_duration)",
> >> > >     "host_talks_to := PROFILE_GET('host-talks-to', ip_src_addr, profile_duration)",
> >> > >     "host_being_talked_to_stats := REDUCE(host_being_talked_to, (s, x) -> STATS_ADD(s, x), STATS_INIT())",
> >> > >     "host_talks_to_stats := REDUCE(host_talks_to, (s, x) -> STATS_ADD(s, x), STATS_INIT())",
> >> > >     "value_red_level_out := STATS_PERCENTILE(host_being_talked_to_stats, 95)",
> >> > >     "value_red_level_in := STATS_PERCENTILE(host_talks_to_stats, 95)",
> >> > >     "profile_duration := null",
> >> > >     "host_being_talked_to := null",
> >> > >     "host_talks_to := null",
> >> > >     "host_being_talked_to_stats := null",
> >> > >     "host_talks_to_stats := null"
> >> > >   ],
> >> > >
> >> > > That's a lot more to type, but it allows you to reuse the pieces and
> >> > > take them in chunks.
> >> > >
> >> > > Ok, so now I find myself thinking "a pox on both your houses" since both
> >> > > examples now kinda look long and convoluted. So, why are they? Well,
> >> > > that REDUCE is likely the culprit. It's supposed to get us out of bad
> >> > > situations, not show up in what could be argued is the 80% case. How
> >> > > about, instead, we allow STATS_ADD or STATS_INIT to take a list of
> >> > > numbers? If so, we could pretty easily make that nicer:
> >> > >   STATS_PERCENTILE( STATS_ADD( PROFILE_GET('host-being-talked-to',
> >> > >   ip_src_addr, PROFILE_FIXED(1, 'HOURS'))), 95)
> >> > >
> >> > > or
> >> > >   STATS_PERCENTILE( STATS_INIT( PROFILE_GET('host-being-talked-to',
> >> > >   ip_src_addr, PROFILE_FIXED(1, 'HOURS'))), 95)
> >> > >
> >> > > We should make some sort of candy like that so we can avoid some of the
> >> > > complexity in the normal case.
> >> > >
> >> > > On Wed, Aug 9, 2017 at 3:03 PM, Dima Kovalyov <dima.koval...@sstech.us>
> >> > > wrote:
> >> > >
> >> > > > Hello Metron Team,
> >> > > >
> >> > > > I have created the following profiler:
> >> > > > > {
> >> > > > >   "profile": "host-talks-to",
> >> > > > >   "onlyif": "exists(source_ip)",
> >> > > > >   "foreach": "source_ip",
> >> > > > >   "init": {
> >> > > > >     "outcoming": "HLLP_INIT(5, 6)"
> >> > > > >   },
> >> > > > >   "update": { "outcoming": "HLLP_ADD(outcoming, destination_ip)" },
> >> > > > >   "result": "HLLP_CARDINALITY(outcoming)"
> >> > > > > }
> >> > > > I have also created an enrichment rule:
> >> > > > > {
> >> > > > >   "enrichment" : {
> >> > > > >     "fieldMap": {
> >> > > > >       "stellar" : {
> >> > > > >         "config" : {
> >> > > > >           "numeric" : {
> >> > > > >             "value_red_level_out": "STATS_PERCENTILE( STATS_MERGE( PROFILE_GET('host-being-talked-to', ip_src_addr, 1, 'HOURS')), 95)",
> >> > > > >             "value_red_level_in": "STATS_PERCENTILE( STATS_MERGE( PROFILE_GET('host-talks-to', ip_src_addr, 1, 'HOURS')), 95)"
> >> > > > >           },
> >> > > > >           "text" : {
> >> > > > >             "is_alert": "true"
> >> > > > >           }
> >> > > > >         }
> >> > > > >       }
> >> > > > >     }
> >> > > > >   }
> >> > > > > }
> >> > > > However, when I stream data to it I receive: "value_red_level_out": null,
> >> > > >
> >> > > > I have checked in the profiler client and here is what I got:
> >> > > > > [Stellar]>>> PROFILE_GET( "host-talks-to" , "99.191.183.156",
> >> > > > > PROFILE_FIXED(300, "MINUTES"))
> >> > > > > [1, 6, 6, 6, 6, 6, 3, 4, 5, 6, 4, 6, 6, 6, 1, 1, 6, 6, 1, 4, 1, 1, 4,
> >> > > > > 6, 6, 1, 6, 6, 1, 2, 6, 1, 1, 1, 6, 4, 6, 6, 3, 1, 6, 2, 1, 6, 1, 6]
> >> > > > > [Stellar]>>> STATS_PERCENTILE(STATS_MERGE(
> >> > > > > PROFILE_GET('host-talks-to', '99.191.183.156', PROFILE_FIXED(10,
> >> > > > > 'HOURS'))), 90)
> >> > > > > NaN
> >> > > > > [Stellar]>>> STATS_MERGE( PROFILE_GET('host-talks-to',
> >> > > > > '99.191.183.156', PROFILE_FIXED(10, 'HOURS')))
> >> > > > So STATS_MERGE produces no results. Is this expected, or did I make a
> >> > > > mistake somewhere? Please advise.
> >> > > >
> >> > > > p.s. I am following this use case:
> >> > > > https://github.com/hortonworks-gallery/metron-rules/tree/master/use-cases/DegreeOfHost
> >> > > > There were a number of errors in the configs originally, which I have
> >> > > > corrected; maybe I missed something else.
> >> > > >
> >> > > > - Dima