I think one of the big improvements over OpenSOC was the splitter/joiner parallel execution model for enrichments, though. Switching them all to Stellar, as you describe, would be a regression.
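
To make the tradeoff concrete, here is a rough Python sketch (not Metron code; the function names, field names, and lookup data are all made up for illustration). It contrasts a split/join model, where independent enrichments run in parallel against the original message, with a chained model, where statements run in order and each one can see fields added by the previous one.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lookup tables standing in for real enrichment sources.
GEO = {"10.0.0.1": "Springfield"}
CRIME = {"Springfield": "medium"}


def geo_enrichment(message):
    """Adds a city for the message's IP (illustrative only)."""
    return {"geo.city": GEO.get(message["ip"], "unknown")}


def crime_level(message):
    """Depends on geo.city, so it needs the geo enrichment to have run first."""
    return {"crime_level": CRIME.get(message.get("geo.city"), "n/a")}


def split_join(message, enrichments):
    """Run every enrichment in parallel on the ORIGINAL message, then merge."""
    with ThreadPoolExecutor(max_workers=len(enrichments)) as pool:
        futures = [pool.submit(fn, message) for fn in enrichments]  # split
        enriched = dict(message)
        for future in futures:
            enriched.update(future.result())  # join
    return enriched


def chained(message, statements):
    """Run statements in order; each sees fields added by earlier ones."""
    enriched = dict(message)
    for fn in statements:
        enriched.update(fn(enriched))
    return enriched


msg = {"ip": "10.0.0.1"}
parallel_result = split_join(msg, [geo_enrichment, crime_level])
chained_result = chained(msg, [geo_enrichment, crime_level])
```

In this sketch the parallel version cannot resolve `crime_level`, because that enrichment never sees `geo.city`, while the chained version can, at the cost of running everything serially in one place.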

On January 9, 2017 at 14:27:29, Ryan Merriman ([email protected]) wrote:

We can already do what Carolyn suggests with Stellar enrichments, assuming
we get a Stellar function created for Geo enrichments (all enrichments,
ideally). It would be configured like this under the enrichment section in
the enrichment config:

    "stellar" : {
      "config" : {
        "geo_enriched_field" : "GEO_ENRICHMENT(field)",
        "enriched_geo_enriched_field" : "ENRICHMENT_GET('some_other_enrichment', geo_enriched_field, 'enrichments', 'cf')"
      }
    }

These statements are executed in order and can be grouped together. As
Casey pointed out, the limitation is that both would run in a single Storm
worker. Would that be an acceptable tradeoff? Sure, it would be ideal if we
could execute each one in separate workers, but then we would have to
re-architect our topologies and our system would become much more complex.

Ryan

On Mon, Jan 9, 2017 at 12:59 PM, Carolyn Duby <[email protected]> wrote:

> Adding new topologies adds more processing requirements to the system. It
> adds more topics (storage) and more producers and consumers to Kafka
> (processing).
>
> I think what we need is a dependency of enrichments. Maybe we need to
> either derive the dependencies using the Stellar (potentially not that
> easy) or allow the enrichment to specify the order of enrichment
> calculations.
> This will allow users to calculate more enrichment in the same topology.
>
> Thanks
> Carolyn
>
> Sent from my Verizon, Samsung Galaxy smartphone
>
> -------- Original message --------
> From: Nick Allen <[email protected]>
> Date: 1/9/17 8:49 AM (GMT-08:00)
> To: [email protected]
> Subject: Re: Enrich enrichment
>
> I agree that making it easy for the user to "enrich enrichments", as Dima
> put it, to an arbitrary depth, would be extremely useful for a lot of use
> cases. We've discussed the use case a little in the past in this thread
> [1].
>
> Re-purposing the "threat intel" phase gives us something that is feasible
> today, but only to a "depth" of 2. We would also need to rename and
> redocument it so that users understand how they can leverage the two
> phases. This seems like a minimally viable option if we want to head down
> this road.
>
> The other extreme might involve inferring the topology needed based on the
> user's configuration. If the user needs 3 phases, then we build a topology
> that supports 3 phases. Under the covers instead of using Flux, we would
> use Storm's topology builder Java API to grok the configuration and build
> the topology(ies) that the user needs.
>
> I am not sure if we can infer this from the configuration as it exists
> today or if we would need to redefine the configuration somehow. Like I
> said this is "extreme", but could give the user more expressive and
> intuitive options.
>
> ---
> [1] http://mail-archives.apache.org/mod_mbox/incubator-metron-dev/201610.mbox/%3CCAHSJ8NwJUiyp3YO6NVE4tfLoSSkOc6QG%2BMsAJSSDu%2B-wfct_vw%40mail.gmail.com%3E
>
> On Mon, Jan 9, 2017 at 10:56 AM, Casey Stella <[email protected]> wrote:
>
> > I think that would be a good feature to add to have arbitrary number of
> > phases, though it might be tricky to code (the way I envisioned it would
> > involve a loop in storm, which is possible[1]), might have unintended
> > consequences to guarantees (e.g. updating enrichments might not be able
> > to be applied in realtime) and could be tricky to reason about
> > performance-wise.
> >
> > As it stands, the number of phases is a consequence of the topology
> > itself. We do not currently have an architecture which would allow an
> > arbitrary number of phases without changing the flux file itself. What
> > you can do, though, in a stellar enrichment is stack enrichments (e.g.
> > depend on previous enrichments) because it's just a list of stellar
> > statements.
> > The consequence, of course, is that these statements get run within the
> > same worker, which is unfortunate, but may be a stopgap workaround.
> >
> > [1] https://groups.google.com/forum/#!topic/storm-user/EjN1hU58Q_8
> >
> > On Mon, Jan 9, 2017 at 10:48 AM, Otto Fowler <[email protected]> wrote:
> >
> > > Maybe the naming of the phases is misleading? What if you could set up
> > > an arbitrary number of stages, with defaults?
> > >
> > > On January 8, 2017 at 16:25:01, Casey Stella ([email protected]) wrote:
> > >
> > > You could do the geo enrichment normally and do a stellar hbase
> > > enrichment in the threat Intel phase.
> > >
> > > On Sun, Jan 8, 2017 at 16:22 Ryan Merriman <[email protected]> wrote:
> > >
> > > > Hbase enrichments and geo enrichments are done in parallel so I would
> > > > not expect this to work. You could do the Hbase enrichment as a threat
> > > > Intel enrichment and that should work because enrichments and threat
> > > > Intel are done in series.
> > > >
> > > > The ideal way would be to chain together Stellar enrichments but I
> > > > don't think there is a geo enrichment function created yet. I think
> > > > that should be a Jira. I know someone is working on an update to how
> > > > we do geo enrichments so I will file a follow on Jira if it's not
> > > > included in the scope of that work.
> > > >
> > > > Ryan
> > > >
> > > > > On Jan 8, 2017, at 2:31 PM, Dima Kovalyov <[email protected]> wrote:
> > > > >
> > > > > Is it possible to enrich enrichment?
> > > > >
> > > > > For example I have IP address, I enrich it with geo and get City
> > > > > name, now I want to enrich City name with city crime level (assume
> > > > > I have that data). But when I do that it just does not work.
> > > > > I specify enrichment like that:
> > > > > > {
> > > > > >   "index" : "msexchange",
> > > > > >   "batchSize" : 5,
> > > > > >   "enrichment" : {
> > > > > >     "fieldMap" : {
> > > > > >       "geo" : [ "destination_ip", "source_ip" ],
> > > > > >       "hbaseEnrichment" : [ "enrichments.geo.destination_ip.country" ],
> > > > > >       "hbaseEnrichment" : [ "enrichments:geo:destination_ip:country" ],
> > > > > >       "hbaseEnrichment" : [ "enrichments.geo.destination_ip:country" ]
> > > > > >     },
> > > > > >     "fieldToTypeMap" : {
> > > > > >       "enrichments.geo.destination_ip.country" : [ "city_crime_level" ],
> > > > > >       "enrichments:geo:destination_ip:country" : [ "city_crime_level" ],
> > > > > >       "enrichments.geo.destination_ip:country" : [ "city_crime_level" ]
> > > > > >     },
> > > > > >     "config" : { }
> > > > > >   },
> > > > > >   "threatIntel" : {
> > > > > >     "fieldMap" : { },
> > > > > >     "fieldToTypeMap" : { },
> > > > > >     "config" : { },
> > > > > >     "triageConfig" : {
> > > > > >       "riskLevelRules" : { },
> > > > > >       "aggregator" : "MAX",
> > > > > >       "aggregationConfig" : { }
> > > > > >     }
> > > > > >   },
> > > > > >   "configuration" : { }
> > > > > > }
> > > > > I tried all the ways how enrichment field can be entered just to be
> > > > > sure I do not mistype it.
> > > > >
> > > > > - Dima
>
> --
> Nick Allen <[email protected]>
