Adding new topologies adds more processing requirements to the system. It adds more topics (storage) and more producers and consumers to Kafka (processing).

I think what we need is dependencies between enrichments. Maybe we need to either derive the dependencies from the Stellar statements (potentially not that easy) or allow the enrichment to specify the order of enrichment calculations. This will allow users to calculate more enrichments in the same topology.

Thanks
Carolyn

-------- Original message --------
From: Nick Allen <[email protected]>
Date: 1/9/17 8:49 AM (GMT-08:00)
To: [email protected]
Subject: Re: Enrich enrichment

I agree that making it easy for the user to "enrich enrichments", as Dima put it, to an arbitrary depth, would be extremely useful for a lot of use cases. We've discussed the use case a little in the past in this thread [1].

Re-purposing the "threat intel" phase gives us something that is feasible today, but only to a "depth" of 2. We would also need to rename and redocument it so that users understand how they can leverage the two phases. This seems like a minimally viable option if we want to head down this road.

The other extreme might involve inferring the topology needed based on the user's configuration. If the user needs 3 phases, then we build a topology that supports 3 phases. Under the covers, instead of using Flux, we would use Storm's topology builder Java API to grok the configuration and build the topology(ies) that the user needs. I am not sure if we can infer this from the configuration as it exists today or if we would need to redefine the configuration somehow. Like I said, this is "extreme", but could give the user more expressive and intuitive options.
---
[1] http://mail-archives.apache.org/mod_mbox/incubator-metron-dev/201610.mbox/%3CCAHSJ8NwJUiyp3YO6NVE4tfLoSSkOc6QG%2BMsAJSSDu%2B-wfct_vw%40mail.gmail.com%3E

On Mon, Jan 9, 2017 at 10:56 AM, Casey Stella <[email protected]> wrote:

> I think that would be a good feature to add to have arbitrary number of
> phases, though it might be tricky to code (the way I envisioned it would
> involve a loop in storm, which is possible[1]), might have unintended
> consequences to guarantees (e.g. updating enrichments might not be able to
> be applied in realtime) and could be tricky to reason about
> performance-wise.
>
> As it stands, the number of phases is a consequence of the topology
> itself. We do not currently have an architecture which would allow an
> arbitrary number of phases without changing the flux file itself. What you
> can do, though, in a stellar enrichment is stack enrichments (e.g. depend
> on previous enrichments) because it's just a list of stellar statements.
> The consequence, of course, is that these statements get run within the
> same worker, which is unfortunate, but may be a stopgap workaround.
>
> *1. https://groups.google.com/forum/#!topic/storm-user/EjN1hU58Q_8
>
> On Mon, Jan 9, 2017 at 10:48 AM, Otto Fowler <[email protected]> wrote:
>
> > Maybe the naming of the phases is misleading? What if you could set up an
> > arbitrary number of stages, with defaults?
> >
> > On January 8, 2017 at 16:25:01, Casey Stella ([email protected]) wrote:
> >
> > You could do the geo enrichment normally and do a stellar hbase enrichment
> > in the threat Intel phase.
> >
> > On Sun, Jan 8, 2017 at 16:22 Ryan Merriman <[email protected]> wrote:
> >
> > > Hbase enrichments and geo enrichments are done in parallel so I would not
> > > expect this to work. You could do the Hbase enrichment as a threat Intel
> > > enrichment and that should work because enrichments and threat Intel are
> > > done in series.
> > >
> > > The ideal way would be to chain together Stellar enrichments but I don't
> > > think there is a geo enrichment function created yet. I think that should
> > > be a Jira. I know someone is working on an update to how we do geo
> > > enrichments so I will file a follow on Jira if it's not included in the
> > > scope of that work.
> > >
> > > Ryan
> > >
> > > > On Jan 8, 2017, at 2:31 PM, Dima Kovalyov <[email protected]> wrote:
> > > >
> > > > Is it possible to enrich enrichment?
> > > >
> > > > For example I have IP address, I enrich it with geo and get City name,
> > > > now I want to enrich City name with city crime level (assume I have that
> > > > data). But when I do that it just does not work. I specify enrichment
> > > > like that:
> > > >> {
> > > >>   "index" : "msexchange",
> > > >>   "batchSize" : 5,
> > > >>   "enrichment" : {
> > > >>     "fieldMap" : {
> > > >>       "geo" : [ "destination_ip", "source_ip" ],
> > > >>       "hbaseEnrichment" : [ "enrichments.geo.destination_ip.country" ],
> > > >>       "hbaseEnrichment" : [ "enrichments:geo:destination_ip:country" ],
> > > >>       "hbaseEnrichment" : [ "enrichments.geo.destination_ip:country" ]
> > > >>     },
> > > >>     "fieldToTypeMap" : {
> > > >>       "enrichments.geo.destination_ip.country" : [ "city_crime_level" ],
> > > >>       "enrichments:geo:destination_ip:country" : [ "city_crime_level" ],
> > > >>       "enrichments.geo.destination_ip:country" : [ "city_crime_level" ]
> > > >>     },
> > > >>     "config" : { }
> > > >>   },
> > > >>   "threatIntel" : {
> > > >>     "fieldMap" : { },
> > > >>     "fieldToTypeMap" : { },
> > > >>     "config" : { },
> > > >>     "triageConfig" : {
> > > >>       "riskLevelRules" : { },
> > > >>       "aggregator" : "MAX",
> > > >>       "aggregationConfig" : { }
> > > >>     }
> > > >>   },
> > > >>   "configuration" : { }
> > > >> }
> > > > I tried all the ways how enrichment field can be entered just to be sure
> > > > I do not mistype it.
> > > >
> > > > - Dima

--
Nick Allen <[email protected]>
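
To make the idea of deriving dependencies and phases from the configuration a bit more concrete, here is a minimal sketch in Python. It is not Metron code and does not use the Metron or Storm APIs. The function name plan_phases, the field names, and the functions GEO_CITY, GEO_COUNTRY and CRIME_LEVEL are all hypothetical, and dependencies are found by naive identifier matching rather than by actually parsing Stellar. It only illustrates how a "depth" could be inferred once each enrichment's inputs are known.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_phases(enrichments):
    """Group enrichment outputs into phases so each runs after its inputs.

    `enrichments` maps an output field name to the expression that computes
    it. An expression is assumed to depend on another enrichment whenever it
    mentions that enrichment's output field as a whole identifier. This is a
    simplification: identifiers inside string literals are matched too.
    """
    identifier = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")
    deps = {}
    for field, expression in enrichments.items():
        tokens = set(identifier.findall(expression))
        deps[field] = {t for t in tokens if t in enrichments and t != field}

    # static_order() yields dependencies before dependents and raises
    # graphlib.CycleError if the enrichments depend on each other in a loop.
    depth = {}
    for field in TopologicalSorter(deps).static_order():
        depth[field] = 1 + max((depth[d] for d in deps[field]), default=0)

    phases = [[] for _ in range(max(depth.values(), default=0))]
    for field in enrichments:  # keep declaration order within a phase
        phases[depth[field] - 1].append(field)
    return phases


if __name__ == "__main__":
    # Hypothetical enrichment definitions, loosely modeled on Dima's use case.
    example = {
        "city": "GEO_CITY(destination_ip)",
        "crime_level": "CRIME_LEVEL(city)",
        "is_risky": "crime_level > 7 and country != 'US'",
        "country": "GEO_COUNTRY(destination_ip)",
    }
    print(plan_phases(example))
    # [['city', 'country'], ['crime_level'], ['is_risky']]
```

In this sketch the number of phases falls out of the longest dependency chain (3 in the example), which is the number a generated topology would have to support. Enrichments in the same phase do not depend on each other, so they could still run in parallel.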
