Agreed, reputation and confidence are not really encoded formally in the data model, but I would expect most people are using them to weight the results of the threat intel now that we have threat triage scores built on Stellar expressions.
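As a rough illustration of that kind of weighting, here is a minimal sketch in Python rather than Stellar. The field names (base_score, confidence, reputation), the multiplicative weighting and the max aggregation are all assumptions made for the example, not anything the Metron data model defines today.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IntelHit:
    """One threat intel match. Hypothetical fields, not Metron's schema."""
    feed: str
    base_score: float   # score the triage rule would assign on its own
    confidence: float   # 0.0-1.0: how sure the feed is about this indicator
    reputation: float   # 0.0-1.0: how much we trust the feed itself


def weighted_score(hit: IntelHit) -> float:
    """Scale the rule's base score by confidence and reputation (assumed model)."""
    for name, value in (("confidence", hit.confidence),
                        ("reputation", hit.reputation)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0, 1], got {value}")
    return hit.base_score * hit.confidence * hit.reputation


def triage(hits: Iterable[IntelHit]) -> float:
    """Aggregate by taking the highest weighted score; 0.0 when nothing matched."""
    return max((weighted_score(h) for h in hits), default=0.0)


if __name__ == "__main__":
    hits = [
        IntelHit("noisy_feed", base_score=10.0, confidence=0.5, reputation=0.5),
        IntelHit("curated_feed", base_score=4.0, confidence=1.0, reputation=1.0),
    ]
    print(triage(hits))
```

With a model like this, a high base score from a low-confidence, low-reputation feed can rank below a modest score from a trusted one, which is the behaviour a formal model would need to pin down.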
There is definitely scope here to provide at least a recommended formal model for this, which may feed into some of the discussions about schema and traits elsewhere on the list (anyone remember back to the last time we talked about that?!)

Otto, I’m not sure I see the problem with using NiFi as breaking an application boundary, or the necessity of everything being in Storm. OK, it brings in another component, but it does also give us things like scheduling of web API polling for threat feeds. Most implementations of Metron I’ve been involved in have NiFi on the side anyway to get things into Kafka. I’d love to hear if people have a strong objection to bringing it into scope.

What I was thinking was writing something like a MetronThreatIntelProcessor owned in Metron, and published as a NAR by the Metron project. That would load NiFi flow file content directly into Metron’s HBase tables using the Metron loader code and config format. That would be combined with something like a StixProcessor (which I personally think should be a StixRecordReader in new NiFi, btw) or whatever parser, fetcher, tailer etc. Btw, I’ve also got early-stage implementations of things like Stellar in NiFi, which would be the starting point for building something like that.

To address the bulk vs incremental side, we could use the same mechanism to handle both, but that would very much suggest moving to the record-reader-based APIs. That should be fine at the O(100s of gigabytes) scale in NiFi. Does anyone have any use cases that would still seem like they’d be at the terabytes / existing bulk MapReduce end?

Simon

> On 19 Feb 2018, at 14:26, Otto Fowler <ottobackwa...@gmail.com> wrote:
>
> There are a couple of use cases here for getting the data.
>
> When you _can_ or want to ingest and duplicate the external store
>
> 1. Bulk Loading ( from a clean empty state )
> 2. Tailing the feed afterwards
>
> When you can’t
>
> 3. Calling the api ( most likely web ) for reputation or some other thing
>
>
> Right now, I *think* we’d use our bulk loader for 1. I am not sure it can
> be configured for 2.
> NiFi *could* do it, if you wrote your Taxii client such that it was
> stateful and could resume
> after restarts etc and pickup from the right place.
>
> Right now, we only ingest indicators as raw data. I do not believe we
> support the reputation and confidence stuff.
> Also, the issue of which version of stix/taxii we support will need to be
> considered.
>
> I think the idea of a ‘tailing’ topology per service where required would
> be worth looking into, such a topology
> would be transform and index (with a new hbase indexer ) only with no
> enrichment. We also may want to explore indexing
> enrichments to SEARCH stores or both SEARCH and BATCH.
>
> Like Simon says, there is NiFi, but I would want to consider a metron
> topology because this is a metron managed store,
> and having nifi write to metron’s indicator store, or other threat store is
> wrong I think. It breaks the application boundary.
>
> You should take a look at what jiras we currently have, and we can talk
> about what needs to happen, create the jiras
> and get it rolling.
>
> I would imagine down the line, that we would support bulk load as we have
> now ‘out of the box’. And have a new mpack
> for optional threat intel flows available.
>
> ottO
>
> On February 19, 2018 at 07:47:39, Andre (andre-li...@fucs.org) wrote:
>
> Simon,
>
> I have coded but not merged a STIX / TAXII processor for NiFi that would
> work perfectly fine with this.
>
>
> But I will take the opportunity to touch the following points:
>
>
> 1. Threat Intel is more frequently than not based on API lookups (e.g.
> VirusTotal, RBLs and correlated, Umbrella's top million, etc). How are
> those going to be consistently managed?
>
> 2. Threat feeds are frequently classified in regards to confidence but
> today the default Metron schema seems to lack any similar concept? Do we
> have plans to address it?
>
> 3. Atemporal matching - Given the use of big data technologies it seems to
> me Metron should be able to look into past enrichment data in order to
> classify traffic. I am not sure this is possible today?
>
>
> Cheers
>
>
> On Mon, Feb 19, 2018 at 8:48 PM, Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
>
>> Would it make sense to lean on something like Apache NiFi for this? It
>> seems a good fit to handle getting data from wherever (web service, poll,
>> push etc, streams etc). If we were to build a processor which encapsulated
>> the threat intel loader logic, that would provide a granular route to
>> update threat intel entries in a more streaming manner. We could of course
>> do the same thing in code with storm topologies, but I would wonder whether
>> threat intel feeds would have enough volume to require that.
>>
>> Simon
>>
>>> On 16 Feb 2018, at 07:11, Ali Nazemian <alinazem...@gmail.com> wrote:
>>>
>>> I think one of the challenges is where the scope of threat intel ends from
>>> the Metron roadmap? Is it going to rely on supporting a standard format and
>>> a loader to send it to HBase for the later threat intel use cases?
>>>
>>> In my opinion, it would be better to have a separate topology (sort of
>>> similar to the profiler approach) to get the feeds (maybe from Kafka) and
>>> load it into HBase frequently based on what criteria we want to have. Maybe
>>> we need to have some normalizations for the threat feeds (either aggregated
>>> or single feed) as an example (or any other transformation by using
>>> Stellar). Maybe we need to tailor row_key in a way that can be utilised
>>> based on the threat intel look up we want to have further from the
>>> enrichment topology. The problem I see with different loaders in Metron is
>>> we can mostly use them for the purpose of POC, but if you want to build an
>>> actual use case for a production platform then it will be out of the
>>> flexibility of a loader, so we will end up feeding data to HBase based on
>>> our use case.
>>>
>>> In this case, maybe it won't be very important we want to use an aggregator
>>> X or aggregator Y, we can integrate it with Metron based on integration
>>> points.
>>>
>>> Cheers,
>>> Ali
>>>
>>> On Wed, Feb 14, 2018 at 11:28 PM, Simon Elliston Ball <
>>> si...@simonellistonball.com> wrote:
>>>
>>>> We used to install Soltra Edge in the old ansible builds (which have
>>>> thankfully now been pared back in the interests of stability in full dev).
>>>> Soltra has not been a good option since they went proprietary, so since
>>>> then we’ve included opentaxii (BSD 3) as a discovery and aggregator.
>>>>
>>>> Most of the challenges are around licensing. Hippocampe is part of The
>>>> Hive Project, which is AGPL, which is an Apache category X license so can’t
>>>> be included.
>>>>
>>>> MineMeld is much better license-wise (Apache 2) so would be well worth
>>>> community consideration. I kinda like it as a framework, but
>>>>
>>>> I for one would be very pleased to hear a broader community discussion
>>>> around which platforms we should have integrations with via the threat
>>>> intel loader, or even through a direct to hbase streaming connector.
>>>>
>>>> Simon
>>>>
>>>>> On 14 Feb 2018, at 03:13, Ali Nazemian <alinazem...@gmail.com> wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I would like to understand Metron community view on Threat Intel
>>>>> aggregators as well as the roadmap of threat intelligence and threat
>>>>> hunting. There are some open source options available regarding threat
>>>>> intel aggregator such as MineMeld, Hippocampe, etc. Is there any plan to
>>>>> build that as a part of Metron in future? Is there any specific aggregator
>>>>> you think would be more aligned with Metron roadmap?
>>>>>
>>>>> Cheers,
>>>>> Ali
>>>>
>>>>
>>>
>>>
>>> --
>>> A.Nazemian
>>
>>