To be fair, the advantage of the network position is that it avoids interference with your host-protection programs (aka implants). And evading on the host is possible too. But both are probably necessary at some level.
On Wed, Jun 21, 2017 at 1:41 PM Dominique Brezinski < [email protected]> wrote:

> Let me tell a little story about statistical analysis of network traffic. I may or may not have been associated with someone that built a very large-scale, statistics-based detection mechanism using un-sampled network flow and HTTP proxy logs. 3200 cores chugged through the trailing X weeks of traffic, for hundreds of thousands of hosts, building usage profiles and then measuring the distance of the current day's activity for each host from the baseline profile.
>
> As a digression, in this unsupervised learning space all hope depends on quality feature selection. Once quality features are selected/engineered, the most basic of distance measures is sufficient to detect anomalies. The detected anomalies are just that -- anomalies. With regard to threat detection, the false positive and false negative rates are still likely too high to operationalize. You can ML like a boss and use Symbolic Aggregate Approximation (SAX) to represent your logs as images, then use a Convolutional Neural Network (CNN) to do the feature extraction for you, and feed the results through a Long Short Term Memory (LSTM) Recurrent Neural Network (RNN) to detect the anomalies -- which is approximately what Niara does based on their Spark Summit presentation. Or you can ML like a security engineer and use domain knowledge to identify discriminating features and use some simple Euclidean distance measures to detect anomalies. I have done both with the same approximate results. That was a statistics joke.
>
> The result of all this statistical analysis is a set of findings about hosts that deviate from normal by some measure on one or more features. So what? Well, that is exactly what the team responsible for triaging and operationalizing alerts said. This is where the real work begins.
> Now if the host communicated with novel domains among the population, for example, the domains would be provided as evidence. The domain information could be enriched with threat intel and results from services like OpenDNS. The monitoring team still says, "yeah ok, it talked to some sketchy shit. What are we really supposed to do about that? I mean really do, so we are not scaling a very expensive whack-a-mole team?" Right.
>
> Now we go pull all the process execution and process-to-network events from the hosts. Now when a network anomaly occurs, you essentially build the activity graph that resulted in the anomalous network traffic. This looks actionable. It is.
>
> The thing is, once you have that on-host activity, as Dave said some might say, you really don't need the network data anymore. You get to the same result earlier in the activity chain with actionable results, rich in context that is easily assessed by analysts and incident responders. Even better, you don't need to use statistics. There are better models using this data that are quite good for detection and hunting.
>
> Some of us like belts and suspenders when we have to depend on imperfect techniques to mitigate risk, so network-level instrumentation presents data from a plane with different attack surface that correlates with host data. That is a nice feature if you can take advantage of it. Network data is also OS/device independent. Building some anomaly detection on network data provides broad coverage at a low engineering cost; however, the compute and storage costs are usually quite high. There are a lot of trade-offs. Honestly, most people get lost and never get clarity about what and how they are trying to detect and whether the data and techniques align with their desired results. They take an opportunistic stab at what data they have and fall down the rabbit hole.
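To make the "simple Euclidean distance" idea from Dom's story concrete, here is a minimal sketch, not the system he describes. It assumes each host has a small per-day feature vector and scores today's vector by its distance from the trailing baseline, with each feature scaled by its baseline standard deviation. The feature names, the sample numbers, and the `baseline_distance` helper are all hypothetical.

```python
import math
from statistics import mean, pstdev


def baseline_distance(history, today):
    """Euclidean distance of today's features from the baseline mean,
    with each feature scaled by its baseline standard deviation."""
    columns = list(zip(*history))  # one tuple of values per feature
    total = 0.0
    for values, current in zip(columns, today):
        mu = mean(values)
        sigma = pstdev(values) or 1.0  # constant feature: avoid dividing by zero
        total += ((current - mu) / sigma) ** 2
    return math.sqrt(total)


# Hypothetical daily features for one host:
# (MB sent, distinct domains contacted, outbound connections)
history = [(100, 20, 300), (110, 22, 310), (90, 18, 290), (100, 20, 300)]
normal_day = (100, 20, 300)
odd_day = (400, 80, 900)

print(baseline_distance(history, normal_day))
print(baseline_distance(history, odd_day))
```

A large score only says the host looks unusual on these features. As the story above points out, it says nothing about what to do next, which is where the on-host process data comes in.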
>
> Dom
>
> On Wed, Jun 21, 2017 at 7:25 AM, dave aitel <[email protected]> wrote:
>
>> Let's talk about the giant pile of wrong that is this reporting on Cisco's new marketing campaign <http://www.cnbc.com/2017/06/20/cisco-introduces-encrypted-traffic-analytics-to-detect-malwre.html> around detecting encrypted malware traffic. "This is a seminal moment in networking" is the quote from their CEO that CNBC decided to run. Let's revisit the basics of this "new" technology: do statistical analysis on encrypted data to find malware traffic.
>>
>> People have literally decoded conversations <https://www.schneier.com/blog/archives/2008/06/eavesdropping_o_2.html> from encrypted data using that same basic technique. Not even recently - that work is from 2008 and was not surprising even then.
>>
>> "The software, which will be offered as a subscription service, is currently in field trials with 75 customers, and according to Robbins, is 99 percent effective."
>>
>> 99% effective with the kind of traffic a normal network sees means you are FLOODED AND OVERWHELMED WITH FALSE POSITIVES. Although they don't specify what that number even means. Is it false positives? False negatives? Both? Let's just say this: 99.99% is useless when doing a network-based IDS. All that might get you is an indicator you can use to remotely load a more sophisticated remote tool onto an endpoint for further detailed analysis. You essentially need BOTH if you have this level of network-based IDS, and the endpoint people will probably say you don't need the network sniffer anymore, because scaling good analysis at that level at anything near realtime is nearly impossible (cf. Alex Stamos's talk <https://www.youtube.com/watch?v=2OTRU--HtLM>) to the point where they still try to sell you stuff that has 1% false positive rates.
>> :)
>>
>> I'm going to bug our big customers to see if any of them are in this 75-customer field trial and what they think in real life. And I'm going to be honest and say that if you are thinking of investing in this sort of thing, but you haven't tested it against Cobalt Strike <https://www.cobaltstrike.com/> and INNUENDO <https://www.immunityinc.com/products/innuendo/>, then you are knowingly buying snake oil. A good percentage of our consulting business right now is literally just that because these anomaly detection products are so expensive and so hard to test.
>>
>> Anyways, maybe I am wrong! If you are one of the privileged 75 and you love this and it is amazing, let me/us know!
>>
>> -dave
>>
>> _______________________________________________
>> Dailydave mailing list
>> [email protected]
>> https://lists.immunityinc.com/mailman/listinfo/dailydave
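The "99 percent effective" complaint in the quoted message is a base-rate argument, and the arithmetic is easy to sketch. Every number below is an assumption for illustration: the article does not say what the 99% measures, so this reads it as a 99% detection rate paired with a 1% false positive rate, on a made-up network with one million flows a day of which 100 are malicious. The `alert_breakdown` helper is hypothetical.

```python
def alert_breakdown(total_flows, malicious_flows, detection_rate, false_positive_rate):
    """Return (true_alerts, false_alerts, precision) for one day of traffic."""
    benign_flows = total_flows - malicious_flows
    true_alerts = malicious_flows * detection_rate
    false_alerts = benign_flows * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision


# Assumed, illustrative numbers only (not from Cisco or the article).
true_alerts, false_alerts, precision = alert_breakdown(
    total_flows=1_000_000,
    malicious_flows=100,
    detection_rate=0.99,
    false_positive_rate=0.01,
)

print(f"true alerts:  {true_alerts:.0f}")
print(f"false alerts: {false_alerts:.0f}")
print(f"share of alerts that are real: {precision:.2%}")
```

Under these assumptions roughly 10,000 benign flows get flagged every day against about 99 real ones, so fewer than one alert in a hundred is worth an analyst's time. Different assumed rates move the numbers, but the shape of the problem stays the same as long as malicious traffic is rare.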
