...but don't you guys [used generally] agree that ML and friends bring the challenge of "non-verifiability" to our domain [which I whine about here, if you are curious: http://blogs.gartner.com/anton-chuvakin/2015/03/03/killed-by-ai-much-a-rise-of-non-deterministic-security/]? Specifically, the "because ML" argument is sometimes made not just by the marketing droid [eh... I guess we can't use the word "droid" anymore, because, hey, what if that marketing "person" is actually a narrow AI?], but legitimately, because the algorithms/models point at a particular outcome [say, "this binary is soooo bad"] without any explanation. So, whether you are in the ElJefe/GRR/MIG + free ML library camp or in the super-uber-hyper-expensive EDR product camp, the result is the same: the system is telling me something and I don't understand why...
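
To make the "I don't understand why" part concrete, here is a minimal sketch of what a verdict with an explanation attached could look like. Everything in it is hypothetical: the feature names, the weights, and the bias are made up for illustration and do not come from any product or library mentioned in this thread. It is a toy linear scorer, not how any real EDR model works.

```python
import math

# Hypothetical weights for a toy "is this binary bad?" scorer.
# None of these names or numbers come from a real product.
WEIGHTS = {
    "writes_to_system_dir": 1.8,
    "packed_sections": 1.2,
    "signed_by_known_vendor": -2.5,
    "spawns_shell": 0.9,
}
BIAS = -1.0


def score(features):
    """Return (probability, per-feature contributions) for 0/1 features."""
    contributions = {
        name: weight * features.get(name, 0) for name, weight in WEIGHTS.items()
    }
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions


def explain(features):
    """Return the probability plus the non-zero contributions, largest first."""
    probability, contributions = score(features)
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, [(name, c) for name, c in ranked if c != 0]


sample = {"writes_to_system_dir": 1, "packed_sections": 1, "spawns_shell": 1}
prob, reasons = explain(sample)
print(f"verdict: bad with p={prob:.2f}")
for name, contribution in reasons:
    print(f"  {name}: {contribution:+.1f}")
```

A linear model makes this easy because each feature's contribution is just weight times value. The complaint above is about the models where no such simple breakdown exists.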

On Wed, Mar 30, 2016 at 10:58 AM, Smoak, Christopher <[email protected]> wrote:
> Sven,
>
> I definitely understand your point. Approaching the "when you have a
> hammer…" phenomenon is most certainly an issue in the machine learning
> field, especially to your .fit() point below. As sure as I am that such an
> issue exists, I also think there's room for, improperly phrased,
> "non-traditional" applications of these types of techniques in order to
> achieve some goal. I just don't want to make the blanket statement that "if
> it isn't an image, <insert technique>" won't work. I realize that's not
> necessarily your point, but I wanted to add some conversation fodder to
> what I consider to be a really interesting thread.
>
> Agreed 100% on the "because ML" argument; I see it way too often. Frankly,
> it hurts all "legitimate" (used liberally here) uses of ML in that
> everything gets wrapped up in the jargon/marketing lingo and can't see
> beyond it. We seem to live in an industry fraught with those types of
> things. My point is simply that I don't want to over-punish the terminology
> enough so as to devalue the real contributions that can be made to the
> field using ML, as an example. Employed carefully, there are definitely
> ways to use it for great justice. :)
>
> Anyway, just wanted to get some more thoughts going on this topic, as I
> think it's worth a longer discussion, albeit a slight digression.
>
> Regards,
>
> Chris Smoak
> Georgia Tech Research Institute
>
> From: Sven Krasser <[email protected]>
> Date: Wednesday, March 30, 2016 at 1:31 PM
> To: Christopher Smoak <[email protected]>, dave aitel <[email protected]>,
> "[email protected]" <[email protected]>
> Subject: Re: [Dailydave] AI
>
> Hey Chris,
>
> Carefully phrased, I am very skeptical that transforming your instances
> into images and then using CNNs will give you an out-of-the-box performance
> bump over other traditional techniques. To me this looks like a classic
> “When you have a hammer every problem looks like a nail” approach. Can we
> develop representations of input data that will allow deep architectures to
> successfully learn the instance space? Yes, I’m sure we can, but that will
> require more work than downloading TF and running it over the data as Dave
> described in his email.
>
> As far as technology in commercial products goes, my point is that
> primarily it is important that a product performs to a specific objective
> standard, regardless of the technologies used. Explaining why something
> performs is indeed important, but the answer to this cannot simply be
> “because Machine Learning” as we see presently (and which I assume prompted
> Dave to send his initial email). Everyone with rudimentary Python knowledge
> can go download sklearn right now and call .fit() on the Iris dataset.
> Congratulations, you just used Machine Learning. That doesn’t make for a
> compelling product, however.
>
> Best,
> -Sven
>
> --
> Sven Krasser, Ph.D.
> Chief Scientist, CrowdStrike, Inc.
> http://www.crowdstrike.com | http://tinyurl.com/cs-svenk
>
> From: "Smoak, Christopher" <[email protected]>
> Date: Wednesday, March 30, 2016 at 10:03 AM
> To: Sven Krasser <[email protected]>, dave aitel <[email protected]>,
> "[email protected]" <[email protected]>
> Subject: Re: [Dailydave] AI
>
> Sven,
>
> Your general point is well taken, however I'd contend that while most
> problems in security don't boil down to simple image classification tasks,
> there are certainly valid ways of using the unique spatial nature of CNNs
> to apply to security problems. Namely, mapping data that is not
> traditionally visual in nature to that of an image representing that data
> (e.g. binary -> png) can, and in my experience has, yielded very promising
> results. Granted, it's debatable whether it's better to utilize a technique
> more suited to the original data set in lieu of transforming it into an
> image, but that's a conversation for another day. The bottom line is
> finding a model that consistently gives good results in context of the
> question being answered.
>
> On the point just caring about the results and not about the
> technology/process involved, I'm not sure I agree. When we get into
> extremely complex technologies that give us binary, "good/bad" answers to
> not-so-simple questions, I think it's imperative to understand the basis
> upon which the technology arrived at the answer. It may not be feasible
> with commercial (read: intellectual property) solutions but is nonetheless
> important. An example can be found in dynamic malware analysis systems,
> where understanding the perspective from which data is collected helps
> frame the efficacy of the result with respect to potential detection by
> malware.
>
> Just some food for thought.
>
> Chris Smoak
> Georgia Tech Research Institute
>
> From: <[email protected]> on behalf of Sven Krasser
> <[email protected]>
> Date: Wednesday, March 30, 2016 at 10:49 AM
> To: dave aitel <[email protected]>, "[email protected]" <[email protected]>
> Subject: Re: [Dailydave] AI
>
> Hey Dave,
>
> You got some things right and some things wrong. In security, most
> problems are not image classification related and do not benefit at the
> same level from the recent advances in Convolutional Neural Networks. Also,
> TensorFlow is not the first freely available Deep Learning library nor is
> it the first freely available Machine Learning classification library by a
> long shot. Take a look at e.g. some of the presentations that the MLSec
> Project made available, ML has been in security products for decades (and I
> worked on shipping products with it back in the day working at CipherTrust
> before people cared what technology stopped the threats as long as they
> were stopped). What’s new is that Machine Learning now also appears on
> marketing materials. So the question one should ask oneself is whether you
> still have a product once the ML hype wore off.
>
> Best,
> -Sven
>
> --
> Sven Krasser, Ph.D.
> Chief Scientist, CrowdStrike, Inc.
> http://www.crowdstrike.com | http://tinyurl.com/cs-svenk
>
> From: <[email protected]> on behalf of dave aitel <[email protected]>
> Date: Wednesday, March 30, 2016 at 5:56 AM
> To: "[email protected]" <[email protected]>
> Subject: [Dailydave] AI
>
> There are only a few real computers in the world, and I think we are just
> beginning to feel their influence. For example, here is a sample project I
> am working on now that image classification is a solved problem.
>
> Like many of you on this list, I dabble in brazilian jiu jitsu. In fact,
> in a week we are doing an open mat at INFILTRATE for both newcomers who've
> always wanted to try to choke me out, to people in the community who are
> already very good at choking people.
>
> Like many sports, BJJ is typically scored according to a ruleset based on
> the different positions you end up in. Being on top is usually better.
> Being able to get on top after you are on the bottom is worth 2 points.
> Being able to completely mount someone is worth three points. Getting on
> their back is four points. Generally a tournament will hire judges and they
> will award points based on their understanding of the rules and their
> personal feelings towards the contestants and whatever other factors are
> floating in their heads.
>
> What I'm working on is collecting a set of images of BJJ, then annotating
> them as to what positions the different people are in. This essentially
> maps every image into a vector space - and after training a neural network
> using modern techniques you can have a program that looks at an image and
> then outputs "Blue is in top mount".
>
> Part of the key here is that you don't have to tell it that the picture is
> BJJ. Every picture that program sees is two people doing BJJ. All it has to
> do is output what positions they are in.
>
> And in the end, by assigning point values to transitions between
> positions, you will have an automatic BJJ judge. I've applied for a
> TensorFlow API key from Google since although this is not a hard problem by
> ML standards I want to do it the right way and get good scalable results on
> video later.
>
> And of course, the same thing is true for the process information El Jefe
> <https://eljefe.immunityinc.com/> will give you. All those "behavioral
> analysis machine learning intrusion detection" startups are about to be
> crushed by simple open source projects that use Google and MS and Amazon's
> exported Machine Learning APIs.
>
> -dave
>
> _______________________________________________
> Dailydave mailing list
> [email protected]
> https://lists.immunityinc.com/mailman/listinfo/dailydave
>

--
Dr. Anton Chuvakin
Site: http://www.chuvakin.org
Twitter: @anton_chuvakin <https://twitter.com/anton_chuvakin>
Work: http://www.linkedin.com/in/chuvakin
