Re: [Dailydave] AI

Anton Chuvakin Fri, 01 Apr 2016 06:57:11 -0700

.... but don't you guys [used generally] agree that ML and friends bring
up the challenge of "non-verifiability" to our domain  [that I whine about
here, if you are curious
http://blogs.gartner.com/anton-chuvakin/2015/03/03/killed-by-ai-much-a-rise-of-non-deterministic-security/].
Specifically, the "because ML" argument is sometimes made not just by the
marketing droid [eh...I guess we can't use the word "droid" anymore,
because...hey...what if that marketing "person" is actually a narrow AI?],
but legitimately due to the algorithms/models pointing at a particular
outcome [like say "this binary is soooo bad"] without any explanation. So,
whether you are in ElJefe/GRR/MIG + free ML library camp or in
super-uber-hyper-expensive EDR product camp, the result is the same: the
system is telling me something and I don't understand why.....
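
To make the "I don't understand why" complaint concrete, here is a minimal sketch of the opposite case: a toy linear score that reports which inputs pushed the verdict and by how much. Everything in it is an assumption made up for illustration (the feature names, the weights, the function name); it is not how ElJefe, GRR, MIG, or any EDR product works.

```python
# Hypothetical feature weights for a toy linear "badness" score.
# The names and numbers are invented for illustration only.
WEIGHTS = {
    "writes_to_system_dir": 2.0,
    "packed_sections": 1.5,
    "signed_by_known_vendor": -3.0,
}


def score_with_reasons(features):
    """Return (total score, per-feature contributions, largest magnitude first)."""
    contributions = {
        name: WEIGHTS[name] * value
        for name, value in features.items()
        if name in WEIGHTS
    }
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return total, ranked


# A made-up observation: 1 means the feature is present, 0 means absent.
sample = {
    "writes_to_system_dir": 1,
    "packed_sections": 1,
    "signed_by_known_vendor": 0,
}
total, reasons = score_with_reasons(sample)
for name, contribution in reasons:
    print(f"{name}: {contribution:+.1f}")
print(f"total: {total:+.1f}")
```

A linear score is trivially explainable because the verdict is just the sum of the listed contributions; the models discussed in this thread generally do not decompose that cleanly, which is the gap being complained about.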


On Wed, Mar 30, 2016 at 10:58 AM, Smoak, Christopher <
[email protected]> wrote:

> Sven,
>
> I definitely understand your point. Approaching the "when you have a
> hammer…" phenomenon is most certainly an issue in the machine learning
> field, especially to your .fit() point below. As sure as I am that such an
> issue exists, I also think there's room for, improperly phrased,
> "non-traditional" applications of these types of techniques in order to
> achieve some goal. I just don't want to make the blanket statement that "if
> it isn't an image, <insert technique>" won't work. I realize that's not
> necessarily your point, but I wanted to add some conversation fodder to
> what I consider to be a really interesting thread.
>
> Agreed 100% on the "because ML" argument; I see it way too often. Frankly,
> it hurts all "legitimate" (used liberally here) uses of ML in that
> everything gets wrapped up in the jargon/marketing lingo and people can't
> see beyond it. We seem to live in an industry fraught with those types of
> things. My point is simply that I don't want to over-punish the terminology
> enough so as to devalue the real contributions that can be made to the
> field using ML, as an example. Employed carefully, there are definitely
> ways to use it for great justice. :)
>
> Anyway, just wanted to get some more thoughts going on this topic, as I
> think it's worth a longer discussion, albeit a slight digression.
>
> Regards,
>
> Chris Smoak
> Georgia Tech Research Institute
>
> From: Sven Krasser <[email protected]>
> Date: Wednesday, March 30, 2016 at 1:31 PM
> To: Christopher Smoak <[email protected]>, dave aitel <
> [email protected]>, "[email protected]" <
> [email protected]>
> Subject: Re: [Dailydave] AI
>
> Hey Chris,
>
> Carefully phrased, I am very skeptical that transforming your instances
> into images and then using CNNs will give you an out-of-the-box performance
> bump over other traditional techniques. To me this looks like a classic
> “When you have a hammer every problem looks like a nail” approach. Can we
> develop representations of input data that will allow deep architectures to
> successfully learn the instance space? Yes, I’m sure we can — but that will
> require more work than downloading TF and running it over the data as Dave
> described in his email.
>
> As far as technology in commercial products goes, my point is that
> primarily it is important that a product performs to a specific objective
> standard, regardless of the technologies used. Explaining why something
> performs is indeed important, but the answer to this cannot simply be
> “because Machine Learning” as we see presently (and which I assume prompted
> Dave to send his initial email). Everyone with rudimentary Python knowledge
> can go download sklearn right now and call .fit() on the Iris dataset.
> Congratulations, you just used Machine Learning. That doesn’t make for a
> compelling product, however.
>
> Best,
> -Sven
>
> --
> Sven Krasser, Ph.D.
> Chief Scientist, CrowdStrike, Inc.
> http://www.crowdstrike.com | http://tinyurl.com/cs-svenk
>
> From: "Smoak, Christopher" <[email protected]>
> Date: Wednesday, March 30, 2016 at 10:03 AM
> To: Sven Krasser <[email protected]>, dave aitel <[email protected]>,
> "[email protected]" <[email protected]>
> Subject: Re: [Dailydave] AI
>
> Sven,
>
> Your general point is well taken; however, I'd contend that while most
> problems in security don't boil down to simple image classification tasks,
> there are certainly valid ways of using the unique spatial nature of CNNs
> to apply to security problems. Namely, mapping data that is not
> traditionally visual in nature to that of an image representing that data
> (e.g. binary -> png) can—and in my experience, has—yielded very promising
> results. Granted, it's debatable whether it's better to utilize a technique
> more suited to the original data set in lieu of transforming it into an
> image, but that's a conversation for another day. The bottom line is
> finding a model that consistently gives good results in the context of the
> question being answered.
>
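The "binary -> png" mapping mentioned above can be sketched roughly as follows: raw bytes are laid out row by row as grayscale pixel values. This is a guess at the general idea, not the pipeline actually used. The function names and the fixed row width are assumptions, and the output is written as PGM rather than PNG only because the Python standard library has no PNG writer.

```python
def bytes_to_grid(data, width=16):
    """Lay raw bytes out row by row as a 2D grid of 0-255 grayscale values.

    The last row is padded with zeros so every row has the same width.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))  # pad a short final row
        rows.append(row)
    return rows


def grid_to_pgm(grid):
    """Serialize the grid as a binary PGM (P5) image, one byte per pixel."""
    height = len(grid)
    width = len(grid[0]) if grid else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(value for row in grid for value in row)


# Five example bytes (made up), arranged four pixels per row.
grid = bytes_to_grid(b"MZ\x90\x00\x03", width=4)
image = grid_to_pgm(grid)
```

The row width is a free parameter here, and it decides which bytes end up as vertical neighbours, so it is a modelling decision in its own right and not a neutral one.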
> On the point of just caring about the results and not about the
> technology/process involved, I'm not sure I agree. When we get into
> extremely complex technologies that give us binary, "good/bad" answers to
> not-so-simple questions, I think it's imperative to understand the basis
> upon which the technology arrived at the answer. It may not be feasible
> with commercial (read: intellectual property) solutions but is nonetheless
> important. An example can be found in dynamic malware analysis systems,
> where understanding the perspective from which data is collected helps
> frame the efficacy of the result with respect to potential detection by
> malware.
>
> Just some food for thought.
>
> Chris Smoak
> Georgia Tech Research Institute
>
> From: <[email protected]> on behalf of Sven Krasser
> <[email protected]>
> Date: Wednesday, March 30, 2016 at 10:49 AM
> To: dave aitel <[email protected]>, "[email protected]" <
> [email protected]>
> Subject: Re: [Dailydave] AI
>
> Hey Dave,
>
> You got some things right and some things wrong. In security, most
> problems are not image classification related and do not benefit at the
> same level from the recent advances in Convolutional Neural Networks. Also,
> TensorFlow is not the first freely available Deep Learning library nor is
> it the first freely available Machine Learning classification library by a
> long shot. Take a look at e.g. some of the presentations that the MLSec
> Project made available. ML has been in security products for decades (and I
> worked on shipping products with it back in the day working at CipherTrust
> before people cared what technology stopped the threats as long as they
> were stopped). What’s new is that Machine Learning now also appears on
> marketing materials. So the question one should ask oneself is whether you
> still have a product once the ML hype has worn off.
>
> Best,
> -Sven
>
> --
> Sven Krasser, Ph.D.
> Chief Scientist, CrowdStrike, Inc.
> http://www.crowdstrike.com | http://tinyurl.com/cs-svenk
>
> From: <[email protected]> on behalf of dave aitel <
> [email protected]>
> Date: Wednesday, March 30, 2016 at 5:56 AM
> To: "[email protected]" <[email protected]>
> Subject: [Dailydave] AI
>
> There are only a few real computers in the world, and I think we are just
> beginning to feel their influence. For example, here is a sample project I
> am working on now that image classification is a solved problem.
>
> Like many of you on this list, I dabble in Brazilian jiu-jitsu. In fact,
> in a week we are doing an open mat at INFILTRATE for both newcomers who've
> always wanted to try to choke me out and people in the community who are
> already very good at choking people.
>
> Like many sports, BJJ is typically scored according to a ruleset based on
> the different positions you end up in. Being on top is usually better.
> Being able to get on top after you are on the bottom is worth 2 points.
> Being able to completely mount someone is worth three points. Getting on
> their back is four points. Generally a tournament will hire judges and they
> will award points based on their understanding of the rules and their
> personal feelings towards the contestants and whatever other factors are
> floating in their heads.
>
> What I'm working on is collecting a set of images of BJJ, then annotating
> them as to what positions the different people are in. This essentially
> maps every image into a vector space - and after training a neural network
> using modern techniques you can have a program that looks at an image and
> then outputs "Blue is in top mount".
>
> Part of the key here is that you don't have to tell it that the picture is
> BJJ. Every picture that program sees is two people doing BJJ. All it has to
> do is output what positions they are in.
>
> And in the end, by assigning point values to transitions between
> positions, you will have an automatic BJJ judge. I've applied for a
> TensorFlow API key from Google since although this is not a hard problem by
> ML standards I want to do it the right way and get good scalable results on
> video later.
>
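The scoring step described above (point values assigned to transitions between positions) can be sketched as below, assuming the classifier has already produced one position label per frame for a single competitor. The label strings and function names are hypothetical; the point values are the ones quoted earlier in this message (2 for getting on top from the bottom, 3 for mount, 4 for the back).

```python
def transition_points(previous, current):
    """Points for moving from one labelled position to the next."""
    if previous == current:
        return 0  # holding a position scores nothing further
    if current == "back":
        return 4
    if current == "mount":
        return 3
    if previous == "bottom" and current == "top":
        return 2
    return 0


def score_sequence(labels):
    """Sum transition points over consecutive per-frame position labels."""
    return sum(
        transition_points(previous, current)
        for previous, current in zip(labels, labels[1:])
    )


# Made-up per-frame labels for one competitor.
frames = ["bottom", "bottom", "top", "mount", "mount", "back"]
total = score_sequence(frames)
```

Scoring only on a change of label means a single misclassified frame in the middle of a held position would award points twice, so on real video some smoothing of the label stream would probably be needed first.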
> And of course, the same thing is true for the process information El Jefe
> <https://eljefe.immunityinc.com/> will give you. All those "behavioral
> analysis machine learning intrusion detection" startups are about to be
> crushed by simple open source projects that use Google and MS and Amazon's
> exported Machine Learning APIs.
>
> -dave
>
>
>
> _______________________________________________
> Dailydave mailing list
> [email protected]
> https://lists.immunityinc.com/mailman/listinfo/dailydave
>
>


-- 
Dr. Anton Chuvakin
Site: http://www.chuvakin.org
Twitter: @anton_chuvakin <https://twitter.com/anton_chuvakin>
Work: http://www.linkedin.com/in/chuvakin

