Sven,

I definitely understand your point. The "when you have a hammer…" 
phenomenon is most certainly an issue in the machine learning field, especially 
regarding your .fit() point below. As sure as I am that such an issue exists, I also 
think there's room for, improperly phrased, "non-traditional" applications of 
these types of techniques in order to achieve some goal. I just don't want to 
make the blanket statement that "if it isn't an image, <insert technique> 
won't work." I realize that's not necessarily your point, but I wanted to add 
some conversation fodder to what I consider to be a really interesting thread.

Agreed 100% on the "because ML" argument; I see it way too often. Frankly, it 
hurts all "legitimate" (used liberally here) uses of ML, in that everything gets 
wrapped up in the jargon/marketing lingo and people can't see beyond it. We seem to 
live in an industry fraught with those types of things. My point is simply that 
I don't want to punish the terminology so much that we devalue the real 
contributions that can be made to the field using ML, as an example. Employed 
carefully, it can definitely be used for great justice. :)

Anyway, just wanted to get some more thoughts going on this topic, as I think 
it's worth a longer discussion, albeit a slight digression.

Regards,

Chris Smoak
Georgia Tech Research Institute

From: Sven Krasser <[email protected]>
Date: Wednesday, March 30, 2016 at 1:31 PM
To: Christopher Smoak <[email protected]>,
 dave aitel <[email protected]>,
 "[email protected]" <[email protected]>
Subject: Re: [Dailydave] AI

Hey Chris,

Carefully phrased, I am very skeptical that transforming your instances into 
images and then using CNNs will give you an out-of-the-box performance bump 
over other traditional techniques. To me this looks like a classic “When you 
have a hammer, every problem looks like a nail” approach. Can we develop 
representations of input data that will allow deep architectures to 
successfully learn the instance space? Yes, I’m sure we can, but that will 
require more work than downloading TF and running it over the data as Dave 
described in his email.

As far as technology in commercial products goes, my point is that what matters 
primarily is that a product performs to a specific, objective standard, 
regardless of the technologies used. Explaining why something performs is 
indeed important, but the answer to this cannot simply be “because Machine 
Learning,” as we see presently (and which I assume prompted Dave to send his 
initial email). Anyone with rudimentary Python knowledge can go download 
sklearn right now and call .fit() on the Iris dataset. Congratulations, you 
just used Machine Learning. That doesn’t make for a compelling product, however.
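
To make the point concrete, here is roughly what "calling .fit() on Iris" 
amounts to. This is a minimal sketch assuming scikit-learn is installed; the 
decision tree, the 70/30 split, and the function name iris_demo are arbitrary 
illustrative choices, not anything taken from this thread.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def iris_demo(seed: int = 0) -> tuple[float, float]:
    """Fit a classifier on Iris and return (train accuracy, held-out accuracy)."""
    X, y = load_iris(return_X_y=True)
    # Hold out 30% of the samples, keeping class proportions equal in both halves.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    clf = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)
    return clf.score(X_train, y_train), clf.score(X_test, y_test)


if __name__ == "__main__":
    train_acc, test_acc = iris_demo()
    print(f"train accuracy {train_acc:.2f}, held-out accuracy {test_acc:.2f}")
```

Held-out accuracy on a toy dataset like this is typically high, which is 
exactly why it says nothing about whether a security product meets an 
objective standard on real, adversarial data.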

Best,
-Sven

--
Sven Krasser, Ph.D.
Chief Scientist, CrowdStrike, Inc.
http://www.crowdstrike.com | http://tinyurl.com/cs-svenk

From: "Smoak, Christopher" <[email protected]>
Date: Wednesday, March 30, 2016 at 10:03 AM
To: Sven Krasser <[email protected]>,
 dave aitel <[email protected]>,
 "[email protected]" <[email protected]>
Subject: Re: [Dailydave] AI

Sven,

Your general point is well taken; however, I'd contend that while most problems 
in security don't boil down to simple image classification tasks, there are 
certainly valid ways of applying the unique spatial nature of CNNs to 
security problems. Namely, mapping data that is not traditionally visual in 
nature to an image representing that data (e.g., binary -> PNG) can, and 
in my experience has, yielded very promising results. Granted, it's debatable 
whether it's better to use a technique more suited to the original data set 
instead of transforming it into an image, but that's a conversation for another 
day. The bottom line is finding a model that consistently gives good results in 
the context of the question being answered.
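
For readers who haven't seen the binary -> PNG trick, one common form of it is 
to lay the file's bytes out row by row as 8-bit grayscale pixels. The sketch 
below is an assumption about what such a mapping might look like, not the 
exact pipeline used here: the fixed 256-pixel width, zero padding of the last 
row, and the helper names are all hypothetical. It writes the PNG with only 
the standard library so it stays self-contained.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def bytes_to_rows(data: bytes, width: int = 256) -> list[bytes]:
    """Lay bytes out as rows of `width` grayscale pixels (one byte per pixel).

    The last row is zero-padded so every row has the same width.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        raise ValueError("cannot build an image from empty input")
    padded = data + b"\x00" * (-len(data) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]


def _chunk(tag: bytes, payload: bytes) -> bytes:
    # A PNG chunk is: length, 4-byte type, payload, CRC-32 over type + payload.
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def rows_to_png(rows: list[bytes]) -> bytes:
    """Encode equal-width rows as an 8-bit grayscale PNG with no filtering."""
    width, height = len(rows[0]), len(rows)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter-type byte 0 ("None").
    raw = b"".join(b"\x00" + row for row in rows)
    return (PNG_SIGNATURE
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw))
            + _chunk(b"IEND", b""))


def binary_to_png(data: bytes, width: int = 256) -> bytes:
    return rows_to_png(bytes_to_rows(data, width))


if __name__ == "__main__":
    import sys
    # Usage: python binviz.py input.bin output.png
    with open(sys.argv[1], "rb") as src, open(sys.argv[2], "wb") as dst:
        dst.write(binary_to_png(src.read()))
```

Whether a fixed width is the right choice is part of the "is an image the 
right representation at all" debate; a CNN will see different spatial 
neighborhoods depending on how the bytes are wrapped.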

On the point of caring only about the results and not about the technology/process 
involved, I'm not sure I agree. When we get into extremely complex technologies 
that give us binary "good/bad" answers to not-so-simple questions, I think 
it's imperative to understand the basis upon which the technology arrived at 
the answer. That may not be feasible with commercial (read: intellectual 
property) solutions, but it is nonetheless important. One example is 
dynamic malware analysis systems, where understanding the vantage point from 
which data is collected helps frame the efficacy of the result, given that the 
malware may detect that it is being analyzed.
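
As a toy illustration of the difference between an opaque "good/bad" and a 
verdict whose basis you can inspect, consider a linear score that reports how 
much each observed behavior pushed the result. Everything here is 
hypothetical: the feature names, weights, bias, and the choice of a linear 
model are assumptions made for the sketch, not how any particular product 
works.

```python
# Hypothetical behavioral features and weights, for illustration only.
WEIGHTS = {
    "injects_into_other_process": 3.0,
    "writes_autorun_registry_key": 2.0,
    "contacts_newly_registered_domain": 1.0,
    "signed_by_known_vendor": -2.5,
}
BIAS = -2.0  # hypothetical offset: with no evidence, lean toward "good"


def explain(features: dict[str, float]) -> tuple[str, float, list[tuple[str, float]]]:
    """Return (verdict, score, per-feature contributions sorted by impact)."""
    contributions = {name: w * features.get(name, 0.0) for name, w in WEIGHTS.items()}
    score = BIAS + sum(contributions.values())
    verdict = "bad" if score > 0 else "good"
    # Largest absolute contribution first: "why" the model said what it said.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return verdict, score, ranked


if __name__ == "__main__":
    verdict, score, ranked = explain(
        {"injects_into_other_process": 1.0, "writes_autorun_registry_key": 1.0}
    )
    print(verdict, score)
    for name, contribution in ranked:
        print(f"  {name}: {contribution:+.1f}")
```

The same question applies to the sandbox example: if the data feeding these 
features was collected from a vantage point the malware can detect, the 
contributions are only as trustworthy as that collection.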

Just some food for thought.

Chris Smoak
Georgia Tech Research Institute

From: <[email protected]> on behalf of Sven Krasser <[email protected]>
Date: Wednesday, March 30, 2016 at 10:49 AM
To: dave aitel <[email protected]>,
 "[email protected]" <[email protected]>
Subject: Re: [Dailydave] AI

Hey Dave,

You got some things right and some things wrong. In security, most problems are 
not image classification problems and do not benefit to the same degree from the 
recent advances in Convolutional Neural Networks. Also, TensorFlow is not the 
first freely available Deep Learning library, nor is it the first freely 
available Machine Learning classification library by a long shot. Take a look 
at, e.g., some of the presentations the MLSec Project has made available: ML has 
been in security products for decades (I worked on shipping products with 
it back in the day at CipherTrust, before people cared what technology 
stopped the threats as long as they were stopped). What’s new is that Machine 
Learning now also appears in marketing materials. So the question to ask 
yourself is whether you will still have a product once the ML hype wears off.

Best,
-Sven

From: <[email protected]> on behalf of dave aitel <[email protected]>
Date: Wednesday, March 30, 2016 at 5:56 AM
To: "[email protected]" <[email protected]>
Subject: [Dailydave] AI

There are only a few real computers in the world, and I think we are just 
beginning to feel their influence. For example, here is a sample project I am 
working on now that image classification is a solved problem.

Like many of you on this list, I dabble in Brazilian jiu-jitsu. In fact, in a 
week we are doing an open mat at INFILTRATE for everyone from newcomers who've 
always wanted to try to choke me out to people in the community who are already 
very good at choking people.

Like many sports, BJJ is typically scored according to a ruleset based on the 
different positions you end up in. Being on top is usually better. Getting on 
top after being on the bottom (a sweep) is worth two points. Completely 
mounting someone is worth three points. Getting on their back is worth four 
points. Generally, a tournament will hire judges, and they will award points 
based on their understanding of the rules, their personal feelings towards 
the contestants, and whatever other factors are floating in their heads.

What I'm working on is collecting a set of images of BJJ, then annotating them 
with the positions the different people are in. This essentially maps every 
image into a vector space, and after training a neural network using modern 
techniques, you can have a program that looks at an image and then outputs 
"Blue is in top mount".
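
Concretely, each annotation becomes a target vector for training, and the 
trained network's per-class scores get turned back into a sentence like "Blue 
is in top mount". A minimal sketch of just that encode/decode step, assuming 
a one-hot label encoding and a softmax over the network's raw outputs; the 
label list and function names are hypothetical, and the network itself is 
out of scope here.

```python
import math

# Hypothetical label set: each class pairs an athlete with a position.
LABELS = [
    "Blue is in top mount",
    "White is in top mount",
    "Blue has White's back",
    "White has Blue's back",
    "Blue is on top in guard",
    "White is on top in guard",
]


def one_hot(label: str) -> list[float]:
    """Turn a human annotation into the target vector used for training."""
    vec = [0.0] * len(LABELS)
    vec[LABELS.index(label)] = 1.0
    return vec


def softmax(logits: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits: list[float]) -> tuple[str, float]:
    """Map raw network outputs (one score per label) to the likeliest position."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


if __name__ == "__main__":
    label, confidence = describe([3.0, 0.1, 0.2, 0.0, 1.0, 0.5])
    print(f"{label} (p={confidence:.2f})")
```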

Part of the key here is that you don't have to tell it that the picture is BJJ. 
Every picture that program sees is two people doing BJJ. All it has to do is 
output what positions they are in.

And in the end, by assigning point values to transitions between positions, you 
will have an automatic BJJ judge. I've applied for a TensorFlow API key from 
Google since, although this is not a hard problem by ML standards, I want to do 
it the right way and get good, scalable results on video later.
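
The judging half could be as simple as walking the sequence of per-frame 
positions and awarding points on each transition. This is a sketch under 
stated assumptions: the point values are the ones given above (sweep 2, mount 
3, back 4), while the (athlete, position) state encoding and the function 
names are hypothetical, and real rules such as takedowns or how long a 
position must be held are deliberately not modeled.

```python
# Point values as described above; the state encoding is hypothetical.
POINTS = {"sweep": 2, "mount": 3, "back": 4}


def score_transition(prev, curr):
    """Return (athlete, points) for the move from prev to curr, or None.

    Each state is (athlete_in_control, position), e.g. ("blue", "mount");
    (None, "standing") means nobody is in control.
    """
    prev_who, _ = prev
    who, pos = curr
    if who is None or curr == prev:
        return None
    if pos in ("mount", "back"):
        return who, POINTS[pos]
    if pos == "top" and prev_who not in (None, who):
        # The athlete who was underneath is now on top: a sweep.
        return who, POINTS["sweep"]
    return None


def judge(frames):
    """Sum points over consecutive per-frame position labels."""
    scores = {"blue": 0, "white": 0}
    for prev, curr in zip(frames, frames[1:]):
        award = score_transition(prev, curr)
        if award is not None:
            athlete, points = award
            scores[athlete] += points
    return scores


if __name__ == "__main__":
    match = [
        (None, "standing"),
        ("white", "top"),
        ("blue", "top"),    # blue sweeps: +2
        ("blue", "mount"),  # blue mounts: +3
        ("blue", "mount"),  # no change, no points
        ("blue", "back"),   # blue takes the back: +4
    ]
    print(judge(match))
```

On video, a per-frame classifier will flicker between labels, and each 
flicker back into mount would be scored again, so in practice the frame 
labels would need smoothing before they reach a judge like this.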

And of course, the same thing is true for the process information El Jefe 
<https://eljefe.immunityinc.com/> will give you. All those "behavioral 
analysis machine learning intrusion detection" startups are about to be crushed 
by simple open source projects that use Google, MS, and Amazon's exported 
Machine Learning APIs.

-dave


_______________________________________________
Dailydave mailing list
[email protected]
https://lists.immunityinc.com/mailman/listinfo/dailydave
