Re: [langsec-discuss] Studying malware in terms of LangSec

travis+ml-langsec Sun, 30 Nov 2014 18:49:35 -0800

On Wed, Nov 26, 2014 at 02:39:25PM +0530, Sashank Dara wrote:
> But what are the theoretical roots ?
> Can we model the variations in the code that exhibit the same behavior ?
> 
> (Am not able to articulate it more formally , let me give a try)
> Say how to model two different strings of same language exhibiting same
> behavior ?


People researching "ROP gadgets" and how to construct programs out of
them are doing significant research into modelling the behavior of
weird machines made out of somewhat random bits of code.  I believe
there are researchers with automated systems for creating exploits.
I heard such systems existed back in 2008 or so, and I assume they
are much more mature now.  So they do symbolic analysis.

> Can we model run time behavior of  a program in Computation theory at all ?

I suppose the question is: what behavior, and what kind of
polymorphism, are you interested in?

Replacing "add one" with "subtract negative one" is pretty easy to
detect.  That's a clearly equivalent machine.  Automated trivial
polymorphism here:
http://www.crazyboy.com/hydan/
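
A minimal sketch of why that kind of substitution is easy to detect.
The tuple instruction format and both function names are made up for
illustration (this is not real x86 decoding), and it ignores condition
flags, which a real tool would have to account for:

```python
# Hypothetical toy instruction format: (opcode, register, immediate).

def canonicalize(insn):
    """Rewrite 'sub r, k' as 'add r, -k' so equivalent forms compare equal."""
    op, reg, imm = insn
    if op == "sub":
        return ("add", reg, -imm)
    return insn


def same_behavior(prog_a, prog_b):
    """Compare two toy programs after canonicalizing every instruction."""
    return ([canonicalize(i) for i in prog_a]
            == [canonicalize(i) for i in prog_b])


add_one = [("add", "eax", 1)]
sub_neg_one = [("sub", "eax", -1)]
print(same_behavior(add_one, sub_neg_one))
```

Each rewrite rule like this only covers one known substitution, so the
approach stops helping once the variation is not a local, mechanical one.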

You might make progress with that kind of analysis with this:
http://bitblaze.cs.berkeley.edu/

But what if I add an extra system call to sleep?

Most of what a malware payload cares about is side effects, like
snooping on the keyboard and sending the keystrokes out over the
network.  That's not the kind of computation that academics typically
talk about.  If
the polymorphism you're trying to detect involves changes to system
calls, you'll need some kind of model of their semantics to detect
that it used to send a buffer in one syscall but now it sends it in
two.  You might be able to do something interesting with detecting
"bad" things and exfiltration of data with static analysis.  However,
things like games actually scan the raw keyboard, and clever malware
is doing its keyboard snooping in the exact same way to avoid
detection.
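
As a rough illustration of what a "model of their semantics" could look
like for the one-send-versus-two case, here is a sketch.  The event
format (name, descriptor, data), the IGNORED set, and normalize_trace
are all assumptions of mine, not any real tracing tool's output:

```python
# Hypothetical trace event format: (syscall_name, descriptor, data_bytes).
# Assumption: sleeps do not change what data leaves the machine.
IGNORED = {"sleep", "nanosleep"}


def normalize_trace(trace):
    """Drop ignored calls and merge consecutive sends on one descriptor."""
    out = []
    for name, fd, data in trace:
        if name in IGNORED:
            continue
        if name == "send" and out and out[-1][0] == "send" and out[-1][1] == fd:
            # Two sends in a row on the same descriptor carry the same
            # bytes as a single send of the concatenated buffer.
            out[-1] = ("send", fd, out[-1][2] + data)
        else:
            out.append((name, fd, data))
    return out


one_send = [("read", 0, b""), ("send", 3, b"secret")]
two_sends = [("read", 0, b""), ("send", 3, b"sec"),
             ("sleep", None, b""), ("send", 3, b"ret")]
print(normalize_trace(one_send) == normalize_trace(two_sends))
```

This only normalizes the two variations mentioned above; it says nothing
about whether the data being sent is "bad", which is the harder problem.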

On top of that, most malware is probably going to be using some kind
of "packer", so you need to emulate the unpacking long enough to get
the actual instructions it will execute.  That behavior might be
detectable.  Maybe that's what you're referring to.  Sophisticated
malware is detecting this emulation and not unpacking, or waiting a
long period of time before unpacking itself.

However, beware that you can do a LOT of anti-RE stuff:
http://www.recon.cx/en/f/vskype-part1.pdf
http://www.recon.cx/en/f/vskype-part2.pdf

While it may not be possible to analyze such software, it may be
possible to separate software trying to do tricky things like
self-modifying code from software being open and honest.  But
there will likely be classes of "bad"-looking behavior that are
very widely used by legitimate software, like patching the GOT
(global offset table) and patching up DLL call pointers on the
first call.  You'd have to write detectors for those and avoid
blacklisting software for that alone.
-- 
http://www.subspacefield.org/~travis/
Split a packed field and I am there; parse a line of text and you will find me.


