[langsec-discuss] Fwd: [BAHA] tracking the source of data

travis+ml-langsec Mon, 28 Oct 2013 17:51:08 -0700

Hi there,

I'm not sure this is precisely the point of this list, but I thought
its readers might find the attached message, with my thoughts on
tracking the source of data, interesting (it was originally sent to
the Bay Area Hackers Association mailing list I run).


Also, I noticed langsec.org has no home.  If you are interested, I can
host it on my server; I already run several mailing lists and websites
from there, so the incremental cost to me is zero, and the content
fits rather well with my interests.
-- 
http://www.subspacefield.org/~travis/
Becoming good at war often involves becoming bad at peace.
-- PopSci article on PTSD

--- Begin Message ---

Today I had an idea while analyzing some source code.  One recurring
problem is knowing whether data is adversary-controlled; most security
issues revolve around this.

I think it was an engineer at Twitter who said (in a presentation)
that they were considering HTML-encoding all adversary-controlled
data.  That way, if they use it in HTML, they are automagically
protected from XSS, and if they need the original data, they
HTML-decode it, which tells the analyst that something dangerous is
happening.  That's not actually a good idea, since there are five or
so different encodings depending on the HTML context (tag body,
attribute, JavaScript, etc.), but I thought the idea of making
adversary-controlled data opaque makes some sense.

What if you received it in encrypted form and had to decrypt it
explicitly (via a framework call) before using it?  Or, to avoid
encryption, it could be referred to only by a handle (like a file
descriptor).  Any call that dereferences the handle would then
indicate use of tainted data.
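The handle variant could look something like the following minimal
Python sketch.  `TaintVault`, `TaintedHandle`, `ingest` and
`dereference` are hypothetical names for the framework call described
above, not an existing library; the point is only that the raw value
is reachable through one auditable method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintedHandle:
    """Opaque reference to adversary-controlled data (hypothetical)."""
    fd: int

    def __repr__(self) -> str:
        # Never leak the underlying value through logging or formatting.
        return f"<tainted handle {self.fd}>"


class TaintVault:
    """Hypothetical framework object that owns all adversary-controlled data."""

    def __init__(self) -> None:
        self._store: dict[int, str] = {}
        self._next_fd = 0
        self.dereference_log: list[int] = []

    def ingest(self, raw: str) -> TaintedHandle:
        # Called at the trust boundary (request parsing, socket reads, ...).
        fd = self._next_fd
        self._next_fd += 1
        self._store[fd] = raw
        return TaintedHandle(fd)

    def dereference(self, handle: TaintedHandle) -> str:
        # The only way back to the raw value; every call is recorded so an
        # analyst can see where tainted data is actually used.
        self.dereference_log.append(handle.fd)
        return self._store[handle.fd]


vault = TaintVault()
h = vault.ingest("'; DROP TABLE users; --")
print(h)                     # <tainted handle 0>
value = vault.dereference(h)
print(vault.dereference_log)  # [0]
```

Because the handle's `repr` hides the value, accidentally formatting
it into HTML or SQL produces a harmless placeholder rather than the
attacker's string.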

The end goal is a better way of analyzing source code: to make it easy
to find where tainted data is used (dereferenced or decrypted) and
where control flow is affected by it.
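Once all uses go through one named call, finding them becomes a
mechanical search.  Below is a minimal sketch using Python's `ast`
module; it assumes the dereference call is literally named
`dereference` (a hypothetical name) and treats a call as affecting
control flow when it appears in the test of an `if`, `while`, or
conditional expression.  A real analysis would also follow data flow
through assignments, which this does not attempt.

```python
import ast


def find_taint_uses(source: str, sink_name: str = "dereference") -> list[tuple[int, bool]]:
    """Return (line, affects_control_flow) for every call to `sink_name`."""
    tree = ast.parse(source)

    # Collect every node that sits inside a branch/loop condition.
    in_condition: set[int] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.While, ast.IfExp)):
            for sub in ast.walk(node.test):
                in_condition.add(id(sub))

    results = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            # Handles both `dereference(x)` and `obj.dereference(x)`.
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", None)
            if name == sink_name:
                results.append((node.lineno, id(node) in in_condition))
    return sorted(results)


sample = '''\
name = vault.dereference(h)
if vault.dereference(flag) == "admin":
    grant()
'''
print(find_taint_uses(sample))  # [(1, False), (2, True)]
```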

Perl has a taint flag on data that can be checked and manipulated at
run time (its taint mode, enabled with -T).  More generally, most of
our languages have a type system, and the type of a piece of data
isn't part of the data per se, so why not track the source of data in
a similar way?  It might not be simple, but a type system is a very
similar structure.
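As a rough illustration of source tracking at run time, here is a
minimal Python sketch in which each value carries the set of sources
it was derived from, and concatenation unions those sets, so derived
data inherits its inputs' provenance.  `Sourced`, `trusted` and
`from_network` are hypothetical names, and only `+` propagates labels
here; a usable system would need to cover every string operation (or
enforce this statically through distinct types).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sourced:
    """A value paired with the set of sources it was derived from (hypothetical)."""
    value: str
    sources: frozenset[str]

    def __add__(self, other: "Sourced | str") -> "Sourced":
        # Derived data inherits the sources of everything it was built from.
        if isinstance(other, Sourced):
            return Sourced(self.value + other.value, self.sources | other.sources)
        if isinstance(other, str):
            return Sourced(self.value + other, self.sources)
        return NotImplemented

    def __radd__(self, other: str) -> "Sourced":
        # Handles plain-string literals on the left, e.g. "Hello, " + x.
        if isinstance(other, str):
            return Sourced(other + self.value, self.sources)
        return NotImplemented


def trusted(s: str) -> Sourced:
    return Sourced(s, frozenset({"program"}))


def from_network(s: str) -> Sourced:
    return Sourced(s, frozenset({"adversary"}))


greeting = "Hello, " + from_network("<script>") + trusted("!")
print(greeting.value, sorted(greeting.sources))
# Hello, <script>! ['adversary', 'program']
```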

*Aside: this idea is remarkably similar to the legal view of
copyrighted data and its derivatives; see:
http://ansuz.sooke.bc.ca/lawpoli/colour/2004061001.php

With such a system, we could build even more interesting defenses.
For example, to solve the mixing of control and data, we could say
"don't allow SQL statements to contain adversary-controlled data" (use
parameterized queries instead) or "don't allow HTML to contain
adversary-controlled data" (use HTML templating of some kind, or
mandate HTML encoding and hope they get it right).  In these cases,
you can't "untaint" something by applying some magic universal
encoding to it; the right treatment depends on the context of its use,
and this could potentially be done in a less error-prone way than by
hand.  In fact, besides XSS, the OWASP Top 10 categories that could be
mitigated this way include injection, insecure direct object
references, unvalidated redirects and forwards, and sensitive data
exposure (where we mark private data as having a sensitive source and
prevent it from being leaked to the adversary).  That's up to five of
the OWASP Top 10.
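The SQL rule might be enforced at the sink roughly as in this minimal
Python sketch using the standard `sqlite3` module.  `Tainted`,
`TaintError` and `safe_execute` are hypothetical names; `Tainted`
propagates only through `+`, so f-strings, `%`, `join` and friends
would slip past it, and a real system would have to cover those too.

```python
import sqlite3


class Tainted(str):
    """Marker type for adversary-controlled strings (hypothetical).

    Only `+` propagates the mark in this sketch.
    """

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        # Called for "literal" + Tainted(...) because Tainted subclasses str.
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str.__add__(other, self))


class TaintError(Exception):
    pass


def safe_execute(conn: sqlite3.Connection, query: str, params: tuple = ()) -> sqlite3.Cursor:
    # Policy: the SQL text must never contain adversary-controlled data.
    # Tainted values may only arrive as bound parameters, which the driver
    # keeps separate from the statement's control structure.
    if isinstance(query, Tainted):
        raise TaintError("query text derived from adversary-controlled data")
    return conn.execute(query, params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")
name = Tainted("alice' OR '1'='1")

# Rejected: tainted data concatenated into the statement text.
try:
    safe_execute(conn, "SELECT role FROM users WHERE name = '" + name + "'")
    blocked = False
except TaintError:
    blocked = True
print("blocked:", blocked)  # blocked: True

# Allowed: the same tainted data passed as a bound parameter.
rows = safe_execute(conn, "SELECT role FROM users WHERE name = ?", (name,)).fetchall()
print(rows)  # []
```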

For encryption, we have other interesting options as well.  The BEAST
attack could be mitigated by not mixing our secret data and
adversary-controlled data in the same encrypted stream under the same
key.  You see, encryption is very sensitive to manipulation; the
writers of encrypted diplomatic cables were careful to paraphrase
important speeches, because quoting them verbatim gave SIGINT
adversaries known plaintext.  It makes perfect sense to have
different encryption contexts for data from different sources, and
those could be managed for you automatically.
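One way "different encryption contexts per source" could be managed
automatically is to derive a separate key for each source label from
a single master key.  The minimal Python sketch below implements HKDF
(RFC 5869) with SHA-256 from `hmac` and `hashlib`; `PerSourceKeys`,
`key_for` and the `source:` label format are hypothetical.  It only
covers key separation: the derived keys would still be handed to
whatever authenticated cipher is already in use, which is out of scope
here, and this sketch makes no claim to be a complete BEAST
countermeasure.

```python
import hashlib
import hmac
import os


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    if length > 255 * 32:
        raise ValueError("HKDF-SHA256 output too long")
    # Extract: an empty salt is replaced by HashLen zero bytes, per the RFC.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: T(n) = HMAC(PRK, T(n-1) || info || n), concatenated.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


class PerSourceKeys:
    """Hands out a distinct key per data source (hypothetical design)."""

    def __init__(self, master_key: bytes) -> None:
        self._master = master_key

    def key_for(self, source: str) -> bytes:
        # Binding the source label into `info` yields independent keys, so
        # adversary-influenced plaintext never shares a key with secrets.
        return hkdf_sha256(self._master, info=b"source:" + source.encode())


keys = PerSourceKeys(os.urandom(32))
k_secret = keys.key_for("server-secret")
k_adversary = keys.key_for("adversary")
print(k_secret != k_adversary)  # True
```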
-- 
http://www.subspacefield.org/~travis/
It's so hard being a single mom these days; especially when you're sixteen,
childless, and male.  It's like the deck is completely stacked against you.

_______________________________________________
BAHA mailing list
[email protected]
http://lists.bitrot.info/mailman/listinfo/baha

--- End Message ---

_______________________________________________
langsec-discuss mailing list
[email protected]
https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
