I do not know if these resulted in CVEs, but a number of password managers had 
security vulnerabilities that were due to misparsing URIs with regular 
expressions. URIs comprise a regular language, so these don’t strictly meet 
what you are asking for; but the regular expressions were not built around the 
specification of the URL language.

(Also note that most developers use perl-compatible regular expressions which 
can describe some non-regular languages.)
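For example, a single backreference — a PCRE feature that Python’s `re` module also supports — suffices to recognize the textbook non-regular language { aⁿ b aⁿ }; a minimal sketch:

```python
import re

# The language { a^n b a^n : n >= 0 } is provably non-regular
# (pumping lemma), yet one backreference recognizes it.
balanced = re.compile(r"^(a*)b\1$")

print(bool(balanced.match("aaabaaa")))  # True: three a's on each side
print(bool(balanced.match("aaabaa")))   # False: unbalanced
```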

The exploits were typically of the form of tricking a password manager to fill 
secrets for legitimate site A into malicious site B. Password managers store a 
URL along with username and password. And those password managers that assist 
with filling web login forms will not fill in forms on pages for which the 
location does not “match” what the password manager has stored. That 
matching requires that the password manager parse the web page location and 
also parse what it has stored internally. Misparsing of either can lead to 
failure that would allow the kind of attack I described.

The sort of thing that was appearing in phishing email that could fool several 
password managers was of the form

 data:text/html,https://accounts.google.com/ServiceLogin#fragment

The fragment would create the malicious page, but I don’t have that at hand. 
(What I’ve quoted is in our tests, but I’d have to dig through a lot of history 
to get the complete example of a malicious URI.) The point is that the password 
manager and the browser may interpret that URI very, very differently.
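To make that divergence concrete, here is a hedged Python sketch (the naive regex is my own, purely for illustration — real products used different code) comparing regex-based “domain matching” with a specification-based parse of that URI:

```python
import re
from urllib.parse import urlparse

phish = "data:text/html,https://accounts.google.com/ServiceLogin#fragment"

# Naive regex-based matching (illustrative only): hunt for "https://"
# anywhere in the string and grab what follows as the domain.
m = re.search(r"https://([^/?#]+)", phish)
print(m.group(1))    # "accounts.google.com" -- looks like Google!

# Specification-based parsing sees a data: URI with no host at all.
u = urlparse(phish)
print(u.scheme)      # "data"
print(u.netloc)      # "" -- there is no authority component
```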

In our case (I work for the makers of 1Password), we had known not to use 
regexes for this ever since Sergey explained the basic principles of langsec to 
me many years ago at a party in a Las Vegas penthouse. However, knowing not to 
do X and not doing X are two different things. One reason it took us time 
to move to proper parsing of URLs is because many users had lots of data with 
alleged URLs like “www.facebook.com”. It turns out that that is a syntactically 
valid URL, but when parsed according to the specification, it is a URL with 
only a path component.
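A quick illustration with Python’s `urllib.parse`, which follows the specification’s grammar on this point:

```python
from urllib.parse import urlparse

# Parsed per the spec, "www.facebook.com" has no scheme and no host:
# the entire string lands in the path component.
u = urlparse("www.facebook.com")
print(repr(u.scheme))   # '' -- no scheme
print(repr(u.netloc))   # '' -- no host/authority
print(repr(u.path))     # 'www.facebook.com' -- it is all path
```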

Because for years we had accepted and allowed users to accumulate such 
malformed data, we couldn’t simply switch over to proper URL parsing. And 
since there wasn’t an easy fix and the threat was seen as “theoretical” by 
many developers, we were slower than we should have been to 
address this. However, I believe that because I’d learned about langsec, we had 
a head start when the motivation finally arrived.

When actual exploits started to be reported for these sorts of problems in our 
competitors, the internal incentives changed rapidly. I also had the very 
dubious pleasure of saying “I told you so.”[1] By this time we had already 
moved to proper URL parsing on some platforms and had already identified the 
challenges and other things that would need to be addressed.

Anyway, our parsing of URLs still has to make an exception for things like 
“www.facebook.com”. (We also strip leading and trailing whitespace.) And 
there are a few other quirks. Here are a couple of excerpts from our internal 
documentation on URI parsing:

> Apple’s `NSURL URLWithString` incorrectly rejects URI strings that have 
> unescaped "/" and "?" within the query portion of a URI. These should be 
> allowed according to RFC 3986.

and

> There are two distinct classes available in the Android SDK. There is the 
> good `java.net.URI` and the bad `android.net.Uri`. Do _not_ use the one from 
> the android package as this implementation performs “little to no validation”.
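To illustrate the RFC 3986 point from the first excerpt, a sketch using Python’s `urllib.parse` (which does accept these characters; the URL itself is made up):

```python
from urllib.parse import urlparse

# RFC 3986 section 3.4 permits unescaped "/" and "?" inside the
# query component; a conforming parser must accept them.
u = urlparse("https://example.com/redirect?next=/settings?tab=security")
print(u.query)   # "next=/settings?tab=security"
```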

Cheers,

-j

-- 
Jeffrey Goldberg
Chief Defender Against the Dark Arts @ AgileBits
https://1password.com

[1] A note on “I told you so”

One thing I learned during the long process of “telling so” is that many 
extremely smart developers either never took a Formal Language Theory course or 
forgot the contents of the course within weeks after the final exam. So what 
Sergey had been able to explain to me in a few minutes (my background is in 
Linguistics) is something that I have struggled to explain to my colleagues. I 
was asked to construct a malicious candidate URL that the regex parsing would 
mishandle. I tried to explain that while I may not be able to do so, that 
doesn’t mean that others can’t, but that if we use parsers that are built from 
the language specification then we can preclude a whole category of attacks, 
whether I can construct an instance of that category or not. People nodded 
their heads and worked on things that were more immediate priorities.

I’m pleased to say that while not everyone is a convert to my religion of 
langsec, the notion of precluding whole categories of yet to be discovered 
attacks through certain design principles has been gaining ground. Even if not 
all of our input validators are based on the formal specs of expected input, we 
have single-purpose, strict(-ish), isolated validators for everything coming 
into our servers.

One thing that I’ve learned during this process is that I can’t simply tell 
developers “don’t do it that way, and here is some math that should guide you 
on how to do it”. I have to give them usable tools for doing it right. It would 
be really nice if there were some simple examples of using Nail. (No, the DNS 
example is not simple for people who don’t already know what is going on.)



> On Nov 27, 2017, at 5:01 PM, Frithjof Schulze <f...@ciphron.de> wrote:
> 
> 
> Hi all,
> 
> is anybody aware of some recent CVEs that are the direct result of the 
> attempt to parse a non-regular grammar with regular expressions? I expected 
> to find something like this on cve.mitre.org/find, but didn’t. I expected at 
> least a case where regex were used to do „input sanitization“ but found 
> nothing good.
> 
> Why am I looking for such a CVE? When talking about LangSec-ideas with 
> (mostly web) developers I regularly have the problem that I either have to 
> explain a lot of theory (that few people are really interested in) or have to 
> go „thou shall not ….“ to argue against „but this is easy and works in 
> practice!“.
> 
> The best solution for me so far is similar to the approach suggested in the 
> "Seven Turrents of Babel“: Show people examples of the bugs they are up 
> against if they use certain antipatterns. I am now compiling a list of 
> educational and „realistic“ bugs in the sense that the most popular 
> bugs like string terminators in X.509/ASN.1, Heartbleed and the Android 
> Master Key are great examples for LangSec in general, but are not the kind of 
> bugs many developers have to actually deal with.
> 
> Most people I am talking to actually know that they „shouldn’t“ use regex to 
> do certain things, because of the Lovecraftian post on Stack Overflow[1], but 
> that post also just repeatedly mentions the impossibility of a suggested 
> solution without giving any examples of negative consequences of trying.
> 
> [1]  https://stackoverflow.com/a/1732454
> 
> Cheers,
> Frithjof
> _______________________________________________
> langsec-discuss mailing list
> langsec-discuss@mail.langsec.org
> https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
