Re: [whatwg] Subresource Integrity-based caching

2017-03-03 Thread Anne van Kesteren
On Fri, Mar 3, 2017 at 11:01 PM, Alex Jordan  wrote:
> On Fri, Mar 03, 2017 at 09:21:20AM +0100, Anne van Kesteren wrote:
>> I think https://github.com/w3c/webappsec-subresource-integrity/issues/22
>> is the canonical issue, but no concrete ideas thus far.
>
> Great, thanks! I've got some thoughts on potential solutions; where
> would be the best place to put those - here or on GitHub? I'm assuming
> the latter but figured I'd ask :)

You are assuming correctly.


-- 
https://annevankesteren.nl/


Re: [whatwg] Subresource Integrity-based caching

2017-03-03 Thread Roger Hågensen
I'd like to apologize to Alex Jordan for mistaking him for James Roper, 
and, vice versa, for mistaking James Roper for Alex Jordan.


In the previous email, when I said "your" as in "your suggestion" I was 
referring to Alex, while the hash remarks were meant for James.


I got confused by an email from James that contained a fully quoted copy 
of the earlier email in which I quoted Alex, but with no text or comments 
from James, and I assumed for a moment it was the same person using 
different email addresses (work vs. private, or an alias, which is not 
unusual).


I hope this confusion won't derail the topic fully.



--
Roger Hågensen,
Freelancer, Norway.


Re: [whatwg] Subresource Integrity-based caching

2017-03-03 Thread Roger Hågensen

On 2017-03-03 01:02, James Roper wrote:


>> How about you misunderstanding the fact that a hash can only
>> ever guarantee that two resources are different. A hash cannot
>> guarantee that two resources are the same. A hash does imply a high
>> probability that they are the same, but it can never guarantee it;
>> such is the nature of a hash. A carefully tailored jquery.js that
>> matches the hash of the "original jquery.js" could be crafted and
>> contain a hidden payload. The browser would then suddenly inject
>> this script into every website the user visits that uses that
>> particular version of jquery.js, which I'd call an extremely serious
>> security hole. You can't rely on length either, as that could also
>> be padded to match. Not to mention that this also crosses the CORS
>> threshold (the first instance is from a different domain than the
>> current page, for example). Accidental (natural) collision
>> probabilities for sha256/sha384/sha512 are very low, but intentional
>> ones are more likely than accidental ones.


> This is completely wrong. No one has *ever* produced an intentional
> collision in sha256 or greater.


Huh? When did I ever state that? I have never said that sha256 or higher 
has been broken, so please do not put words/lies in my mouth. I find 
that highly offensive.
I said "could"; just ask any cryptographer. It is highly improbable, but 
theoretically possible, and completely impractical to attempt (the current 
state of quantum computing has not shown any magic bullet yet).


I'm equally concerned with a natural collision; while the probability is 
incredibly small, the chance is 50/50 (if we imagine all files containing 
random data and random file lengths, which they don't).


And as to my statement that "a hash can only ever guarantee that two 
resources are different. A hash can not guarantee that two resources are 
the same": again, that is true. You can even test this by using a small 
enough hash (CRC-4 or something equally simple) and editing a file, and 
you'll see that what I say is true.


You know how these types of hashes work, right? They are NOT UNIQUE; 
if you want something unique, then that is called a "perfect hash", 
which is not something you want to use for cryptography.
If a hash like sha256 were unique it would be a compression miracle, as 
you could then just "uncompress" the hash.


Only if the data you hash is the same size as the hash can you perfectly 
re-create the data that is hashed, which is what I proposed with my UUID 
suggestion.
Do note that I'm talking about Version 1 UUIDs and not the random 
Version 4 ones, which are not unique.


> In case you missed the headlines, last week Google announced it
> created a sha1 collision. That is the first, and only, known sha1
> collision ever created. This means sha1 is broken, and must not be used.
>
> Now it's unlikely (as in, it's not likely to happen in the history of
> a billion universes), but it is possible that at some point in the
> history of sha256 a collision was accidentally created. This
> probability is non-zero, which is greater than the impossibility of
> intentionally creating a collision, hence it is more likely that we
> will get an accidental collision than an intentional one.


Sha1 still has its uses. I haven't checked recently, but sha1, just like 
md5, is still OK to use with HMAC. It is also odd that you say sha1 should 
not be used at all; there is nothing wrong with using it as a plain file 
hash/checksum. With the number of files and the increase in data sizes, 
CRC32 is not that useful (unless you divide the file into chunks and 
provide a CRC32 array instead).


A hash is not the right way to do what you want; a UUID and one (or 
multiple) trusted shared cache(s) is.
The issue with using a hash is that at some point sha256 could become 
deprecated. Does the browser start ignoring it then? Should it behave as 
if the javascript file had no hash, or as if it is potentially dangerous now?


Also take note that a UUID can be made into a valid URI, but I suggested 
adding an attribute instead, as that would make older browsers/versions 
"forward compatible" since the URL still works normally.



And to try not to run your idea entirely into the ground: it is not 
detailed enough. By that I mean you would need a way for the web designer 
to inform the browser that they do not want the scripts hosted on their 
site replaced by those from another site. Requiring an opt-out is a pain 
in the ass, and where security is concerned one should never have to 
"opt out to get more secure"; one should be more secure by default.
Which means you would need to add another attribute, or modify the 
integrity one, to allow cache sharing.
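
As a rough sketch only (the extra attribute name here is hypothetical, 
nothing that exists today):

  <script src="/js/jquery-2.0.0.min.js"
          integrity="sha384-[base64 digest of the file]"
          sharedcache="allow"></script>

Leaving that extra attribute out would keep today's behaviour: the hash 
is only used to verify the bytes fetched from my own URL, never to 
substitute a copy cached from some other origin.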


Now, I myself would never do that; even if the hash matches, I'd never 
feel comfortable running a script originating from some other site in 
the page I'm delivering to my visitor.
I would not actually want the browser to even cache my script and 
provide it to other sites' pages.


I might however feel comfortable adding 

Re: [whatwg] Subresource Integrity-based caching

2017-03-03 Thread Anne van Kesteren
On Thu, Mar 2, 2017 at 6:07 PM, Domenic Denicola  wrote:
> I don't know what the latest is on attempting to get around this, although 
> that document suggests some ideas.

I think https://github.com/w3c/webappsec-subresource-integrity/issues/22
is the canonical issue, but no concrete ideas thus far.


-- 
https://annevankesteren.nl/


Re: [whatwg] Subresource Integrity-based caching

2017-03-02 Thread James Roper
On 3 Mar. 2017 00:09, "Roger Hågensen"  wrote:

On 2017-03-02 02:59, Alex Jordan wrote:

> Here's the basic problem: say I want to include jQuery in a page. I
> have two options: host it myself, or use a CDN.
>
Not to be overly pedantic, but you might re-evaluate the need for jquery
and other such frameworks. "HTML5" now does pretty much the same as these
older frameworks with the same amount of code or less.



> The fundamental issue is that there isn't a direct correspondence to
> what a resource's _address_ is and what the resource _itself_ is. In
> other words, jQuery 2.0.0 on my domain and jQuery 2.0.0 on the Google
> CDN are the exact same resource in terms of content, but are
> considered different because they have different addresses.
>
Yes and no. The URI is a unique identifier for a resource. If the URI is
different then it is not the same resource. The content may be the same but
the resource is different. You are mixing up resource and content in your
explanation. Address and resource are in this case the same thing.


> 2. This could potentially be a carrot used to encourage adoption of
> Subresource Integrity, because it confers a significant performance
> benefit.
>
This can be solved by improved web design. Serve a static page (and don't
forget gzip compression), and then background-load the script and extra CSS
etc. By the time the visitor has read/looked/scanned down the page, the
scripts are loaded. There is, however, some bandwidth-savings merit in your
suggestion.

> ...That's okay, though, because the fact that it's based on a hash
> guarantees that the cache matches what would've been sent over the
> network - if these were different, the hash wouldn't match and the
> mechanism wouldn't kick in.
>
> ...
>
> Anyway, this email is long enough already but I'd love to hear
> thoughts about things I've missed, etc.
>
How about you misunderstanding the fact that a hash can only ever
guarantee that two resources are different. A hash cannot guarantee that
two resources are the same. A hash does imply a high probability that they
are the same, but it can never guarantee it; such is the nature of a hash.
A carefully tailored jquery.js that matches the hash of the "original
jquery.js" could be crafted and contain a hidden payload. The browser would
then suddenly inject this script into every website the user visits that
uses that particular version of jquery.js, which I'd call an extremely
serious security hole. You can't rely on length either, as that could also
be padded to match. Not to mention that this also crosses the CORS
threshold (the first instance is from a different domain than the current
page, for example). Accidental (natural) collision probabilities for
sha256/sha384/sha512 are very low, but intentional ones are more likely
than accidental ones.


This is completely wrong. No one has *ever* produced an intentional
collision in sha256 or greater. That's the whole point of cryptographic
hashes: it is impossible to intentionally create a collision, and if it
were possible to create one, the algorithm would need to be declared
broken and never used again. In case you missed the headlines, last week
Google announced it created a sha1 collision. That is the first, and only,
known sha1 collision ever created. This means sha1 is broken, and must not
be used.

Now it's unlikely (as in, it's not likely to happen in the history of a
billion universes), but it is possible that at some point in the history of
sha256 a collision was accidentally created. This probability is non-zero,
which is greater than the impossibility of intentionally creating a
collision, hence it is more likely that we will get an accidental collision
than an intentional one.
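
To put a rough number on that (back-of-envelope, using the standard
birthday approximation; the trillion-file count below is just an assumed
figure for scale):

  P(any accidental collision among n files) ~= n^2 / 2^257  for sha256
  n = 2^40 (about a trillion distinct files)
  => P ~= 2^80 / 2^257 = 2^-177, which is roughly 5 x 10^-54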


While I haven't checked the browser source code, I would not be surprised
if browsers in certain situations cache a single instance of a script that
is used on multiple pages on a website (different URL but the same hash).
This would be within the same domain and usually not a security issue.


It might be better to use UUIDs instead and a trusted "cache"; this cache
could be provided by a third party or the browser developer themselves.

Such a solution would require a uuid="{some-uuid-number}" attribute added
to the script tag. And if encountered, the browser could ignore the script
URL and integrity attribute and use either a local cache (from earlier) or
a trusted cache on the net somewhere.

The type of scripts that would benefit from this are the ones that follow a
Major.Minor.Patch version format, and a UUID would apply to the major
version only, so if the major version changed then the script would require
a new UUID.

Only the most popular scripts and major versions of such would be cached,
but those are usually the larger and more important ones anyway. It's your
jQuery, Bootstrap, Angular, Modernizr, and so on.

-- 
Roger Hågensen,
Freelancer, Norway.


Re: [whatwg] Subresource Integrity-based caching

2017-03-02 Thread Domenic Denicola
Hi Alex! Glad to have you here.

This is indeed a popular idea. The biggest problem with it is privacy concerns. 
The best summary I've seen is at 
https://hillbrad.github.io/sri-addressable-caching/sri-addressable-caching.html.
In particular, if such a suggestion were implemented, any web page would be 
able to easily determine the browsing history of any user, similar to the old 
visited-link-color trick.
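
Roughly, the kind of probe that makes this possible would look something 
like this, assuming such a hash-addressed cache existed (the URL, digest, 
and report() function below are all placeholders):

  <!-- decoy.js never actually serves bytes matching this digest -->
  <script src="https://attacker.example/decoy.js"
          integrity="sha384-[digest of a script unique to victim-site.example]"
          onload="report('served from the shared cache, so the user has visited the victim site')"
          onerror="report('not in the shared cache')"></script>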

I don't know what the latest is on attempting to get around this, although that 
document suggests some ideas.



Re: [whatwg] Subresource Integrity-based caching

2017-03-02 Thread Roger Hågensen

On 2017-03-02 02:59, Alex Jordan wrote:

> Here's the basic problem: say I want to include jQuery in a page. I
> have two options: host it myself, or use a CDN.
Not to be overly pedantic, but you might re-evaluate the need for jquery 
and other such frameworks. "HTML5" now does pretty much the same as these 
older frameworks with the same amount of code or less.




> The fundamental issue is that there isn't a direct correspondence to
> what a resource's _address_ is and what the resource _itself_ is. In
> other words, jQuery 2.0.0 on my domain and jQuery 2.0.0 on the Google
> CDN are the exact same resource in terms of content, but are
> considered different because they have different addresses.
Yes and no. The URI is a unique identifier for a resource. If the URI is 
different then it is not the same resource. The content may be the same 
but the resource is different. You are mixing up resource and content in 
your explanation. Address and resource are in this case the same thing.



> 2. This could potentially be a carrot used to encourage adoption of
> Subresource Integrity, because it confers a significant performance
> benefit.
This can be solved by improved web design. Serve a static page (and don't 
forget gzip compression), and then background-load the script and extra 
CSS etc. By the time the visitor has read/looked/scanned down the page, 
the scripts are loaded. There is, however, some bandwidth-savings merit in 
your suggestion.
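
For example (the paths here are just placeholders), something as simple as 
this lets the static markup render while the scripts download in the 
background:

  <!-- static, gzip-compressed page content above renders immediately -->
  <script src="/js/jquery-2.0.0.min.js" defer></script>
  <script src="/js/site.js" defer></script>

With defer the downloads don't block parsing and the scripts run once the 
document has been parsed; async is similar but runs each script as soon as 
it arrives.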



> ...That's okay, though, because the fact that it's based on a hash
> guarantees that the cache matches what would've been sent over the
> network - if these were different, the hash wouldn't match and the
> mechanism wouldn't kick in.
>
> ...
>
> Anyway, this email is long enough already but I'd love to hear
> thoughts about things I've missed, etc.
How about you misunderstanding the fact that a hash can only ever 
guarantee that two resources are different. A hash cannot guarantee 
that two resources are the same. A hash does imply a high probability 
that they are the same, but it can never guarantee it; such is the nature 
of a hash. A carefully tailored jquery.js that matches the hash of the 
"original jquery.js" could be crafted and contain a hidden payload. The 
browser would then suddenly inject this script into every website the 
user visits that uses that particular version of jquery.js, which I'd 
call an extremely serious security hole. You can't rely on length either, 
as that could also be padded to match. Not to mention that this also 
crosses the CORS threshold (the first instance is from a different 
domain than the current page, for example). Accidental (natural) 
collision probabilities for sha256/sha384/sha512 are very low, but 
intentional ones are more likely than accidental ones.


While I haven't checked the browser source code, I would not be 
surprised if browsers in certain situations cache a single instance of a 
script that is used on multiple pages on a website (different URL but 
the same hash). This would be within the same domain and usually not a 
security issue.



It might be better to use UUIDs instead and a trusted "cache"; this 
cache could be provided by a third party or the browser developer themselves.


Such a solution would require a uuid="{some-uuid-number}" attribute 
added to the script tag. And if encountered, the browser could ignore 
the script URL and integrity attribute and use either a local cache 
(from earlier) or a trusted cache on the net somewhere.
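
Something along these lines, where the attribute and its value are of 
course only illustrative:

  <script src="https://example.com/js/jquery-2.x.y.min.js"
          integrity="sha384-[base64 digest of the file]"
          uuid="{some-uuid-number}"></script>

A browser that recognises uuid could fulfil the request from a trusted 
shared cache and skip the src/integrity handling entirely, while an older 
browser would simply ignore the unknown attribute and fetch the URL as it 
does today.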


The type of scripts that would benefit from this are the ones that 
follow a Major.Minor.Patch version format, and a UUID would apply to the 
major version only, so if the major version changed then the script 
would require a new UUID.


Only the most popular scripts and major versions of such would be 
cached, but those are usually the larger and more important ones anyway. 
It's your jQuery, Bootstrap, Angular, Modernizr, and so on.


--
Roger Hågensen,
Freelancer, Norway.



[whatwg] Subresource Integrity-based caching

2017-03-01 Thread Alex Jordan
Heya!

So recently I've been thinking about caching on the web and think I've
come up with a pretty neat trick to improve things. However before I
go file a bunch of bugs against browsers I thought it prudent to get
feedback from spec folks.

Here's the basic problem: say I want to include jQuery in a page. I
have two options: host it myself, or use a CDN. If I host it myself,
then I don't get caching benefits for first-time visitors because they
(obviously) haven't visited my page and requested jQuery from my
domain before. Using a sufficiently widespread CDN will fix this for
me, because the more widespread the CDN is, the more likely the user
is to have encountered a page using that CDN. However, this is
somewhat problematic because it leaks data to the CDN operator.

The fundamental issue is that there isn't a direct correspondence to
what a resource's _address_ is and what the resource _itself_ is. In
other words, jQuery 2.0.0 on my domain and jQuery 2.0.0 on the Google
CDN are the exact same resource in terms of content, but are
considered different because they have different addresses.
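
To make that concrete with today's Subresource Integrity markup (the 
domains and digests below are placeholders, not real values):

  <!-- self-hosted copy -->
  <script src="https://example.com/js/jquery-2.0.0.min.js"
          integrity="sha384-[base64 digest of jQuery 2.0.0]"></script>

  <!-- same bytes, same digest, different address on a CDN -->
  <script src="https://cdn.example.net/jquery/2.0.0/jquery.min.js"
          integrity="sha384-[the exact same base64 digest]"
          crossorigin="anonymous"></script>

Today's HTTP cache keys on the address, so these end up as two unrelated 
cache entries even though the integrity hash already tells the browser the 
content is byte-for-byte identical.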

Here's the proposal: when browsers encounter a