Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Philip Jägenstedt
On Tue, 07 Sep 2010 02:46:29 +0200, Gregory Maxwell gmaxw...@gmail.com  
wrote:


On Mon, Sep 6, 2010 at 3:19 PM, Aryeh Gregor simetrical+...@gmail.com  
wrote:
On Mon, Sep 6, 2010 at 4:14 AM, Philip Jägenstedt phil...@opera.com  
wrote:
The Ogg page begins with the 4 bytes OggS, which is what Opera (GStreamer)
checks for. For additional safety, one could also check for the trailing
version indicator, which ought to be a NULL byte for current Ogg. [1] [2]


OggS\0 as the first five bytes seems safe to check for.  It's rather
short, I guess because it's repeated on every page, but five bytes is
long enough that it should occur at random only negligibly often, in
either text or binary files.


Um... If you do that you will fail to capture on files that most other
ogg reading tools will happily capture on.  Common software will read
forward until it hits OggS then it will check the page CRC (in total,
9 bytes of capture).  For example, here is a file which begins with a
kilobyte of \0: http://myrandomnode.dyndns.org:8080/~gmaxwell/test.ogg
 Everything I had handy played it.

This could fail to capture on a live stream that didn't ensure new
listeners began at a page boundary. I don't know if any of these
exist.

I don't know if breaking these cases would matter much but herein lies
the danger of sniffing— everyone thinks they're an expert but no one
really has a handle on the implications.



Your test file is too short, perhaps it was truncated? I made my own one  
by adding 1024 NULL bytes to the beginning of  
http://v2v.cc/~j/theora_testsuite/320x240.ogg


That file doesn't play in Totem, because it (GStreamer) relies on  
sniffing. It also won't play in Opera for this reason, but I haven't seen  
any bug reports about failure to play similar files since Opera introduced  
support for Ogg. It does play in Firefox, but not in Chrome. Just like  
with WebM, I think browsers should not support files that begin with  
arbitrary amounts of garbage, as it requires reading the whole file before  
failing.


The file doesn't play in VLC or MPlayer, but does play in xine.
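
To make the difference between the two behaviours concrete, here is a minimal Python sketch. It is not taken from any real demuxer: `sniff_strict` and `sniff_scanning` are hypothetical names, the sample data is synthetic, and the scanning variant omits the page CRC check that, per the message quoted above, common tools perform after finding the capture pattern.

```python
OGG_MAGIC = b"OggS\x00"  # capture pattern plus the version byte


def sniff_strict(data: bytes) -> bool:
    # Accept only if the resource begins with the magic (bounded work).
    return data.startswith(OGG_MAGIC)


def sniff_scanning(data: bytes) -> int:
    # Return the offset of the first match, or -1 if there is none.
    # In the worst case this reads the whole input before failing.
    return data.find(OGG_MAGIC)


page_start = OGG_MAGIC + b"\x02"       # start of a first (bos) page
padded = b"\x00" * 1024 + page_start   # 1024 NULL bytes, then the page

print(sniff_strict(page_start))   # strict check on a clean file
print(sniff_strict(padded))       # strict check on the padded file
print(sniff_scanning(padded))     # scanning check on the padded file
```

The strict check needs only the first five bytes, whereas the scanning check cannot report failure until it has seen all of the input, which is the cost objected to above.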

--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Philip Jägenstedt

On Tue, 07 Sep 2010 03:56:54 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/6/10 3:19 PM, Aryeh Gregor wrote:
On Mon, Sep 6, 2010 at 4:14 AM, Philip Jägenstedt phil...@opera.com
wrote:
The Ogg page begins with the 4 bytes OggS, which is what Opera (GStreamer)
checks for. For additional safety, one could also check for the trailing
version indicator, which ought to be a NULL byte for current Ogg. [1] [2]


OggS\0 as the first five bytes seems safe to check for.  It's rather
short, I guess because it's repeated on every page, but five bytes is
long enough that it should occur at random only negligibly often, in
either text or binary files.


So if a text file starts with U+4F67 U+6753 (both CJK ideographs) and  
any ASCII character (can this happen in the real world?) you're OK with  
treating it as Ogg?  Same for files starting with U+674F U+5367 (both CJK  
ideographs) and any plane-0 character whose Unicode codepoint is 0 mod  
2^16 (plenty of CJK stuff like that)?  Is your CJK good enough that you  
know text files would never start like this, or are you just assuming  
that people who are silly enough to use UTF-16 for their text files and  
aren't in Europe don't matter?  Or that you don't care about people who  
happen to not use a BOM?


Thanks for pointing out these cases. I hadn't thought about it, but my CJK  
is good enough to say something about them:


'佧杓A' encoded in UTF-16BE is 'OggS\x00A'. However, 佧杓 is nonsensical  
in at least Chinese, neither character is among the 3000 most common  
characters [1]. Search results on Google (4) and Baidu (3) are nonsense  
too. I don't know if things are any different for Japanese, but given the  
Google results I doubt it.


'杏卧' encoded in UTF-16LE is 'OggS', and both of these characters are in  
the top 3000, but together they're nonsense: apricot crouch. (That's the  
same crouch as in Crouching Tiger, Hidden Dragon, but the order is wrong  
so it doesn't mean Crouching Apricot). In the Google and Baidu results,  
the only occurrence of the string seems to be in 一衫红杏卧江亭, which  
appears to be a theme of an apricot tree by a pavilion that appears in  
several paintings [2] [3] [4].


All in all, I wouldn't be more worried about this than the risk of random  
binary data matching. Also, UTF-16 isn't a very common encoding for  
simplified Chinese (卧 is a simplified character), GBK is dominant.
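
The two byte-level coincidences can be checked in a few lines of Python. This is only an illustrative sketch; the variable names are made up, and the characters are written as code point escapes so the result does not depend on the encoding of the script itself.

```python
# The two ideographs from the first case, followed by an ASCII 'A'.
be_text = "\u4f67\u6753A"
# The two ideographs from the second case.
le_text = "\u674f\u5367"

be_bytes = be_text.encode("utf-16-be")  # big-endian, no BOM
le_bytes = le_text.encode("utf-16-le")  # little-endian, no BOM

print(be_bytes)
print(le_bytes)
print(be_bytes.startswith(b"OggS\x00"), le_bytes == b"OggS")
```

Note that in the big-endian case the NULL version byte comes for free from the high byte of any following ASCII character.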


We could also add checking of the 6th byte, which should normally be 0x02
for the first page of a logical bitstream (bos).
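
A six-byte check along those lines might look like the following Python sketch. The function name is hypothetical, and requiring exactly 0x02 in the header-type byte is an assumption carried over from the sentence above rather than something any implementation is known to do.

```python
def looks_like_first_ogg_page(data: bytes) -> bool:
    # Need at least capture pattern (4) + version (1) + header type (1).
    if len(data) < 6:
        return False
    return (
        data[0:4] == b"OggS"   # capture pattern
        and data[4] == 0x00    # stream structure version
        and data[5] == 0x02    # header type: beginning of stream (bos)
    )


print(looks_like_first_ogg_page(b"OggS\x00\x02" + b"\x00" * 21))
print(looks_like_first_ogg_page(b"OggS\x00A"))
```

The second call shows that a UTF-16BE text file starting with the two ideographs and an ASCII letter would no longer match, since its sixth byte is the letter itself.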



It looks like you could check for 0x1a 0x45 0xdf 0xa3 as the first
four bytes


U+1A45 is Thai, looks like.  DFA3 is a surrogate, so you're ok there.

U+451A is CJK.  U+A3DF looks like a Yi syllable, so you're more or less  
ok there too.  I'm assuming you've already checked this byte sequence  
out in UTF-8 and some other common encodings?


It's garbage in at least UTF-8, Big5 and GBK.
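
For what it's worth, the UTF-16 and UTF-8 readings of those four bytes can be reproduced with Python's built-in codecs. This is a sketch only: `try_decode` is a made-up helper, and Big5 and GBK are left out because the result there depends on each codec's mapping tables. "Garbage" here means text nobody would write, not necessarily an invalid byte sequence.

```python
WEBM_MAGIC = bytes([0x1A, 0x45, 0xDF, 0xA3])


def try_decode(data: bytes, encoding: str):
    # Return the decoded text, or None if strict decoding fails.
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


utf16_be = try_decode(WEBM_MAGIC, "utf-16-be")  # 0xDFA3 is a lone surrogate
utf16_le = try_decode(WEBM_MAGIC, "utf-16-le")  # U+451A followed by U+A3DF
utf8 = try_decode(WEBM_MAGIC, "utf-8")          # starts with control char U+001A

for name, text in [("utf-16-be", utf16_be),
                   ("utf-16-le", utf16_le),
                   ("utf-8", utf8)]:
    print(name, None if text is None else [hex(ord(c)) for c in text])
```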

I'm not sure what infrastructure is in place, but perhaps one could *not*  
sniff if Content-Type also indicates an encoding? That way there's a  
solution for those who really want to display the hypothetical false  
positives as text.


[1] http://www.zein.se/patrick/3000char.html
[2]  
http://hi.baidu.com/%BC%C5%D5%AB/blog/item/f0ee8a4c5a5d0c02b3de05aa.html

[3] http://blog.sina.com.cn/s/blog_475be8240100ew5q.html
[4] http://www.zgddhj.cn/zj/bh/zhouhongyi/201007/32053.html

--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] The choice of script global object to use when the script element is moved

2010-09-07 Thread Henri Sivonen
NOTE! This email contains URLs to pages that crash WebKit on reload, so you 
probably shouldn't follow the URLs here in any WebKit-based browser where you 
have something important going on in the same renderer process. (In Chrome, 
only the isolated content process crashes.)

 On Fri, Sep 3, 2010 at 3:49 AM, Henri Sivonen hsivo...@iki.fi wrote:
 When evaluating a parser-inserted script, there are three potential script 
 global objects to use:
  1) The script global object of the document whose active parser the parser 
 that inserted the script is.
  2) The script global object of the document that owned the script element 
 at the time of invoking the run algorithm.
  3) The script global object of the document that owns the script element at 
 the time of script evaluation.
 
 The spec says the answer is #3. WebKit (with HTML5 parser or without) says 
 the answer is #1. Firefox 3.6 says the answer is #2.

On Sep 3, 2010, at 20:47, Adam Barth wrote:

 I'm not sure it makes much of a difference from a security point of
 view.  I suspect WebKit does #3 because it grabs the security context
 immediately before executing the script.  

With my demos, WebKit seems to be doing #1:
http://hsivonen.iki.fi/test/moz/move-during-parse-parent.html
http://hsivonen.iki.fi/test/moz/move-during-parse-parent2.html

The second one doesn't finish loading in Gecko (with either the new or the old
parser), because Gecko tries to unblock the parser on the wrong document and
never unblocks the parser that needs to be unblocked.

 That actually seems
 marginally safer because it means you're unlikely to grab an out-dated
 security context.

The check "If scripting is disabled for the script element, or if the user
agent does not support the scripting language given by the script block's type
for this script element, then the user agent must abort these steps at this
point. The script is not executed." happens at the time of the run algorithm.
Since iframe sandboxing or Content Security Policies can cause scripting to be
disabled, a security check has to happen at the time of invoking the run
algorithm. (This assumes we don't want to change the pre-existing behavior in
the common same-document case where a script gets rejected, and that we don't
want to decouple the time of the supported-language check from the time of
security-based rejections; such a decoupling would be detectable in the
document.write() case.)

For external scripts, this means that if we want to evaluate against a script 
global object associated with the owner doc of the script node at evaluating 
time, the security checks may have been performed in the context of another 
document and script global object. If we want security checks against the 
script global object associated with the owner doc at evaluation time, I think 
it's necessary to do the security checks twice: one during the run algorithm 
(in which case failing the checks doesn't fire any error events) and another 
time right before evaluation (in which case I suppose a failure should act the 
same way as a network failure and fire the error event). That's more complex 
than what's in Gecko now. (Not insurmountably complex, but more complex anyway.)

I'm worried about doing the security checks at run algorithm time and 
evaluating with a different script global object without redoing the security 
check. However, it may be that I only worry because I feel I don't know enough 
of all the possibilities to be confident that such a separation of time of 
check and time of use would be safe here.

Is there any good reason (other than differing from current IE9 PP behavior) 
not to do #1 with the additional stipulation that making the document whose 
active parser the parser is go away makes the scripts that are pending to run 
in the context of its script global object behave (stop?) the same regardless 
of which document they are in? (I.e. if the document that had the active parser 
gets torn down before the scripts inserted into another doc have loaded, those 
scripts wouldn't be evaluated.) I still believe doing #1 in Gecko would be the 
simplest thing. With the test cases above, WebKit seems to be doing #1 already 
(and then crashing) and Opera fails to move the scripts so the execution 
context ends up being the same as it would in case #1.

On Sep 3, 2010, at 20:55, Jonas Sicking wrote:

 On Fri, Sep 3, 2010 at 10:47 AM, Adam Barth w...@adambarth.com wrote:
 I'm not sure it makes much of a difference from a security point of
 view.
 
 Agreed. Pages can only move elements between pages that are in the
 same security context anyway so I can't really think of any attacks
 that any of the approaches would enable or disable.

Suppose there are two docs from one Origin. The document that the parser is 
associated with doesn't have a CSP. A script in it moves a node in such a way 
that the parser ends up inserting subsequent scripts into another document. 
That document has a CSP that bans scripts. 

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread And Clover

On 09/07/2010 03:56 AM, Boris Zbarsky wrote:


P.S. Sniffing is harder than you seem to think. It really is...


Quite. It surprises and saddens me that anyone wants to argue for *more* 
sniffing, and even enshrining it in a web standard.


Sniffing is a perpetual disaster that, after several security-sensitive 
problems, web browsers have been moving to deprecate/mitigate. If 
browsers want to guess types when no Content-Type is specified(*) then 
fine, but there is no good reason to ignore an explicitly-set type. I 
don't want my `application/octet-stream` file download service to be 
repurposeable as a video player for some other party!


For reasons already argued about here, you will never make the results 
of content-sniffing reliable, so why bother to standardise it? A 
standardised unreliable feature is no better than an unstandardised one.


The typing mechanism of the web (and more) is Content-Type, period. 
There should be no confusion of this with officially-endorsed sniffing. 
That it is 'hard' for web authors to ensure the correct Content-Types 
are set is:


* not W3/WHATWG's problem. If web servers make adding Content-Type 
information hard, then web servers need to be updated to make it easier;


* not really true, at least for Apache which can allow AddType et al in 
the .htaccess files that low-end shared hosts use. This may not be 
widely-known or practised, but that doesn't really merit changing the 
standards for everyone else to cope with.


(*: or, the traditional reason for sniffing, `text/plain`, due to Apache 
inappropriately sending this type for unknown files by default, bug 
13986. That doesn't seem to apply here.)


--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Julian Reschke

On 07.09.2010 11:51, And Clover wrote:

On 09/07/2010 03:56 AM, Boris Zbarsky wrote:


P.S. Sniffing is harder than you seem to think. It really is...


Quite. It surprises and saddens me that anyone wants to argue for *more*
sniffing, and even enshrining it in a web standard.


+1


Sniffing is a perpetual disaster that, after several security-sensitive
problems, web browsers have been moving to deprecate/mitigate. If
browsers want to guess types when no Content-Type is specified(*) then
fine, but there is no good reason to ignore an explicitly-set type. I
don't want my `application/octet-stream` file download service to be
repurposeable as a video player for some other party!


Hmm, that's what Content-Disposition: attachment is for...


...


Best regards, Julian


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Philip Jägenstedt

On Tue, 07 Sep 2010 11:51:55 +0200, And Clover and...@doxdesk.com wrote:


On 09/07/2010 03:56 AM, Boris Zbarsky wrote:


P.S. Sniffing is harder than you seem to think. It really is...


Quite. It surprises and saddens me that anyone wants to argue for *more*  
sniffing, and even enshrining it in a web standard.


IE9, Safari and Chrome ignore Content-Type in a video context and rely  
on sniffing. If you want Content-Type to be respected, convince the  
developers of those 3 browsers to change. If not, it's quite inevitable  
that Opera and Firefox will eventually have to follow.


Sniffing is a perpetual disaster that, after several security-sensitive  
problems, web browsers have been moving to deprecate/mitigate.


For reasons already argued about here, you will never make the results  
of content-sniffing reliable, so why bother to standardise it? A  
standardised unreliable feature is no better than an unstandardised one.


Unless all browsers agree to respect Content-Type, the next best thing is  
to agree on the same sniffing. Why would leaving it undefined be better?



The typing mechanism of the web (and more) is Content-Type, period.


Only in theory. In practice, Content-Type is an unreliable indicator of  
the type of a resource. Sniffing is already part of the web architecture,  
with all its problems.


(*: or, the traditional reason for sniffing, `text/plain`, due to Apache  
inappropriately sending this type for unknown files by default, bug  
13986. That doesn't seem to apply here.)


It hasn't been explicitly stated, but I assume that the only cases where  
sniffing for video formats would be employed would be for missing  
Content-Type, text/plain and application/octet-stream.


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Julian Reschke

On 07.09.2010 12:52, Philip Jägenstedt wrote:

...
IE9, Safari and Chrome ignore Content-Type in a video context and rely
on sniffing. If you want Content-Type to be respected, convince the
developers of those 3 browsers to change. If not, it's quite inevitable
that Opera and Firefox will eventually have to follow.
...


We have heard that Safari sniffs for compatibility with content 
previously consumed by Quicktime, and that IE9 may sniff because they 
(currently) can't pass the content-type to the decoding machinery (or 
something like that).


So you really would have to standardize sniffing in the browsers, but 
also in the components they delegate video display to. Good luck with that.


Best regards, Julian


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 6:52 AM, Philip Jägenstedt wrote:

It hasn't been explicitly stated, but I assume that the only cases where
sniffing for video formats would be employed would be for missing
Content-Type, text/plain and application/octet-stream.


That's not what at least Aryeh is proposing, no.  Also not what at least 
some of the browsers implement.


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 6:01 AM, Julian Reschke wrote:

Hmm, that's what Content-Disposition: attachment is for...


This header is currently ignored in non-toplevel browsing contexts in 
web browsers, last I checked.


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 4:11 AM, Philip Jägenstedt wrote:

It's garbage in at least UTF-8, Big5 and GBK.


Thanks.  I assume that applies to the OggS\0 sequence too, right?  I 
appreciate the data!



I'm not sure what infrastructure is in place, but perhaps one could
*not* sniff if Content-Type also indicates an encoding?


As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1 
(thanks, Apache!), that should be reasonable, I think.


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Philip Jägenstedt

On Tue, 07 Sep 2010 14:54:15 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/7/10 6:52 AM, Philip Jägenstedt wrote:

It hasn't been explicitly stated, but I assume that the only cases where
sniffing for video formats would be employed would be for missing
Content-Type, text/plain and application/octet-stream.


That's not what at least Aryeh is proposing, no.  Also not what at least  
some of the browsers implement.


Oops, I was talking about top-level contexts here. In a video context,  
always ignoring the Content-Type and always sniffing is the most sane  
solution (apart from always respecting Content-Type).


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 9:03 AM, Philip Jägenstedt wrote:

On Tue, 07 Sep 2010 14:54:15 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/7/10 6:52 AM, Philip Jägenstedt wrote:

It hasn't been explicitly stated, but I assume that the only cases where
sniffing for video formats would be employed would be for missing
Content-Type, text/plain and application/octet-stream.


That's not what at least Aryeh is proposing, no. Also not what at
least some of the browsers implement.


Oops, I was talking about top-level contexts here. In a video context,
always ignoring the Content-Type and always sniffing is the most sane
solution (apart from always respecting Content-Type).


Yes, the suggestion Aryeh is making is that toplevel contexts should use 
the same sniffing algorithm as the video context and should sniff 
everything for video, completely ignoring the Content-Type header.


-Boris



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Philip Jägenstedt

On Tue, 07 Sep 2010 14:56:38 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 9/7/10 4:11 AM, Philip Jägenstedt wrote:

It's garbage in at least UTF-8, Big5 and GBK.


Thanks.  I assume that applies to the OggS\0 sequence too, right?  I  
appreciate the data!


UTF-8, Big5 and GBK are all (as far as I know) ASCII supersets. Do  
real-world text documents include \0 bytes? (I don't know.)



I'm not sure what infrastructure is in place, but perhaps one could
*not* sniff if Content-Type also indicates an encoding?


As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1  
(thanks, Apache!), that should be reasonable, I think.


Are you saying that Apache has, at various times, set the default  
character encoding to UTF-8 or ISO-8859-1? I was hoping that no encoding  
parameter at all would be sent :/


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 9:16 AM, Philip Jägenstedt wrote:

UTF-8, Big5 and GBK are all (as far as I know) ASCII supersets. Do
real-world text documents include \0 bytes?


Yes.  Real-world text documents include all sorts of gunk.  Just rarely.


As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1
(thanks, Apache!), that should be reasonable, I think.


Are you saying that Apache has, at various times, set the default
character encoding to UTF-8 or ISO-8859-1?


Yes, precisely.  Though the UTF-8 stuff was Linux distros, I think, not 
Apache itself (in that Apache just sent the thing passed to 
AddDefaultCharset and they changed the value of that from ISO-8859-1 to 
UTF-8 in their distro packages).  Here's the relevant comment from the 
Gecko source where we do our text-or-binary sniffing for toplevel contexts:


 Make sure to do a case-sensitive exact match comparison here.  Apache
 1.x just sends text/plain for unknown, while Apache 2.x sends
 text/plain with a ISO-8859-1 charset.  Debian's Apache version, just to
 be different, sends text/plain with iso-8859-1 charset.  For extra fun,
 FC7, RHEL4, and Ubuntu Feisty send charset=UTF-8.  Don't do general
 case-insensitive comparison, since we really want to apply this crap as
 rarely as we can.
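
The approach that comment describes can be sketched in Python as follows. This is not Gecko's code: the function name is invented, and the exact header strings are assumptions derived from the comment (a real server might differ in spacing or parameter order).

```python
# Header values inferred from the comment above; treat them as assumptions.
SNIFFABLE_TEXT_PLAIN = frozenset({
    "text/plain",
    "text/plain; charset=ISO-8859-1",
    "text/plain; charset=iso-8859-1",
    "text/plain; charset=UTF-8",
})


def may_sniff_text_plain(content_type: str) -> bool:
    # Case-sensitive exact match, so anything unusual is left alone.
    return content_type in SNIFFABLE_TEXT_PLAIN


print(may_sniff_text_plain("text/plain; charset=UTF-8"))
print(may_sniff_text_plain("text/plain; charset=utf-8"))
```

Matching whole strings rather than parsing the header keeps the set of sniffed responses as small as possible, which is the stated goal.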


I was hoping that no encoding parameter at all would be sent :/


Heh.  I've long since given up all hope of reason on this stuff; I just 
try to keep it as sane and predictable and simple as possible.  :(


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Maciej Stachowiak

On Sep 7, 2010, at 3:52 AM, Philip Jägenstedt wrote:

 On Tue, 07 Sep 2010 11:51:55 +0200, And Clover and...@doxdesk.com wrote:
 
 On 09/07/2010 03:56 AM, Boris Zbarsky wrote:
 
 P.S. Sniffing is harder than you seem to think. It really is...
 
 Quite. It surprises and saddens me that anyone wants to argue for *more* 
 sniffing, and even enshrining it in a web standard.
 
 IE9, Safari and Chrome ignore Content-Type in a video context and rely on 
 sniffing. If you want Content-Type to be respected, convince the developers 
 of those 3 browsers to change. If not, it's quite inevitable that Opera and 
 Firefox will eventually have to follow.

At least in the case of Safari, we initially added sniffing for the benefit of 
video types likely to be played with the QuickTime plugin - mainly .mov and 
various flavors of MPEG. It is common for these to be served with an incorrect 
MIME type. And we did not want to impose a high transition cost on content 
already being served via the QuickTime plugin. The QuickTime plugin may be a 
slightly less relevant consideration now than when we first thought about this, 
but at this point it is possible content has been migrated to video while 
still carrying broken MIME types.

Ogg and WebM are probably not yet poisoned by a mass of unlabeled data. It 
might be possible to treat those types more strictly - i.e. only play Ogg or 
WebM when labeled as such, and not ever sniff content with those MIME types as 
anything else.

In Safari's case this would have limited impact since a non-default codec 
plugin would need to be installed to play either Ogg or WebM. I'm also not sure 
it's sensible to have varying levels of strictness for different types. But 
it's an option, if we want to go there.

Regards,
Maciej



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread David Singer

On Sep 7, 2010, at 2:51 , And Clover wrote:

 On 09/07/2010 03:56 AM, Boris Zbarsky wrote:
 
 P.S. Sniffing is harder than you seem to think. It really is...
 
 Quite. It surprises and saddens me that anyone wants to argue for *more* 
 sniffing, and even enshrining it in a web standard.

Yes.  We should be striving for a world in which as little sniffing as possible 
happens (and is needed).  Basically, we have the problem because of 
mis-configured or (from the author's point of view) unconfigurable web servers. 
 

So I wonder whether
* the presence of a source element with a type attribute should be believed
(at least for the purposes of dispatch and 'canplay' decisions). If the author
of the page got it wrong or lied, surely they can accept (and deal with) the
consequences?
* we should only really sniff the two types in HTTP headers that tend to get
used as fallbacks (application/octet-stream and text/plain). Though I note
that I have sometimes *wanted* a file displayed as text (and not interpreted)
and been defeated by sniffing (though not as often as I have watched binary
dumped on my screen as if it were text).



David Singer
Multimedia and Software Standards, Apple Inc.



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread David Singer
And like I said before, please be careful of assuming our intent and desires 
from the way things currently work.  We are thinking, listening, and 
implementing (and fixing bugs, and re-inspecting older behavior in lower-level 
code), so there is some...flexibility...I think.

On Sep 7, 2010, at 9:12 , Maciej Stachowiak wrote:

 
 On Sep 7, 2010, at 3:52 AM, Philip Jägenstedt wrote:
 
 On Tue, 07 Sep 2010 11:51:55 +0200, And Clover and...@doxdesk.com wrote:
 
 On 09/07/2010 03:56 AM, Boris Zbarsky wrote:
 
 P.S. Sniffing is harder than you seem to think. It really is...
 
 Quite. It surprises and saddens me that anyone wants to argue for *more* 
 sniffing, and even enshrining it in a web standard.
 
 IE9, Safari and Chrome ignore Content-Type in a video context and rely on 
 sniffing. If you want Content-Type to be respected, convince the developers 
 of those 3 browsers to change. If not, it's quite inevitable that Opera and 
 Firefox will eventually have to follow.
 
 At least in the case of Safari, we initially added sniffing for the benefit 
 of video types likely to be played with the QuickTime plugin - mainly .mov 
 and various flavors of MPEG. It is common for these to be served with an 
 incorrect MIME type. And we did not want to impose a high transition cost on 
 content already being served via the QuickTime plugin. The QuickTime plugin 
 may be a slightly less relevant consideration now than when we first thought 
 about this, but at this point it is possible content has been migrated to 
 video while still carrying broken MIME types.
 
 Ogg and WebM are probably not yet poisoned by a mass of unlabeled data. It 
 might be possible to treat those types more strictly - i.e. only play Ogg or 
 WebM when labeled as such, and not ever sniff content with those MIME types 
 as anything else.
 
 In Safari's case this would have limited impact since a non-default codec 
 plugin would need to be installed to play either Ogg or WebM. I'm also not 
 sure it's sensible to have varying levels of strictness for different types. 
 But it's an option, if we want to go there.
 
 Regards,
 Maciej
 

David Singer
Multimedia and Software Standards, Apple Inc.



Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Adam Barth
On Tue, Sep 7, 2010 at 3:01 AM, Julian Reschke julian.resc...@gmx.de wrote:
 On 07.09.2010 11:51, And Clover wrote:
 On 09/07/2010 03:56 AM, Boris Zbarsky wrote:
 P.S. Sniffing is harder than you seem to think. It really is...

 Quite. It surprises and saddens me that anyone wants to argue for *more*
 sniffing, and even enshrining it in a web standard.

 +1

-1

It saddens me when standards bodies ignore reality and leave
implementors to invent their own non-interoperable algorithms for
security-critical behavior.

Adam


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 3:19 PM, Adam Barth wrote:

It saddens me when standards bodies ignore reality and leave
implementors to invent their own non-interoperable algorithms for
security-critical behavior.


Of course nothing prevents us from saying UAs MUST NOT sniff but if they 
do anyway they MUST use a given algorithm, right?


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Aryeh Gregor
On Tue, Sep 7, 2010 at 5:51 AM, And Clover and...@doxdesk.com wrote:
 Quite. It surprises and saddens me that anyone wants to argue for *more*
 sniffing, and even enshrining it in a web standard.

I'm not a fan of sniffing, but I'm also not a fan of blindly believing
clearly wrong MIME types and thereby forcing authors to do needless
configuration work, which they might not even be able to do.  I'm not
yet sure what the correct tradeoff is here, but I'm pretty sure it's
not "no sniffing at all under any conditions".

 Sniffing is a perpetual disaster that, after several security-sensitive
 problems, web browsers have been moving to deprecate/mitigate. If browsers
 want to guess types when no Content-Type is specified(*) then fine, but
 there is no good reason to ignore an explicitly-set type. I don't want my
 `application/octet-stream` file download service to be repurposeable as a
 video player for some other party!

If you don't want that, you should be using access control, not MIME types.

 For reasons already argued about here, you will never make the results of
 content-sniffing reliable, so why bother to standardise it? A standardised
 unreliable feature is no better than an unstandardised one.

Sure it is, because it's unreliable in the same way across all
browsers.  That means that in any given case, all browsers will work
the same.  This is particularly essential for security -- undocumented
sniffing behavior has caused more than one vulnerability in the past.

 The typing mechanism of the web (and more) is Content-Type, period. There
 should be no confusion of this with officially-endorsed sniffing.

We already have officially endorsed sniffing where web compat requires it:

http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#content-type-sniffing
http://tools.ietf.org/html/draft-abarth-mime-sniff-05

The question is if we can avoid it for new content types like
video/audio.  If not, we should spec it in advance so we at least have
something that's as sane as possible under the circumstances.

 That it is
 'hard' for web authors to ensure the correct Content-Types are set is:

 * not W3/WHATWG's problem. If web servers make adding Content-Type
 information hard, then web servers need to be updated to make it easier;

I don't know about the W3C, but reality is the WHATWG's problem.  We
can't let things be broken and just say it's someone else's fault.  We
need to institute workarounds at our level for failures on other
levels if that's what's necessary to get good security and a good
user/author experience.

 * not really true, at least for Apache which can allow AddType et al in the
 .htaccess files that low-end shared hosts use. This may not be widely-known
 or practised, but that doesn't really merit changing the standards for
 everyone else to cope with.

Creating a .htaccess file is a technical procedure that most users
will not know how to do, particularly since the problem will probably
just manifest itself as "the video doesn't work".  It's also not
possible on some hosts -- although it's certainly possible on the
large majority of cheap shared hosts, and of course on hosts where the
author has root access.

On Tue, Sep 7, 2010 at 6:52 AM, Philip Jägenstedt phil...@opera.com wrote:
 It hasn't been explicitly stated, but I assume that the only cases where
 sniffing for video formats would be employed would be for missing
 Content-Type, text/plain and application/octet-stream.

If those are the only common MIME types incorrectly served for unknown
file types, that seems reasonable.  (Some files might be actively
misidentified, like if I have an Ogg file saved as .jpeg, but
hopefully this will be very rare.)

On Tue, Sep 7, 2010 at 8:56 AM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 9/7/10 4:11 AM, Philip Jägenstedt wrote:
 It's garbage in at least UTF-8, Big5 and GBK.

 Thanks.  I assume that applies to the OggS\0 sequence too, right?  I
 appreciate the data!

 I'm not sure what infrastructure is in place, but perhaps one could
 *not* sniff if Content-Type also indicates an encoding?

 As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1
 (thanks, Apache!), that should be reasonable, I think.

So at least for Ogg and WebM, how about:

* Sniff only if Content-Type is typical of what popular web servers send
for unrecognized filetypes.  E.g., only for no Content-Type,
text/plain, or application/octet-stream, and only if the encoding is
either not present or is UTF-8 or ISO-8859-1.  Or whatever web servers
do here.
* Sniff the same both for video tags and top-level browsing contexts,
so "open video in new tab" doesn't mysteriously fail on some setups.
* If a file in a top-level browsing context is sniffed as video but
then some kind of error is returned before the video plays the first
frame, fall back to allowing the user to download it, or whatever the
usual action would be if no sniffing had occurred.
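
A minimal sketch of the first rule above, for illustration only. The
function names are hypothetical, the check looks only at offset 0 (so it
would not capture on the junk-prefixed Ogg files discussed earlier in the
thread), and the EBML header bytes also match Matroska files that are not
WebM:

```javascript
// "OggS" capture pattern followed by the version byte, which is 0 for current Ogg.
const OGG_MAGIC = [0x4f, 0x67, 0x67, 0x53, 0x00];
// EBML header ID that begins a WebM (and any other Matroska) file.
const EBML_MAGIC = [0x1a, 0x45, 0xdf, 0xa3];

// Types that servers commonly send for files they do not recognize.
const SNIFFABLE_TYPES = new Set(["text/plain", "application/octet-stream"]);
// Charsets that a server may add by default, so they carry no information.
const SNIFFABLE_CHARSETS = new Set(["utf-8", "iso-8859-1"]);

function startsWith(bytes, magic) {
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// contentType: raw header value, or null/undefined/"" when the header is missing.
function maySniff(contentType) {
  if (!contentType) return true; // no Content-Type at all
  const [type, ...params] = contentType
    .split(";")
    .map((s) => s.trim().toLowerCase());
  if (!SNIFFABLE_TYPES.has(type)) return false;
  const charsetParam = params.find((p) => p.startsWith("charset="));
  if (charsetParam === undefined) return true; // no encoding declared
  const charset = charsetParam.slice("charset=".length).replace(/^"|"$/g, "");
  return SNIFFABLE_CHARSETS.has(charset);
}

// Returns "ogg", "webm", or null when the resource must not be treated as video.
function sniffVideo(contentType, bytes) {
  if (!maySniff(contentType)) return null;
  if (startsWith(bytes, OGG_MAGIC)) return "ogg";
  if (startsWith(bytes, EBML_MAGIC)) return "webm";
  return null;
}
```

Under these assumptions a response labelled image/jpeg, or text/plain with
charset=Big5, is never sniffed, whatever its first bytes are.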

Within these constraints, false positives in the sniffing 

Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 3:29 PM, Aryeh Gregor wrote:

* Sniff only if Content-Type is typical of what popular web servers send
for unrecognized filetypes.  E.g., only for no Content-Type,
text/plain, or application/octet-stream, and only if the encoding is
either not present or is UTF-8 or ISO-8859-1.  Or whatever web servers
do here.
* Sniff the same both for video tags and top-level browsing contexts,
so "open video in new tab" doesn't mysteriously fail on some setups.


I could probably live with those, actually.


* If a file in a top-level browsing context is sniffed as video but
then some kind of error is returned before the video plays the first
frame, fall back to allowing the user to download it, or whatever the
usual action would be if no sniffing had occurred.


This might be pretty difficult to implement, since the video decoder 
might consume arbitrary amounts of data before saying that there was an 
error.


-Boris


Re: [whatwg] The choice of script global object to use when the script element is moved

2010-09-07 Thread Adam Barth
On Tue, Sep 7, 2010 at 1:40 AM, Henri Sivonen hsivo...@iki.fi wrote:
 On Sep 3, 2010, at 20:55, Jonas Sicking wrote:
 On Fri, Sep 3, 2010 at 10:47 AM, Adam Barth w...@adambarth.com wrote:
 I'm not sure it makes much of a difference from a security point of
 view.

 Agreed. Pages can only move elements between pages that are in the
 same security context anyway so I can't really think of any attacks
 that any of the approaches would enable or disable.

 Suppose there are two docs from one Origin. The document that the parser is 
 associated with doesn't have a CSP. A script in it moves a node in such a way 
 that the parser ends up inserting subsequent scripts into another document. 
 That document has a CSP that bans scripts. Would you consider it a bug if a 
 script ran in the context of the script global object of the document whose 
 CSP says no scripts?

It sounds like CSP is creating sub-origin privileges.  Sub-origin
privileges don't really work, so it's unclear what a sensible
result would be.

Adam


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Adam Barth
On Tue, Sep 7, 2010 at 12:21 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 9/7/10 3:19 PM, Adam Barth wrote:
 It saddens me when standards bodies ignore reality and leave
 implementors to invent their own non-interoperable algorithms for
 security-critical behavior.

 Of course nothing prevents us from saying UAs MUST NOT sniff but if they do
 anyway they MUST use a given algorithm, right?

That's a contrary-to-duty imperative, which is something that's been
puzzling philosophers for centuries.  A more sensible requirement
would be that user agents SHOULD NOT sniff (for reasons XYZ), but, if
they do, they MUST use the following algorithm.

Adam


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

Of course nothing prevents us from saying UAs MUST NOT sniff but if they do
anyway they MUST use a given algorithm, right?


That's a contrary-to-duty imperative, which is something that's been
puzzling philosophers for centuries.  A more sensible requirement
would be that user agents SHOULD NOT sniff (for reasons XYZ), but, if
they do, they MUST use the following algorithm.


Except that in practice SHOULD NOT is treated as carte blanche to do the 
undesirable thing.  It has no teeth.  MUST NOT doesn't much either, but 
it's _something_ at least (in the sense that one can clearly claim that 
violating a MUST NOT is a bug).


-Boris


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Adam Barth
On Tue, Sep 7, 2010 at 2:13 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Of course nothing prevents us from saying UAs MUST NOT sniff but if they
 do
 anyway they MUST use a given algorithm, right?

 That's a contrary-to-duty imperative, which is something that's been
 puzzling philosophers for centuries.  A more sensible requirement
 would be that user agents SHOULD NOT sniff (for reasons XYZ), but, if
 they do, they MUST use the following algorithm.

 Except that in practice SHOULD NOT is treated as carte blanche to do the
 undesirable thing.  It has no teeth.  MUST NOT doesn't much either, but it's
 _something_ at least (in the sense that one can clearly claim that violating
 a MUST NOT is a bug).

In any case, lawyering the requirement level in the spec isn't the way
to solve these problems.  You need to change the underlying incentives
to actually affect what gets implemented.

Adam


[whatwg] ArrayBuffer and ByteArray questions

2010-09-07 Thread Jian Li
Hi,

Several specs, like File API and WebGL, use ArrayBuffer, while other specs,
like XMLHttpRequest Level 2, use ByteArray. Should we change to use the same
name all across our specs? Since we define ArrayBuffer in the Typed Arrays
spec (
https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/doc/spec/TypedArray-spec.html),
should we favor ArrayBuffer?

In addition, can we consider adding ArrayBuffer support to BlobBuilder,
FormData, and XMLHttpRequest.send()?

Thanks,

Jian


Re: [whatwg] HTML6 Doctype

2010-09-07 Thread fantasai

On 08/29/2010 08:00 AM, Tab Atkins Jr. wrote:

On Sat, Aug 28, 2010 at 8:15 PM, David John Burrowes
bain...@davidjohnburrowes.com  wrote:

I agree that they don't have access to versioning info from within the 
languages.

But, CSS has some sense of versions (CSS, CSS2, and CSS3).  This gives me some
ability to say ah, SurfBrowser 1.0 and 2.0 supported CSS1, but with 3.0 they
supported some of CSS2 etc etc.


To be honest, no you can't.  Not with such large labels, at least.
You'll never be able to say X browser supports CSS3, but CSS3 isn't
a thing.  You can name individual modules only, which is equivalent to
naming large features of HTML.


How do you define a large feature of HTML?

~fantasai


Re: [whatwg] HTML6 Doctype

2010-09-07 Thread Tab Atkins Jr.
On Tue, Sep 7, 2010 at 4:45 PM, fantasai fantasai.li...@inkedblade.net wrote:
 On 08/29/2010 08:00 AM, Tab Atkins Jr. wrote:

 On Sat, Aug 28, 2010 at 8:15 PM, David John Burrowes
 bain...@davidjohnburrowes.com  wrote:

 I agree that they don't have access to versioning info from within the
 languages.

 But, CSS has some sense of versions (CSS, CSS2, and CSS3).  This gives me
 some
 ability to say ah, SurfBrowser 1.0 and 2.0 supported CSS1, but with 3.0
 they
 supported some of CSS2 etc etc.

 To be honest, no you can't.  Not with such large labels, at least.
 You'll never be able to say X browser supports CSS3, but CSS3 isn't
 a thing.  You can name individual modules only, which is equivalent to
 naming large features of HTML.

 How do you define a large feature of HTML?

Roughly, has a subheading in the TOC.  Depending on the exact
organization, this might actually be a heading or subsubheading.

~TJ


Re: [whatwg] Video with MIME type application/octet-stream

2010-09-07 Thread Boris Zbarsky

On 9/7/10 5:35 PM, Adam Barth wrote:

In any case, lawyering the requirement level in the spec isn't the way
to solve these problems.  You need to change the underlying incentives
to actually affect what gets implemented.


The incentive structure for pretty much any sort of sniffing is a 
prisoner's dilemma.  Life's hard.


-Boris



[whatwg] Descendents of source and track elements should be skipped when serializing HTML fragment (10.3)

2010-09-07 Thread Ryosuke Niwa
Hi,

In the HTML fragment serialization algorithm, we skip elements with an empty
content model in step 2.2:
If current node is an area, base, basefont, bgsound, br, col, embed, frame,
hr, img, input, keygen, link, meta, param, or wbr element, then continue on
to the next child node at this point.

For consistency, I propose to skip children of source and track elements as
well.
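
A minimal sketch of what the proposal would mean, for illustration only. It
is not the spec's algorithm: the node shape ({ name, children }) and the
function names are hypothetical, and attributes, comments and the spec's
full escaping rules are left out:

```javascript
// Elements whose children are skipped: the list from step 2.2, plus the
// two additions proposed above.
const SKIP_CHILDREN = new Set([
  "area", "base", "basefont", "bgsound", "br", "col", "embed", "frame", "hr",
  "img", "input", "keygen", "link", "meta", "param", "wbr",
  "source", "track", // proposed additions
]);

// Simplified text escaping; the real algorithm has more cases.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// node: { name: string, children: Array<node | string> } (hypothetical shape)
function serializeChildren(node) {
  let out = "";
  for (const child of node.children) {
    if (typeof child === "string") {
      out += escapeText(child);
      continue;
    }
    out += "<" + child.name + ">";
    // "Continue on to the next child node": no descendants, no end tag.
    if (SKIP_CHILDREN.has(child.name)) continue;
    out += serializeChildren(child) + "</" + child.name + ">";
  }
  return out;
}
```

With this sketch, a source element that had children inserted via the DOM
serializes as a bare start tag, the same way br or img already do.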

Also, the algorithm does not seem to specify the behavior on deprecated (or
undocumented) elements such as isindex.  Can we assume that the
serialization of such elements is UA-defined?

Best,
Ryosuke Niwa
Software Engineer
rn...@webkit.org


Re: [whatwg] Descendents of source and track elements should be skipped when serializing HTML fragment (10.3)

2010-09-07 Thread Adam Barth
The HTML parser expands the isindex element into a bunch of other
elements, so it never inserts that element into the tree.  Of course,
an isindex element could have been inserted via the DOM...

Adam


On Tue, Sep 7, 2010 at 4:44 PM, Ryosuke Niwa ryosuke.n...@gmail.com wrote:
 Hi,
 In the HTML fragment serialization algorithm, we skip elements with an empty
 content model in step 2.2:
 If current node is an area, base, basefont, bgsound, br, col, embed, frame,
 hr, img, input, keygen, link, meta, param, or wbr element, then continue on
 to the next child node at this point.

 For consistency, I propose to skip children of source and track elements as
 well.
 Also, the algorithm does not seem to specify the behavior on deprecated (or
 undocumented) elements such as isindex.  Can we assume that the
 serialization of such elements is UA-defined?
 Best,
 Ryosuke Niwa
 Software Engineer
 rn...@webkit.org



Re: [whatwg] Timed tracks: feedback compendium

2010-09-07 Thread Chris Double
On Wed, Sep 8, 2010 at 11:19 AM, Ian Hickson i...@hixie.ch wrote:
On Thu, 26 Aug 2010, Chris Double wrote:

 Firefox (in the case of video) uses file extensions to identify video
 files. We have an internal mapping of file extensions to MIME types. We
 don't sniff the content. I imagine we'd do the same with whatever file
 extension is used for WebSRT.

(I assume this is only for the filesystem, not data from the wire!)

Yes, this is only for the filesystem.

Chris.
-- 
http://www.bluishcoder.co.nz


[whatwg] Canvas API: What should happen if non-finite floats are used

2010-09-07 Thread Boris Zbarsky

Consider this testcase:

<!doctype html>
<html>
  <body>
    <canvas id="c" width="200" height="200"></canvas>
    <script>
    try {
      var c = document.getElementById("c"),
          t = c.getContext("2d");
      t.moveTo(100, 100);
      t.lineTo(NaN, NaN);
      t.lineTo(50, 25);
      t.stroke();
    } catch (e) { alert(e); }
    </script>
  </body>
</html>

Behavior in the spec seems to be undefined (in particular, no mention is 
made as to what the canvas API functions are supposed to do if 
non-finite values are passed in).  Behavior in browsers is:


Presto: Throws NOT_SUPPORTED_ERR on that lineTo(NaN, NaN) call.
Gecko: Throws DOM_SYNTAX_ERR on that lineTo(NaN, NaN) call.
Webkit: Silently ignores the lineTo(NaN, NaN) call, and then
draws a line from (100,100) to (50, 25).

Seems like the spec needs to define this.

-Boris

P.S.  This isn't a hypothetical issue; this came up in a page that was 
trying to graph things using canvas and ending up with divide-by-0 all 
over the place.  It worked in webkit (though not drawing the right 
thing, so much).  It failed to draw anything in Presto or Gecko.


Re: [whatwg] Canvas API: What should happen if non-finite floats are used

2010-09-07 Thread Sam Weinig
In 4.8.11.1 the spec does state:

Except where otherwise specified, for the 2D context interface, any method 
call with a numeric argument whose value is infinite or a NaN value must be 
ignored.
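
A minimal sketch of that rule, for illustration only. It is not how any
browser implements it; ignoreNonFinite is a hypothetical helper that wraps
a 2D-context-like object and drops any method call that has an infinite or
NaN numeric argument:

```javascript
// Wraps `ctx` so that method calls with a non-finite numeric argument are
// ignored, as the quoted sentence requires.
function ignoreNonFinite(ctx) {
  return new Proxy(ctx, {
    get(target, prop) {
      const value = target[prop];
      if (typeof value !== "function") return value;
      return (...args) => {
        const bad = args.some(
          (a) => typeof a === "number" && !Number.isFinite(a)
        );
        if (bad) return undefined; // ignore the whole call
        return value.apply(target, args);
      };
    },
    // Forward property writes (e.g. lineWidth) to the real context.
    set(target, prop, value) {
      target[prop] = value;
      return true;
    },
  });
}
```

Run against the testcase, only moveTo(100, 100), lineTo(50, 25) and stroke()
reach the underlying context, which matches the WebKit behavior described
there.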

-Sam

On Sep 7, 2010, at 9:41 PM, Boris Zbarsky wrote:

 Consider this testcase:
 
 <!doctype html>
 <html>
   <body>
     <canvas id="c" width="200" height="200"></canvas>
     <script>
     try {
       var c = document.getElementById("c"),
           t = c.getContext("2d");
       t.moveTo(100, 100);
       t.lineTo(NaN, NaN);
       t.lineTo(50, 25);
       t.stroke();
     } catch (e) { alert(e); }
     </script>
   </body>
 </html>
 
 Behavior in the spec seems to be undefined (in particular, no mention is made 
 as to what the canvas API functions are supposed to do if non-finite values 
 are passed in).  Behavior in browsers is:
 
 Presto: Throws NOT_SUPPORTED_ERR on that lineTo(NaN, NaN) call.
 Gecko: Throws DOM_SYNTAX_ERR on that lineTo(NaN, NaN) call.
 Webkit: Silently ignores the lineTo(NaN, NaN) call, and then
draws a line from (100,100) to (50, 25).
 
 Seems like the spec needs to define this.
 
 -Boris
 
 P.S.  This isn't a hypothetical issue; this came up in a page that was trying 
 to graph things using canvas and ending up with divide-by-0 all over the 
 place.  It worked in webkit (though not drawing the right thing, so much).  
 It failed to draw anything in Presto or Gecko.



Re: [whatwg] Canvas API: What should happen if non-finite floats are used

2010-09-07 Thread Jonas Sicking
This seems like a strange choice of behavior. Given that this is very
likely a bug in the program, wouldn't it make more sense to throw an
exception, so as to make it easier to debug? That would be similar to, for
example, Node.appendChild when called with a null argument.

/ Jonas

On Tue, Sep 7, 2010 at 10:32 PM, Sam Weinig wei...@apple.com wrote:
 In 4.8.11.1 the spec does state:

 Except where otherwise specified, for the 2D context interface, any method 
 call with a numeric argument whose value is infinite or a NaN value must be 
 ignored.

 -Sam

 On Sep 7, 2010, at 9:41 PM, Boris Zbarsky wrote:

 Consider this testcase:

 <!doctype html>
 <html>
   <body>
     <canvas id="c" width="200" height="200"></canvas>
     <script>
     try {
       var c = document.getElementById("c"),
           t = c.getContext("2d");
       t.moveTo(100, 100);
       t.lineTo(NaN, NaN);
       t.lineTo(50, 25);
       t.stroke();
     } catch (e) { alert(e); }
     </script>
   </body>
 </html>

 Behavior in the spec seems to be undefined (in particular, no mention is 
 made as to what the canvas API functions are supposed to do if non-finite 
 values are passed in).  Behavior in browsers is:

 Presto: Throws NOT_SUPPORTED_ERR on that lineTo(NaN, NaN) call.
 Gecko: Throws DOM_SYNTAX_ERR on that lineTo(NaN, NaN) call.
 Webkit: Silently ignores the lineTo(NaN, NaN) call, and then
        draws a line from (100,100) to (50, 25).

 Seems like the spec needs to define this.

 -Boris

 P.S.  This isn't a hypothetical issue; this came up in a page that was 
 trying to graph things using canvas and ending up with divide-by-0 all over 
 the place.  It worked in webkit (though not drawing the right thing, so 
 much).  It failed to draw anything in Presto or Gecko.