Re: [whatwg] Video with MIME type application/octet-stream
On Tue, 07 Sep 2010 02:46:29 +0200, Gregory Maxwell gmaxw...@gmail.com wrote: On Mon, Sep 6, 2010 at 3:19 PM, Aryeh Gregor simetrical+...@gmail.com wrote: On Mon, Sep 6, 2010 at 4:14 AM, Philip Jägenstedt phil...@opera.com wrote: The Ogg page begins with the 4 bytes OggS, which is what Opera (GStreamer) checks for. For additional safety, one could also check for the trailing version indicator, which ought to be a NULL byte for current Ogg. [1] [2] OggS\0 as the first five bytes seems safe to check for. It's rather short, I guess because it's repeated on every page, but five bytes is long enough that it should occur by random only negligibly often, in either text or binary files. Um... If you do that you will fail to capture on files that most other ogg reading tools will happily capture on. Common software will read forward until it hits OggS then it will check the page CRC (in total, 9 bytes of capture). For example, here is a file which begins with a kilobyte of \0: http://myrandomnode.dyndns.org:8080/~gmaxwell/test.ogg Everything I had handy played it. This could fail to capture on a live stream that didn't ensure new listeners began at a page boundary. I don't know if any of these exist. I don't know if breaking these cases would matter much but herein lies the danger of sniffing— everyone thinks they're an expert but no one really has a handle on the implications. Your test file is too short, perhaps it was truncated? I made my own one by adding 1024 NULL bytes to the beginning of http://v2v.cc/~j/theora_testsuite/320x240.ogg That file doesn't play in Totem, because it (GStreamer) relies on sniffing. It also won't play in Opera for this reason, but I haven't seen any bug reports about failure to play similar files since Opera introduced support for Ogg. It does play in Firefox, but not in Chrome. Just like with WebM, I think browsers should not support files that begin with arbitrary amounts of garbage, as it requires reading the whole file before failing. 
The file doesn't play in VLC or MPlayer, but does play in xine. -- Philip Jägenstedt Core Developer Opera Software
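The two capture strategies debated above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `sniff_ogg_strict` and `find_ogg_capture` and the `limit` parameter are hypothetical and not taken from Opera, GStreamer, or any other implementation, and the page CRC check that common tools perform after finding OggS is left out.

```python
OGG_MAGIC = b"OggS\x00"  # capture pattern followed by the version byte (0 for current Ogg)


def sniff_ogg_strict(data: bytes) -> bool:
    """Strict check: the resource must begin with OggS plus a zero version byte."""
    return data[:5] == OGG_MAGIC


def find_ogg_capture(data: bytes, limit: int = 65536) -> int:
    """Lenient check: scan forward for the first capture pattern.

    Returns the byte offset of the match, or -1 if there is none within
    the first `limit` bytes. The limit is an assumption added here so a
    non-Ogg resource can be rejected without reading all of it.
    """
    return data.find(OGG_MAGIC, 0, limit)


# A page header start (0x02 marks beginning of stream) behind 1024 zero bytes,
# similar in spirit to the padded test file described in the message.
padded = b"\x00" * 1024 + b"OggS\x00\x02"
```

With `padded`, the strict check rejects the data while the lenient scan finds the capture pattern at offset 1024, which is the behavioural difference between the players listed above.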
Re: [whatwg] Video with MIME type application/octet-stream
On Tue, 07 Sep 2010 03:56:54 +0200, Boris Zbarsky bzbar...@mit.edu wrote: On 9/6/10 3:19 PM, Aryeh Gregor wrote: On Mon, Sep 6, 2010 at 4:14 AM, Philip Jägenstedt phil...@opera.com wrote: The Ogg page begins with the 4 bytes OggS, which is what Opera (GStreamer) checks for. For additional safety, one could also check for the trailing version indicator, which ought to be a NULL byte for current Ogg. [1] [2] OggS\0 as the first five bytes seems safe to check for. It's rather short, I guess because it's repeated on every page, but five bytes is long enough that it should occur by random only negligibly often, in either text or binary files. So if a text file starts with U+4F67 U+6753 (both CJK ideographs) and any ASCII character (can this happen in the real world?) you're OK with treating it as Ogg? Same for files starting with U+674F U+5367 (both CJK ideographs) and any plane-0 character whose Unicode codepoint is 0 mod 2^8 (plenty of CJK stuff like that)? Is your CJK good enough that you know text files would never start like this, or are you just assuming that people who are silly enough to use UTF-16 for their text files and aren't in Europe don't matter? Or that you don't care about people who happen to not use a BOM? Thanks for pointing out these cases. I hadn't thought about it, but my CJK is good enough to say something about them: '佧杓A' encoded in UTF-16BE is 'OggS\x00A'. However, 佧杓 is nonsensical in at least Chinese; neither character is among the 3000 most common characters [1]. Search results on Google (4) and Baidu (3) are nonsense too. I don't know if things are any different for Japanese, but given the Google results I doubt it. '杏卧' encoded in UTF-16LE is 'OggS', and both of these characters are in the top 3000, but together they're nonsense: apricot crouch. (That's the same crouch as in Crouching Tiger, Hidden Dragon, but the order is wrong so it doesn't mean Crouching Apricot).
In the Google and Baidu results, the only occurrence of the string seems to be in 一衫红杏卧江亭, which appears to be a theme of an apricot tree by a pavilion that appears in several paintings [2] [3] [4]. All in all, I wouldn't be more worried about this than the risk of random binary data matching. Also, UTF-16 isn't a very common encoding for simplified Chinese (卧 is a simplified character); GBK is dominant. We could also add checking of the 6th byte, which should normally be 0x02 for the first page of a logical bitstream (bos). It looks like you could check for 0x1a 0x45 0xdf 0xa3 as the first four bytes U+1A45 is Thai, looks like. DFA3 is a surrogate, so you're ok there. U+451A is CJK. U+A3DF looks like a Yi syllable, so you're more or less ok there too. I'm assuming you've already checked this byte sequence out in UTF-8 and some other common encodings? It's garbage in at least UTF-8, Big5 and GBK. I'm not sure what infrastructure is in place, but perhaps one could *not* sniff if Content-Type also indicates an encoding? That way there's a solution for those who really want to display the hypothetical false positives as text. [1] http://www.zein.se/patrick/3000char.html [2] http://hi.baidu.com/%BC%C5%D5%AB/blog/item/f0ee8a4c5a5d0c02b3de05aa.html [3] http://blog.sina.com.cn/s/blog_475be8240100ew5q.html [4] http://www.zgddhj.cn/zj/bh/zhouhongyi/201007/32053.html -- Philip Jägenstedt Core Developer Opera Software
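The byte-level collisions discussed in this message can be reproduced with Python's built-in codecs. This is a small sketch for checking the arithmetic, not part of any browser's sniffing code; the helper `decode_or_none` and the variable names are hypothetical.

```python
def decode_or_none(raw: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# U+4F67 U+6753 followed by an ASCII letter, big-endian without a BOM.
ogg_be = (chr(0x4F67) + chr(0x6753) + "A").encode("utf-16-be")

# U+674F U+5367, little-endian without a BOM.
ogg_le = (chr(0x674F) + chr(0x5367)).encode("utf-16-le")

# The four bytes proposed as a WebM signature.
webm_magic = bytes([0x1A, 0x45, 0xDF, 0xA3])
webm_as_be = decode_or_none(webm_magic, "utf-16-be")  # 0xDFA3 is an unpaired surrogate
webm_as_le = decode_or_none(webm_magic, "utf-16-le")  # U+451A followed by U+A3DF
```

Here `ogg_be` is the six bytes `OggS\x00A` and `ogg_le` is the four bytes `OggS`, matching the claims above. The WebM bytes fail to decode as UTF-16BE because of the lone surrogate, but decode cleanly as UTF-16LE.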
Re: [whatwg] The choice of script global object to use when the script element is moved
NOTE! This email contains URLs to pages that crash WebKit on reload, so you probably shouldn't follow the URLs here in any WebKit-based browser where you have something important going on in the same renderer process. (In Chrome, only the isolated content process crashes.) On Fri, Sep 3, 2010 at 3:49 AM, Henri Sivonen hsivo...@iki.fi wrote: When evaluating a parser-inserted script, there are three potential script global objects to use: 1) The script global object of the document whose active parser the parser that inserted the script is. 2) The script global object of the document that owned the script element at the time of invoking the run algorithm. 3) The script global object of the document that owns the script element at the time of script evaluation. The spec says the answer is #3. WebKit (with HTML5 parser or without) says the answer is #1. Firefox 3.6 says the answer is #2. On Sep 3, 2010, at 20:47, Adam Barth wrote: I'm not sure it makes much of a difference from a security point of view. I suspect WebKit does #3 because it grabs the security context immediately before executing the script. With my demos, WebKit seems to be doing #1: http://hsivonen.iki.fi/test/moz/move-during-parse-parent.html http://hsivonen.iki.fi/test/moz/move-during-parse-parent2.html The second one doesn't finish loading in Gecko (both with new and old parser), because Gecko tries to unblock the parser on the wrong document and never unblock the parser that needs to be unblocked. That actually seems marginally safer because it means you're unlikely to grab an out-dated security context. Since the check If scripting is disabled for the script element, or if the user agent does not support the scripting language given by the script block's type for this script element, then the user agent must abort these steps at this point. The script is not executed. 
happens at the time of the run algorithm and since iframe sandboxing or Content Security Policies can cause scripting to be disabled, a security check has to happen at the time of invoking the run algorithm (assuming we don't want to change the pre-existing behavior of what happens in the common same-document case where a script gets rejected and we don't want to decouple the time on supported language check from the time of security-based rejections; this would be detectable in the document.write() case). For external scripts, this means that if we want to evaluate against a script global object associated with the owner doc of the script node at evaluating time, the security checks may have been performed in the context of another document and script global object. If we want security checks against the script global object associated with the owner doc at evaluation time, I think it's necessary to do the security checks twice: one during the run algorithm (in which case failing the checks doesn't fire any error events) and another time right before evaluation (in which case I suppose a failure should act the same way as a network failure and fire the error event). That's more complex than what's in Gecko now. (Not insurmountably complex, but more complex anyway.) I'm worried about doing the security checks at run algorithm time and evaluating with a different script global object without redoing the security check. However, it may be that I only worry because I feel I don't know enough of all the possibilities to be confident that such a separation of time of check and time of use would be safe here. Is there any good reason (other than differing from current IE9 PP behavior) not to do #1 with the additional stipulation that making the document whose active parser the parser is go away makes the scripts that are pending to run in the context of its script global object behave (stop?) the same regardless of which document they are in? (I.e. 
if the document that had the active parser gets torn down before the scripts inserted into another doc have loaded, those scripts wouldn't be evaluated.) I still believe doing #1 in Gecko would be the simplest thing. With the test cases above, WebKit seems to be doing #1 already (and then crashing) and Opera fails to move the scripts so the execution context ends up being the same as it would in case #1. On Sep 3, 2010, at 20:55, Jonas Sicking wrote: On Fri, Sep 3, 2010 at 10:47 AM, Adam Barth w...@adambarth.com wrote: I'm not sure it makes much of a difference from a security point of view. Agreed. Pages can only move elements between pages that are in the same security context anyway so I can't really think of any attacks that any of the approaches would enable or disable. Suppose there are two docs from one Origin. The document that the parser is associated with doesn't have a CSP. A script in it moves a node in such a way that the parser ends up inserting subsequent scripts into another document. That document has a CSP that bans scripts.
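The three candidate script global objects, and the idea of repeating the security check just before evaluation, can be modelled with a toy Python sketch. Everything here is hypothetical (`Document`, `PendingScript`, `global_for_evaluation`); it models only the decision being discussed, not the behaviour of Gecko, WebKit, or the spec text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    name: str
    scripting_enabled: bool = True  # False stands in for a CSP or sandbox that bans scripts


@dataclass
class PendingScript:
    parser_doc: Document    # option 1: document whose active parser inserted the script
    run_time_doc: Document  # option 2: owner document when the run algorithm was invoked
    owner_doc: Document     # option 3: current owner document; changes if the node is moved


def passes_run_check(script: PendingScript) -> bool:
    """First check, made at run-algorithm time; failing it fires no error event."""
    return script.run_time_doc.scripting_enabled


def global_for_evaluation(script: PendingScript, option: int) -> Optional[Document]:
    """Return the document whose script global object is used, or None to skip evaluation."""
    if not passes_run_check(script):
        return None
    if option == 1:
        return script.parser_doc
    if option == 2:
        return script.run_time_doc
    # Option 3: check again against the current owner right before evaluation,
    # because the earlier check ran in the context of a different document.
    return script.owner_doc if script.owner_doc.scripting_enabled else None


doc_a = Document("A")                           # no CSP
doc_b = Document("B", scripting_enabled=False)  # CSP bans scripts
script = PendingScript(parser_doc=doc_a, run_time_doc=doc_a, owner_doc=doc_a)
script.owner_doc = doc_b                        # node moved after the run algorithm ran
```

In this model, option 1 evaluates the moved script against document A, while option 3 with the second check declines to evaluate it at all because document B bans scripts.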
Re: [whatwg] Video with MIME type application/octet-stream
On 09/07/2010 03:56 AM, Boris Zbarsky wrote: P.S. Sniffing is harder than you seem to think. It really is...
Quite. It surprises and saddens me that anyone wants to argue for *more* sniffing, and even enshrining it in a web standard. Sniffing is a perpetual disaster that, after several security-sensitive problems, web browsers have been moving to deprecate/mitigate. If browsers want to guess types when no Content-Type is specified(*) then fine, but there is no good reason to ignore an explicitly-set type. I don't want my `application/octet-stream` file download service to be repurposeable as a video player for some other party!
For reasons already argued about here, you will never make the results of content-sniffing reliable, so why bother to standardise it? A standardised unreliable feature is no better than an unstandardised one. The typing mechanism of the web (and more) is Content-Type, period. There should be no confusion of this with officially-endorsed sniffing.
That it is 'hard' for web authors to ensure the correct Content-Types are set is:
* not W3/WHATWG's problem. If web servers make adding Content-Type information hard, then web servers need to be updated to make it easier;
* not really true, at least for Apache which can allow AddType et al in the .htaccess files that low-end shared hosts use. This may not be widely-known or practised, but that doesn't really merit changing the standards for everyone else to cope with.
(*: or, the traditional reason for sniffing, `text/plain`, due to Apache inappropriately sending this type for unknown files by default, bug 13986. That doesn't seem to apply here.)
-- And Clover mailto:a...@doxdesk.com http://www.doxdesk.com/
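For readers unfamiliar with the AddType approach mentioned in this message, a minimal .htaccess sketch might look like the following. The file extensions chosen are an assumption for illustration, and the directives only take effect if the host permits them in .htaccess (in Apache terms, AllowOverride FileInfo).

```apache
# Hypothetical .htaccess fragment: label Ogg and WebM media explicitly
# instead of relying on the server's default type for unknown files.
AddType video/ogg .ogv
AddType audio/ogg .oga .ogg
AddType video/webm .webm
```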
Re: [whatwg] Video with MIME type application/octet-stream
On 07.09.2010 11:51, And Clover wrote: On 09/07/2010 03:56 AM, Boris Zbarsky wrote: P.S. Sniffing is harder than you seem to think. It really is... Quite. It surprises and saddens me that anyone wants to argue for *more* sniffing, and even enshrining it in a web standard. +1 Sniffing is a perpetual disaster that, after several security-sensitive problems, web browsers have been moving to deprecate/mitigate. If browsers want to guess types when no Content-Type is specified(*) then fine, but there is no good reason to ignore an explicitly-set type. I don't want my `application/octet-stream` file download service to be repurposeable as a video player for some other party! Hmm, that's what Content-Disposition: attachment is for... ... Best regards, Julian
Re: [whatwg] Video with MIME type application/octet-stream
On Tue, 07 Sep 2010 11:51:55 +0200, And Clover and...@doxdesk.com wrote: On 09/07/2010 03:56 AM, Boris Zbarsky wrote: P.S. Sniffing is harder that you seem to think. It really is... Quite. It surprises and saddens me that anyone wants to argue for *more* sniffing, and even enshrining it in a web standard. IE9, Safari and Chrome ignore Content-Type in a video context and rely on sniffing. If you want Content-Type to be respected, convince the developers of those 3 browsers to change. If not, it's quite inevitable that Opera and Firefox will eventually have to follow. Sniffing is a perpetual disaster that, after several security-sensitive problems, web browsers have been moving to deprecate/mitigate. For reasons already argued about here, you will never make the results of content-sniffing reliable, so why bother to standardise it? A standardised unreliable feature is no better than an unstandardised one. Unless all browsers agree to respect Content-Type, the next best thing is to agree on the same sniffing. Why would leaving it undefined be better? The typing mechanism of the web (and more) is Content-Type, period. Only in theory. In practice, Content-Type is an unreliable indicator of the type of a resource. Sniffing is already part of the web architecture, with all its problems. (*: or, the traditional reason for sniffing, `text/plain`, due to Apache inappropriately sending this type for unknown files by default, bug 13986. That doesn't seem to apply here.) It hasn't been explicitly stated, but I assume that the only cases where sniffing for video formats would be employed would be for missing Content-Type, text/plain and application/octet-stream. -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Video with MIME type application/octet-stream
On 07.09.2010 12:52, Philip Jägenstedt wrote: ... IE9, Safari and Chrome ignore Content-Type in a video context and rely on sniffing. If you want Content-Type to be respected, convince the developers of those 3 browsers to change. If not, it's quite inevitable that Opera and Firefox will eventually have to follow. ... We have heard that Safari sniffs for compatibility with content previously consumed by Quicktime, and that IE9 may sniff because they (currently) can't pass the content-type to the decoding machinery (or something like that). So you really would have to standardize sniffing in the browsers, but also in the components they delegate video display to. Good luck with that. Best regards, Julian
Re: [whatwg] Video with MIME type application/octet-stream
On 9/7/10 6:52 AM, Philip Jägenstedt wrote: It hasn't been explicitly stated, but I assume that the only cases where sniffing for video formats would be employed would be for missing Content-Type, text/plain and application/octet-stream. That's not what at least Aryeh is proposing, no. Also not what at least some of the browsers implement. -Boris
Re: [whatwg] Video with MIME type application/octet-stream
On 9/7/10 6:01 AM, Julian Reschke wrote: Hmm, that's what Content-Disposition: attachment is for... This header is currently ignored in non-toplevel browsing contexts in web browsers, last I checked. -Boris
Re: [whatwg] Video with MIME type application/octet-stream
On 9/7/10 4:11 AM, Philip Jägenstedt wrote: It's garbage in at least UTF-8, Big5 and GBK. Thanks. I assume that applies to the OggS\0 sequence too, right? I appreciate the data! I'm not sure what infrastructure is in place, but perhaps one could *not* sniff if Content-Type also indicates an encoding? As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1 (thanks, Apache!), that should be reasonable, I think. -Boris
Re: [whatwg] Video with MIME type application/octet-stream
On Tue, 07 Sep 2010 14:54:15 +0200, Boris Zbarsky bzbar...@mit.edu wrote: On 9/7/10 6:52 AM, Philip Jägenstedt wrote: It hasn't been explicitly stated, but I assume that the only cases where sniffing for video formats would be employed would be for missing Content-Type, text/plain and application/octet-stream. That's not what at least Aryeh is proposing, no. Also not what at least some of the browsers implement. Oops, I was talking about top-level contexts here. In a video context, always ignoring the Content-Type and always sniffing is the most sane solution (apart from always respecting Content-Type). -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Video with MIME type application/octet-stream
On 9/7/10 9:03 AM, Philip Jägenstedt wrote: On Tue, 07 Sep 2010 14:54:15 +0200, Boris Zbarsky bzbar...@mit.edu wrote: On 9/7/10 6:52 AM, Philip Jägenstedt wrote: It hasn't been explicitly stated, but I assume that the only cases where sniffing for video formats would be employed would be for missing Content-Type, text/plain and application/octet-stream. That's not what at least Aryeh is proposing, no. Also not what at least some of the browsers implement. Oops, I was talking about top-level contexts here. In a video context, always ignoring the Content-Type and always sniffing is the most sane solution (apart from always respecting Content-Type). Yes, the suggestion Aryeh is making is that toplevel contexts should use the same sniffing algorithm as the video context and should sniff everything for video, completely ignoring the Content-Type header. -Boris
Re: [whatwg] Video with MIME type application/octet-stream
On Tue, 07 Sep 2010 14:56:38 +0200, Boris Zbarsky bzbar...@mit.edu wrote: On 9/7/10 4:11 AM, Philip Jägenstedt wrote: It's garbage in at least UTF-8, Big5 and GBK. Thanks. I assume that applies to the OggS\0 sequence too, right? I appreciate the data! UTF-8, Big5 and GBK are all (as far as I know) ASCII supersets. Do real-world text documents include \0 bytes? (I don't know.) I'm not sure what infrastructure is in place, but perhaps one could *not* sniff if Content-Type also indicates an encoding? As long as indicates an encoding doesn't include UTF-8 or ISO-8859-1 (thanks, Apache!), that should be reasonable, I think. Are you saying that Apache has, at various times, set the default character encoding to UTF-8 or ISO-8859-1? I was hoping that no encoding parameter at all would be sent :/ -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Video with MIME type application/octet-stream
On 9/7/10 9:16 AM, Philip Jägenstedt wrote: UTF-8, Big5 and GBK are all (as far as I know) ASCII supersets. Do real-world text documents include \0 bytes? Yes. Real-world text documents include all sorts of gunk. Just rarely. As long as indicates an encoding doesn't include UTF-8 or ISO-8859-1 (thanks, Apache!), that should be reasonable, I think. Are you saying that Apache has, at various times, set the default character encoding to UTF-8 or ISO-8859-1? Yes, precisely. Though the UTF-8 stuff was Linux distros, I think, not Apache itself (in that Apache just sent the thing passed to AddDefaultCharset and they changed the value of that from ISO-8859-1 to UTF-8 in their distro packages). Here's the relevant comment from the Gecko source where we do our text-or-binary sniffing for toplevel contexts: Make sure to do a case-sensitive exact match comparison here. Apache 1.x just sends text/plain for unknown, while Apache 2.x sends text/plain with a ISO-8859-1 charset. Debian's Apache version, just to be different, sends text/plain with iso-8859-1 charset. For extra fun, FC7, RHEL4, and Ubuntu Feisty send charset=UTF-8. Don't do general case-insensitive comparison, since we really want to apply this crap as rarely as we can. I was hoping that no encoding parameter at all would be sent :/ Heh. I've long since given up all hope of reason on this stuff; I just try to keep it as sane and predictable and simple as possible. :( -Boris
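The case-sensitive exact match described in the quoted Gecko comment could look roughly like this in Python. It is a sketch, not Gecko's code: the name `may_sniff_text_or_binary` is hypothetical, and the exact header spellings in the set are an assumption reconstructed from the comment's description of what each server sends.

```python
# Assumed header values, one per server behaviour named in the comment:
# Apache 1.x, Apache 2.x, Debian's Apache, and FC7/RHEL4/Ubuntu Feisty.
SNIFFABLE_TEXT_PLAIN = frozenset({
    "text/plain",
    "text/plain; charset=ISO-8859-1",
    "text/plain; charset=iso-8859-1",
    "text/plain; charset=UTF-8",
})


def may_sniff_text_or_binary(content_type):
    """Allow text-or-binary sniffing only for byte-for-byte known default headers.

    The match is deliberately case-sensitive and exact, so that any header
    an author is likely to have set on purpose is left alone.
    """
    return content_type in SNIFFABLE_TEXT_PLAIN
```

Under these assumed spellings, `text/plain; charset=UTF-8` is eligible for sniffing while `text/plain; charset=utf-8` is not, which keeps the workaround as narrow as the comment asks.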
Re: [whatwg] Video with MIME type application/octet-stream
On Sep 7, 2010, at 3:52 AM, Philip Jägenstedt wrote: On Tue, 07 Sep 2010 11:51:55 +0200, And Clover and...@doxdesk.com wrote: On 09/07/2010 03:56 AM, Boris Zbarsky wrote: P.S. Sniffing is harder that you seem to think. It really is... Quite. It surprises and saddens me that anyone wants to argue for *more* sniffing, and even enshrining it in a web standard. IE9, Safari and Chrome ignore Content-Type in a video context and rely on sniffing. If you want Content-Type to be respected, convince the developers of those 3 browsers to change. If not, it's quite inevitable that Opera and Firefox will eventually have to follow. At least in the case of Safari, we initially added sniffing for the benefit of video types likely to be played with the QuickTime plugin - mainly .mov and various flavors of MPEG. It is common for these to be served with an incorrect MIME type. And we did not want to impose a high transition cost on content already being served via the QuickTime plugin. The QuickTime plugin may be a slightly less relevant consideration now than when we first thought about this, but at this point it is possible content has been migrated to video while still carrying broken MIME types. Ogg and WebM are probably not yet poisoned by a mass of unlabeled data. It might be possible to treat those types more strictly - i.e. only play Ogg or WebM when labeled as such, and not ever sniff content with those MIME types as anything else. In Safari's case this would have limited impact since a non-default codec plugin would need to be installed to play either Ogg or WebM. I'm also not sure it's sensible to have varying levels of strictness for different types. But it's an option, if we want to go there. Regards, Maciej
Re: [whatwg] Video with MIME type application/octet-stream
On Sep 7, 2010, at 2:51 , And Clover wrote: On 09/07/2010 03:56 AM, Boris Zbarsky wrote: P.S. Sniffing is harder that you seem to think. It really is... Quite. It surprises and saddens me that anyone wants to argue for *more* sniffing, and even enshrining it in a web standard. Yes. We should be striving for a world in which as little sniffing as possible happens (and is needed). Basically, we have the problem because of mis-configured or (from the author's point of view) unconfigurable web servers. So I wonder if * the presence of a source element with a type attribute should be believed (at least for the purposes of dispatch and 'canplay' decisions)? If the author of the page got it wrong or lied, surely they can accept (and deal with) the consequences? * whether we should only really sniff the two types in HTTP headers that tend to get used as fallbacks (application/octet-stream and text/plain)? Though I note that I have sometimes *wanted* a file displayed as text (and not interpreted) and been defeated by sniffing (though not as often as watching binary dumped on my screen as if it were text). David Singer Multimedia and Software Standards, Apple Inc.
Re: [whatwg] Video with MIME type application/octet-stream
And like I said before, please be careful of assuming our intent and desires from the way things currently work. We are thinking, listening, and implementing (and fixing bugs, and re-inspecting older behavior in lower-level code), so there is some...flexibility...I think. On Sep 7, 2010, at 9:12 , Maciej Stachowiak wrote: On Sep 7, 2010, at 3:52 AM, Philip Jägenstedt wrote: On Tue, 07 Sep 2010 11:51:55 +0200, And Clover and...@doxdesk.com wrote: On 09/07/2010 03:56 AM, Boris Zbarsky wrote: P.S. Sniffing is harder that you seem to think. It really is... Quite. It surprises and saddens me that anyone wants to argue for *more* sniffing, and even enshrining it in a web standard. IE9, Safari and Chrome ignore Content-Type in a video context and rely on sniffing. If you want Content-Type to be respected, convince the developers of those 3 browsers to change. If not, it's quite inevitable that Opera and Firefox will eventually have to follow. At least in the case of Safari, we initially added sniffing for the benefit of video types likely to be played with the QuickTime plugin - mainly .mov and various flavors of MPEG. It is common for these to be served with an incorrect MIME type. And we did not want to impose a high transition cost on content already being served via the QuickTime plugin. The QuickTime plugin may be a slightly less relevant consideration now than when we first thought about this, but at this point it is possible content has been migrated to video while still carrying broken MIME types. Ogg and WebM are probably not yet poisoned by a mass of unlabeled data. It might be possible to treat those types more strictly - i.e. only play Ogg or WebM when labeled as such, and not ever sniff content with those MIME types as anything else. In Safari's case this would have limited impact since a non-default codec plugin would need to be installed to play either Ogg or WebM. I'm also not sure it's sensible to have varying levels of strictness for different types. 
But it's an option, if we want to go there. Regards, Maciej David Singer Multimedia and Software Standards, Apple Inc.
Re: [whatwg] Video with MIME type application/octet-stream
On Tue, Sep 7, 2010 at 3:01 AM, Julian Reschke julian.resc...@gmx.de wrote: On 07.09.2010 11:51, And Clover wrote: On 09/07/2010 03:56 AM, Boris Zbarsky wrote: P.S. Sniffing is harder than you seem to think. It really is... Quite. It surprises and saddens me that anyone wants to argue for *more* sniffing, and even enshrining it in a web standard. +1 -1 It saddens me when standards bodies ignore reality and leave implementors to invent their own non-interoperable algorithms for security-critical behavior. Adam
Re: [whatwg] Video with MIME type application/octet-stream
On 9/7/10 3:19 PM, Adam Barth wrote: It saddens me when standards bodies ignore reality and leave implementors to invent their own non-interoperable algorithms for security-critical behavior. Of course nothing prevents us from saying "UAs MUST NOT sniff, but if they do anyway they MUST use a given algorithm", right? -Boris
Re: [whatwg] Video with MIME type application/octet-stream
On Tue, Sep 7, 2010 at 5:51 AM, And Clover and...@doxdesk.com wrote: Quite. It surprises and saddens me that anyone wants to argue for *more* sniffing, and even enshrining it in a web standard. I'm not a fan of sniffing, but I'm also not a fan of blindly believing clearly wrong MIME types and thereby forcing authors to do needless configuration work, which they might not even be able to do. I'm not yet sure what the correct tradeoff is here, but I'm pretty sure it's not no sniffing at all under any conditions. Sniffing is a perpetual disaster that, after several security-sensitive problems, web browsers have been moving to deprecate/mitigate. If browsers want to guess types when no Content-Type is specified(*) then fine, but there is no good reason to ignore an explicitly-set type. I don't want my `application/octet-stream` file download service to be repurposeable as a video player for some other party! If you don't want that, you should be using access control, not MIME types. For reasons already argued about here, you will never make the results of content-sniffing reliable, so why bother to standardise it? A standardised unreliable feature is no better than an unstandardised one. Sure it is, because it's unreliable in the same way across all browsers. That means that in any given case, all browsers will work the same. This is particularly essential for security -- undocumented sniffing behavior has caused more than one vulnerability in the past. The typing mechanism of the web (and more) is Content-Type, period. There should be no confusion of this with officially-endorsed sniffing. We already have officially endorsed sniffing where web compat requires it: http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#content-type-sniffing http://tools.ietf.org/html/draft-abarth-mime-sniff-05 The question is if we can avoid it for new content types like video/audio. 
If not, we should spec it in advance so we at least have something that's as sane as possible under the circumstances. That it is 'hard' for web authors to ensure the correct Content-Types are set is: * not W3/WHATWG's problem. If web servers make adding Content-Type information hard, then web servers need to be updated to make it easier; I don't know about the W3C, but reality is the WHATWG's problem. We can't let things be broken and just say it's someone else's fault. We need to institute workarounds at our level for failures on other levels if that's what's necessary to get good security and a good user/author experience. * not really true, at least for Apache which can allow AddType et al in the .htaccess files that low-end shared hosts use. This may not be widely-known or practised, but that doesn't really merit changing the standards for everyone else to cope with. Creating a .htaccess file is a technical procedure that most users will not know how to do, particularly since the problem will probably just manifest itself as the video doesn't work. It's also not possible on some hosts -- although it's certainly possible on the large majority of cheap shared hosts, and of course on hosts where the author has root access. On Tue, Sep 7, 2010 at 6:52 AM, Philip Jägenstedt phil...@opera.com wrote: It hasn't been explicitly stated, but I assume that the only cases where sniffing for video formats would be employed would be for missing Content-Type, text/plain and application/octet-stream. If those are the only common MIME types incorrectly served for unknown file types, that seems reasonable. (Some files might be actively misidentified, like if I have an Ogg file saved as .jpeg, but hopefully this will be very rare.) On Tue, Sep 7, 2010 at 8:56 AM, Boris Zbarsky bzbar...@mit.edu wrote: On 9/7/10 4:11 AM, Philip Jägenstedt wrote: It's garbage in at least UTF-8, Big5 and GBK. Thanks. I assume that applies to the OggS\0 sequence too, right? I appreciate the data! 
I'm not sure what infrastructure is in place, but perhaps one could *not* sniff if Content-Type also indicates an encoding? As long as indicates an encoding doesn't include UTF-8 or ISO-8859-1 (thanks, Apache!), that should be reasonable, I think. So at least for Ogg and WebM, how about: * Sniff only if Content-Type is typical of what popular browsers serve for unrecognized filetypes. E.g., only for no Content-Type, text/plain, or application/octet-stream, and only if the encoding is either not present or is UTF-8 or ISO-8859-1. Or whatever web servers do here. * Sniff the same both for video tags and top-level browsing contexts, so open video in new tab doesn't mysteriously fail on some setups. * If a file in a top-level browsing context is sniffed as video but then some kind of error is returned before the video plays the first frame, fall back to allowing the user to download it, or whatever the usual action would be if no sniffing had occurred. Within these constraints, false positives in the sniffing
Re: [whatwg] Video with MIME type application/octet-stream
On 9/7/10 3:29 PM, Aryeh Gregor wrote:
* Sniff only if Content-Type is typical of what popular browsers serve for unrecognized filetypes. E.g., only for no Content-Type, text/plain, or application/octet-stream, and only if the encoding is either not present or is UTF-8 or ISO-8859-1. Or whatever web servers do here.
* Sniff the same both for video tags and top-level browsing contexts, so "open video in new tab" doesn't mysteriously fail on some setups.

I could probably live with those, actually.

* If a file in a top-level browsing context is sniffed as video but then some kind of error is returned before the video plays the first frame, fall back to allowing the user to download it, or whatever the usual action would be if no sniffing had occurred.

This might be pretty difficult to implement, since the video decoder might consume arbitrary amounts of data before saying that there was an error.

-Boris
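To make the first proposed rule concrete, here is a minimal sketch of the gating described above: sniff only when Content-Type is missing, text/plain, or application/octet-stream, and only when any charset is absent, UTF-8, or ISO-8859-1. The function names (maySniff, looksLikeOgg, sniffVideoType) and the returned label "ogg" are made up for illustration, and the header parsing is deliberately naive. It checks only the five-byte OggS\0 signature at offset 0 that was discussed earlier in the thread; it is not any browser's actual algorithm, and WebM detection is left out.

```javascript
// Sketch only: all names here are hypothetical, not taken from any spec or browser.
const SNIFFABLE_TYPES = new Set(["text/plain", "application/octet-stream"]);
const SNIFFABLE_CHARSETS = new Set(["utf-8", "iso-8859-1"]);

// Decide whether the Content-Type header value permits sniffing at all.
function maySniff(contentType) {
  if (contentType === null || contentType === undefined || contentType.trim() === "") {
    return true; // no Content-Type was sent
  }
  const [type, ...params] = contentType
    .split(";")
    .map((part) => part.trim().toLowerCase());
  if (!SNIFFABLE_TYPES.has(type)) return false;
  for (const param of params) {
    const [name, value = ""] = param.split("=").map((s) => s.trim());
    if (name === "charset") {
      const charset = value.replace(/^"|"$/g, "");
      if (!SNIFFABLE_CHARSETS.has(charset)) return false;
    }
  }
  return true;
}

// True if the data starts with "OggS" followed by the version byte 0.
function looksLikeOgg(bytes) {
  const signature = [0x4f, 0x67, 0x67, 0x53, 0x00];
  if (bytes.length < signature.length) return false;
  return signature.every((byte, i) => bytes[i] === byte);
}

// Returns the label "ogg" when sniffing is allowed and the signature matches, else null.
function sniffVideoType(contentType, bytes) {
  if (!maySniff(contentType)) return null;
  return looksLikeOgg(bytes) ? "ogg" : null;
}
```

Because the signature is only checked at offset 0, a file that begins with a run of NULL bytes is not recognized, which is the stricter behavior argued for earlier in the thread.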
Re: [whatwg] The choice of script global object to use when the script element is moved
On Tue, Sep 7, 2010 at 1:40 AM, Henri Sivonen hsivo...@iki.fi wrote:
On Sep 3, 2010, at 20:55, Jonas Sicking wrote:
On Fri, Sep 3, 2010 at 10:47 AM, Adam Barth w...@adambarth.com wrote:
I'm not sure it makes much of a difference from a security point of view.

Agreed. Pages can only move elements between pages that are in the same security context anyway so I can't really think of any attacks that any of the approaches would enable or disable.

Suppose there are two docs from one Origin. The document that the parser is associated with doesn't have a CSP. A script in it moves a node in such a way that the parser ends up inserting subsequent scripts into another document. That document has a CSP that bans scripts. Would you consider it a bug if a script ran in the context of the script global object of the document whose CSP says no scripts?

It sounds like CSP is creating sub-origin privileges. Sub-origin privileges don't really work, so it's unclear what a sensible result would be.

Adam
Re: [whatwg] Video with MIME type application/octet-stream
On Tue, Sep 7, 2010 at 12:21 PM, Boris Zbarsky bzbar...@mit.edu wrote:
On 9/7/10 3:19 PM, Adam Barth wrote:
It saddens me when standards bodies ignore reality and leave implementors to invent their own non-interoperable algorithms for security-critical behavior.

Of course nothing prevents us from saying UAs MUST NOT sniff but if they do anyway they MUST use a given algorithm, right?

That's a contrary-to-duty imperative, which is something that's been puzzling philosophers for centuries. A more sensible requirement would be that user agents SHOULD NOT sniff (for reasons XYZ), but, if they do, they MUST use the following algorithm.

Adam
Re: [whatwg] Video with MIME type application/octet-stream
Of course nothing prevents us from saying UAs MUST NOT sniff but if they do anyway they MUST use a given algorithm, right?

That's a contrary-to-duty imperative, which is something that's been puzzling philosophers for centuries. A more sensible requirement would be that user agents SHOULD NOT sniff (for reasons XYZ), but, if they do, they MUST use the following algorithm.

Except that in practice SHOULD NOT is treated as carte blanche to do the undesirable thing. It has no teeth. MUST NOT doesn't much either, but it's _something_ at least (in the sense that one can clearly claim that violating a MUST NOT is a bug).

-Boris
Re: [whatwg] Video with MIME type application/octet-stream
On Tue, Sep 7, 2010 at 2:13 PM, Boris Zbarsky bzbar...@mit.edu wrote:
Of course nothing prevents us from saying UAs MUST NOT sniff but if they do anyway they MUST use a given algorithm, right?

That's a contrary-to-duty imperative, which is something that's been puzzling philosophers for centuries. A more sensible requirement would be that user agents SHOULD NOT sniff (for reasons XYZ), but, if they do, they MUST use the following algorithm.

Except that in practice SHOULD NOT is treated as carte blanche to do the undesirable thing. It has no teeth. MUST NOT doesn't much either, but it's _something_ at least (in the sense that one can clearly claim that violating a MUST NOT is a bug).

In any case, lawyering the requirement level in the spec isn't the way to solve these problems. You need to change the underlying incentives to actually affect what gets implemented.

Adam
[whatwg] ArrayBuffer and ByteArray questions
Hi,

Several specs, like File API and WebGL, use ArrayBuffer, while other specs, like XMLHttpRequest Level 2, use ByteArray. Should we change to use the same name all across our specs? Since we define ArrayBuffer in the Typed Arrays spec (https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/doc/spec/TypedArray-spec.html), should we favor ArrayBuffer?

In addition, can we consider adding ArrayBuffer support to BlobBuilder, FormData, and XMLHttpRequest.send()?

Thanks,
Jian
Re: [whatwg] HTML6 Doctype
On 08/29/2010 08:00 AM, Tab Atkins Jr. wrote:
On Sat, Aug 28, 2010 at 8:15 PM, David John Burrowes bain...@davidjohnburrowes.com wrote:
I agree that they don't have access to versioning info from within the languages. But, CSS has some sense of versions (CSS, CSS2, and CSS3). This gives me some ability to say "ah, SurfBrowser 1.0 and 2.0 supported CSS1, but with 3.0 they supported some of CSS2" etc etc.

To be honest, no you can't. Not with such large labels, at least. You'll never be able to say "X browser supports CSS3", but CSS3 isn't a thing. You can name individual modules only, which is equivalent to naming large features of HTML.

How do you define a large feature of HTML?

~fantasai
Re: [whatwg] HTML6 Doctype
On Tue, Sep 7, 2010 at 4:45 PM, fantasai fantasai.li...@inkedblade.net wrote:
On 08/29/2010 08:00 AM, Tab Atkins Jr. wrote:
On Sat, Aug 28, 2010 at 8:15 PM, David John Burrowes bain...@davidjohnburrowes.com wrote:
I agree that they don't have access to versioning info from within the languages. But, CSS has some sense of versions (CSS, CSS2, and CSS3). This gives me some ability to say "ah, SurfBrowser 1.0 and 2.0 supported CSS1, but with 3.0 they supported some of CSS2" etc etc.

To be honest, no you can't. Not with such large labels, at least. You'll never be able to say "X browser supports CSS3", but CSS3 isn't a thing. You can name individual modules only, which is equivalent to naming large features of HTML.

How do you define a large feature of HTML?

Roughly, "has a subheading in the TOC". Depending on the exact organization, this might actually be a heading or subsubheading.

~TJ
Re: [whatwg] Video with MIME type application/octet-stream
On 9/7/10 5:35 PM, Adam Barth wrote: In any case, lawyering the requirement level in the spec isn't the way to solve these problems. You need to change the underlying incentives to actually affect what gets implemented. The incentive structure for pretty much any sort of sniffing is a prisoner's dilemma. Life's hard. -Boris
[whatwg] Descendents of source and track elements should be skipped when serializing HTML fragment (10.3)
Hi,

In the HTML fragment serialization algorithm, we skip elements with an empty content model in step 2.2:

If current node is an area, base, basefont, bgsound, br, col, embed, frame, hr, img, input, keygen, link, meta, param, or wbr element, then continue on to the next child node at this point.

For consistency, I propose to skip children of source and track elements as well.

Also, the algorithm does not seem to specify the behavior on deprecated (or undocumented) elements such as isindex. Can we assume that the serialization of such elements is UA-defined?

Best,
Ryosuke Niwa
Software Engineer
rn...@webkit.org
Re: [whatwg] Descendents of source and track elements should be skipped when serializing HTML fragment (10.3)
The HTML parser expands the isindex element into a bunch of other elements, so it never inserts that element into the tree. Of course, an isindex element could have been inserted via the DOM...

Adam

On Tue, Sep 7, 2010 at 4:44 PM, Ryosuke Niwa ryosuke.n...@gmail.com wrote:
Hi,

In the HTML fragment serialization algorithm, we skip elements with an empty content model in step 2.2:

If current node is an area, base, basefont, bgsound, br, col, embed, frame, hr, img, input, keygen, link, meta, param, or wbr element, then continue on to the next child node at this point.

For consistency, I propose to skip children of source and track elements as well.

Also, the algorithm does not seem to specify the behavior on deprecated (or undocumented) elements such as isindex. Can we assume that the serialization of such elements is UA-defined?

Best,
Ryosuke Niwa
Software Engineer
rn...@webkit.org
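The skipping step quoted above can be illustrated with a much-simplified serializer. This is a sketch under loud assumptions: nodes are plain objects of the form { name, children } (a made-up shape, not the DOM), text nodes are plain strings, attributes are ignored, and only &, < and > are escaped. The VOID_ELEMENTS list is the one quoted in step 2.2; PROPOSED_ADDITIONS holds the source and track elements proposed in this thread.

```javascript
// Element names whose children are skipped, as listed in the quoted step 2.2.
const VOID_ELEMENTS = new Set([
  "area", "base", "basefont", "bgsound", "br", "col", "embed", "frame",
  "hr", "img", "input", "keygen", "link", "meta", "param", "wbr",
]);

// Additions proposed in this thread; not part of the quoted step.
const PROPOSED_ADDITIONS = new Set(["source", "track"]);

// Simplified text escaping (assumption: the full rules escape more than this).
function escapeText(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serialize the children of a node; `skipSet` names elements whose
// descendants and end tag are omitted.
function serializeChildren(node, skipSet = VOID_ELEMENTS) {
  let out = "";
  for (const child of node.children) {
    if (typeof child === "string") {
      out += escapeText(child);
      continue;
    }
    out += "<" + child.name + ">";
    if (skipSet.has(child.name)) continue; // continue on to the next child node
    out += serializeChildren(child, skipSet) + "</" + child.name + ">";
  }
  return out;
}
```

Passing new Set([...VOID_ELEMENTS, ...PROPOSED_ADDITIONS]) as skipSet shows the effect of the proposal: a source element that had children inserted via the DOM would serialize as a bare start tag.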
Re: [whatwg] Timed tracks: feedback compendium
On Wed, Sep 8, 2010 at 11:19 AM, Ian Hickson i...@hixie.ch wrote:
On Thu, 26 Aug 2010, Chris Double wrote:
Firefox (in the case of video) uses file extensions to identify video files. We have an internal mapping of file extensions to MIME types. We don't sniff the content. I imagine we'd do the same with whatever file extension is used for WebSRT.

(I assume this is only for the filesystem, not data from the wire!)

Yes, this is only for the filesystem.

Chris.
-- http://www.bluishcoder.co.nz
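As a rough illustration of the extension-based approach described above, here is a sketch of a lookup for local files. The table contents and the function name mimeTypeForLocalFile are assumptions made for the example; this is not Firefox's actual mapping, and no WebSRT entry is included because the thread does not say which extension it would use.

```javascript
// Hypothetical extension table; the real internal mapping is not shown in the thread.
const EXTENSION_TO_MIME = new Map([
  ["ogv", "video/ogg"],
  ["oga", "audio/ogg"],
  ["webm", "video/webm"],
]);

// Look up a MIME type from a local file path by extension only.
// The file content is never inspected, so this is not sniffing.
function mimeTypeForLocalFile(path) {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  if (dot <= 0) return null; // no extension, or a dotfile such as ".ogv"
  const extension = name.slice(dot + 1).toLowerCase();
  return EXTENSION_TO_MIME.get(extension) || null;
}
```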
[whatwg] Canvas API: What should happen if non-finite floats are used
Consider this testcase:

  <!doctype html>
  <html>
  <body>
  <canvas id="c" width="200" height="200"></canvas>
  <script>
  try {
    var c = document.getElementById("c"),
        t = c.getContext("2d");
    t.moveTo(100, 100);
    t.lineTo(NaN, NaN);
    t.lineTo(50, 25);
    t.stroke();
  } catch (e) { alert(e); }
  </script>
  </body>
  </html>

Behavior in the spec seems to be undefined (in particular, no mention is made as to what the canvas API functions are supposed to do if non-finite values are passed in). Behavior in browsers is:

Presto: Throws NOT_SUPPORTED_ERR on that lineTo(NaN, NaN) call.
Gecko: Throws DOM_SYNTAX_ERR on that lineTo(NaN, NaN) call.
WebKit: Silently ignores the lineTo(NaN, NaN) call, and then draws a line from (100, 100) to (50, 25).

Seems like the spec needs to define this.

-Boris

P.S. This isn't a hypothetical issue; this came up in a page that was trying to graph things using canvas and ending up with divide-by-0 all over the place. It worked in WebKit (though not drawing the right thing, so much). It failed to draw anything in Presto or Gecko.
Re: [whatwg] Canvas API: What should happen if non-finite floats are used
In 4.8.11.1 the spec does state:

Except where otherwise specified, for the 2D context interface, any method call with a numeric argument whose value is infinite or a NaN value must be ignored.

-Sam

On Sep 7, 2010, at 9:41 PM, Boris Zbarsky wrote:
Consider this testcase:

  <!doctype html>
  <html>
  <body>
  <canvas id="c" width="200" height="200"></canvas>
  <script>
  try {
    var c = document.getElementById("c"),
        t = c.getContext("2d");
    t.moveTo(100, 100);
    t.lineTo(NaN, NaN);
    t.lineTo(50, 25);
    t.stroke();
  } catch (e) { alert(e); }
  </script>
  </body>
  </html>

Behavior in the spec seems to be undefined (in particular, no mention is made as to what the canvas API functions are supposed to do if non-finite values are passed in). Behavior in browsers is:

Presto: Throws NOT_SUPPORTED_ERR on that lineTo(NaN, NaN) call.
Gecko: Throws DOM_SYNTAX_ERR on that lineTo(NaN, NaN) call.
WebKit: Silently ignores the lineTo(NaN, NaN) call, and then draws a line from (100, 100) to (50, 25).

Seems like the spec needs to define this.

-Boris

P.S. This isn't a hypothetical issue; this came up in a page that was trying to graph things using canvas and ending up with divide-by-0 all over the place. It worked in WebKit (though not drawing the right thing, so much). It failed to draw anything in Presto or Gecko.
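The quoted rule ("must be ignored") can be modelled outside a browser with a small wrapper. This is only a sketch: ignoreNonFinite is a made-up helper name, and the target can be any object with methods, such as a 2D context or a stand-in that records calls. It drops an entire call when any numeric argument is NaN or infinite and forwards everything else unchanged.

```javascript
// Wrap `target` so that the named methods silently ignore calls that
// have a non-finite numeric argument. Hypothetical helper, not a real API.
function ignoreNonFinite(target, methodNames) {
  const wrapped = Object.create(target);
  for (const methodName of methodNames) {
    wrapped[methodName] = (...args) => {
      const hasNonFinite = args.some(
        (arg) => typeof arg === "number" && !Number.isFinite(arg)
      );
      if (hasNonFinite) return undefined; // the whole call is ignored
      return target[methodName](...args);
    };
  }
  return wrapped;
}
```

Applied to the testcase, the sequence moveTo(100, 100), lineTo(NaN, NaN), lineTo(50, 25) reaches the underlying object as just the first and third calls, which corresponds to the WebKit behavior reported in the original message.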
Re: [whatwg] Canvas API: What should happen if non-finite floats are used
This seems like a strange choice of behavior. Given that this is very likely a bug in the program, wouldn't it make more sense to throw an exception so as to make it easier to debug? Similar to, for example, Node.appendChild when called with a null argument.

/ Jonas

On Tue, Sep 7, 2010 at 10:32 PM, Sam Weinig wei...@apple.com wrote:
In 4.8.11.1 the spec does state:

Except where otherwise specified, for the 2D context interface, any method call with a numeric argument whose value is infinite or a NaN value must be ignored.

-Sam

On Sep 7, 2010, at 9:41 PM, Boris Zbarsky wrote:
Consider this testcase:

  <!doctype html>
  <html>
  <body>
  <canvas id="c" width="200" height="200"></canvas>
  <script>
  try {
    var c = document.getElementById("c"),
        t = c.getContext("2d");
    t.moveTo(100, 100);
    t.lineTo(NaN, NaN);
    t.lineTo(50, 25);
    t.stroke();
  } catch (e) { alert(e); }
  </script>
  </body>
  </html>

Behavior in the spec seems to be undefined (in particular, no mention is made as to what the canvas API functions are supposed to do if non-finite values are passed in). Behavior in browsers is:

Presto: Throws NOT_SUPPORTED_ERR on that lineTo(NaN, NaN) call.
Gecko: Throws DOM_SYNTAX_ERR on that lineTo(NaN, NaN) call.
WebKit: Silently ignores the lineTo(NaN, NaN) call, and then draws a line from (100, 100) to (50, 25).

Seems like the spec needs to define this.

-Boris

P.S. This isn't a hypothetical issue; this came up in a page that was trying to graph things using canvas and ending up with divide-by-0 all over the place. It worked in WebKit (though not drawing the right thing, so much). It failed to draw anything in Presto or Gecko.
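Whatever the spec ends up requiring, a page author can get the fail-fast behavior suggested here with a guard in page code. The sketch below uses made-up helper names (assertFinite, strictLineTo) and picks RangeError arbitrarily; it is not one of the DOM exception types the browsers above throw. It works with any object that has a lineTo method.

```javascript
// Throw if any numeric argument is NaN or infinite, naming the offending
// method and argument position. Hypothetical helper for page code.
function assertFinite(methodName, args) {
  args.forEach((value, index) => {
    if (typeof value === "number" && !Number.isFinite(value)) {
      throw new RangeError(
        methodName + ": argument " + index + " is not finite (" + value + ")"
      );
    }
  });
}

// Drop-in replacement for ctx.lineTo(x, y) that fails loudly on bad input.
function strictLineTo(ctx, x, y) {
  assertFinite("lineTo", [x, y]);
  ctx.lineTo(x, y);
}
```

In the graphing page from the P.S., a guard like this would report the first NaN coordinate produced by a 0/0 division at the call site instead of producing a drawing with missing segments.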