[whatwg] Update on fallback encoding findings
A while ago, Hixie pinged me on IRC to ask if there is any news about the character encoding stuff. While there is no news yet about guessing the fallback encoding from the TLD of the site, there is now some news about guessing the fallback encoding from the locale.

Data for Firefox 25: https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381393
Data for Firefox 26: https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381394
Data for Firefox 27: https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8420031
Data for Firefox 28: https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8420032

Specific findings:

1) Prior to Firefox 28, Traditional Chinese Firefox had a bug that caused the fallback to be UTF-8. Changing the fallback to Big5 in Firefox 28 reduced the usage of the Character Encoding menu. (Please note, however, that Firefox's notion of Big5 does not yet comply with the Encoding Standard's notion of Big5.)

2) Prior to Firefox 28, Thai Firefox had a bug that caused the fallback to be windows-1252. Changing the fallback to windows-874 in Firefox 28 reduced the usage of the Character Encoding menu.

There were also other locales that had their fallback corrected per spec in Firefox 28. However, for those locales, the changes were within the variation seen between releases previously.

I think the finding about Traditional Chinese supports the conclusion that we should not fall back to UTF-8 everywhere. I think the finding about Thai supports the conclusion that we should not fall back on windows-1252 everywhere. However, the results being in the noise for some locales that had their fallback changed suggest that the labeling practice isn't uniform around the world and that some locales rely on the fallback less than others.
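To illustrate what a wrong fallback does in the Thai case, here's a quick sketch (Python; the byte string is constructed for illustration, not taken from a real site):

```python
# The Thai greeting "สวัสดี" as windows-874 bytes, as it might appear
# on an unlabeled legacy Thai page.
raw = b"\xca\xc7\xd1\xca\xb4\xd5"

# With the correct locale fallback, windows-874 ("cp874" in Python),
# the bytes decode to Thai text.
assert raw.decode("cp874") == "สวัสดี"

# With the pre-Firefox-28 fallback, windows-1252, the same bytes turn
# into Latin mojibake.
assert raw.decode("cp1252") == "ÊÇÑÊ´Õ"
```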
Since locales using a non-Latin script are the leaders in Character Encoding menu use even when there's only one dominant legacy encoding within the locale, it seems that there is a continued tension between the locale-specific fallback and fallback to windows-1252. Guessing the fallback from the TLD is supposed to address this. I will report findings once the TLD guessing has been on the release channel for six weeks.

Also, the relatively high level of Character Encoding menu use for the Korean locale continues to puzzle me. Considering the mere structure of the legacy encoding and how the neighboring locales differ, one would expect the situation with the Korean locale and the Hebrew locales to be very similar. Yet, it is not.

Finally worth noting: Firefox is committing a willful violation of the spec when it comes to Simplified Chinese: The spec says gb18030, but Firefox uses gbk. Starting with Firefox 29, the gbk *decoder* will be the same as the gb18030 decoder. However, because we've previously seen problems with EUC-JP and Big5 when expanding the range of byte sequences that an *encoder* can produce in form submission, we are keeping the gbk encoder distinct from the gb18030 encoder at least for now. I'm willing to reconsider if another browser (that has high market share in China) successfully starts using the gb18030 encoder for form submissions for sites that declare gbk (or gb2312) or don't declare an encoding.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
Re: [whatwg] Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization
On Sat, Feb 8, 2014 at 12:37 AM, Ian Hickson i...@hixie.ch wrote:
> What have you learnt so far?

I've learned that I've misattributed the cause of the high frequency of Character Encoding menu usage in the case of the Traditional Chinese localization. We've been shipping with the wrong fallback encoding (UTF-8) even after the fallback encoding was supposedly fixed (to Big5). Shows what kind of a mess our previous mechanism for setting the fallback encoding in a locale-dependent way was. The fallback encoding for Traditional Chinese will change to Big5 for real in Firefox 28. I might have improved (hopefully; to be seen still) Firefox for the wrong reason. Oops. :-)

Also, more baseline telemetry data (i.e. data without TLD-based guessing) is now available.

The last 3 weeks of Firefox 25 on the release channel:
https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381393

The last 3 weeks of Firefox 26 on the release channel:
https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381394

The rows for locales with so little usage overall that even a couple of sessions of encoding menu use moves them around the list percentage-wise are grayed out. In both cases, the top entries in black are Traditional Chinese and Thai, both of which have the wrong fallback. Up next are the CJK locales, followed by the Cyrillic locales that have a detector on by default (Russian and Ukrainian), which makes one wonder if the detectors are doing more harm than good. Up next is Arabic, which has the wrong fallback. (These wrong fallbacks are fixed in Firefox 28. In Firefox 28, no locale falls back to UTF-8.)

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
Re: [whatwg] Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization
On Sat, Feb 8, 2014 at 12:37 AM, Ian Hickson i...@hixie.ch wrote:
> The correlation should be at least as high, as far as I can tell.

Logically, yes, for most parts of the world.

> Or maybe a 50%/50% experiment with that as the first 50% and the default coming from the TLD instead of the UI locale in the second 50%, with the corresponding instrumentation, to see how the results compare.

Mozilla doesn't have a proper A/B testing infrastructure yet. I expect the A to be Firefox 29 on the release channel and the B to be Firefox 30 on the release channel. So unless this gets backed out, I expect to have data around the time of Firefox 31 going to release.

> Have you tried deploying this?

It is on Firefox trunk now. However, not all country TLDs are participating. I figured it is better to leave unsure cases the way they were. It doesn't make sense to put a lot of effort into researching those before seeing if the general approach works for the case that it was designed for, specifically Traditional Chinese. The success metric I expect to be looking at is whether the usage of the Character Encoding menu in the Traditional Chinese localization of Firefox falls to the same level as in other Firefox localizations in general. If this change turns out to be successful for Traditional Chinese, then I think it will be worthwhile to research the unobvious cases.

The TLDs listed in https://mxr.mozilla.org/mozilla-central/source/dom/encoding/nonparticipatingdomains.properties do not participate at present (i.e. they get a browser UI localization-based guess like before). The TLDs listed in https://mxr.mozilla.org/mozilla-central/source/dom/encoding/domainsfallbacks.properties get the fallbacks listed in that file. All other TLDs map to windows-1252.

> What have you learnt so far?

It hasn't been an obvious and immediate disaster.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
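In other words, the lookup works roughly like this. A sketch in Python; the mappings below are hypothetical stand-ins for the contents of domainsfallbacks.properties and nonparticipatingdomains.properties, not the actual lists:

```python
# Illustrative sketch of TLD-based fallback guessing; the table
# contents are made up for this example.
DOMAIN_FALLBACKS = {
    "tw": "Big5",
    "th": "windows-874",
    "jp": "Shift_JIS",
    "ru": "windows-1251",
}

# TLDs that don't participate keep the old locale-based guess.
NON_PARTICIPATING = {"com", "org", "net"}

def guess_fallback(hostname, locale_fallback="windows-1252"):
    tld = hostname.rsplit(".", 1)[-1].lower()
    if tld in NON_PARTICIPATING:
        return locale_fallback          # behave as before this feature
    # Listed TLDs get their listed fallback; all others map to
    # windows-1252.
    return DOMAIN_FALLBACKS.get(tld, "windows-1252")
```

For example, a hypothetical unlabeled page on example.tw would be decoded as Big5 regardless of the browser's UI locale, while a page on a cool-looking foreign TLD like .io would get windows-1252.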
[whatwg] Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization
UTF-8 is never guessed, so this feature doesn't give anyone who uses UTF-8 a reason not to declare it. In that sense, this feature doesn't interfere with the authoring methodology sites should, ideally, be adhering to.

# How could this be harmful?

* This could emphasize pre-existing brokenness (failure to declare the encoding) of sites targeted at language minorities when the legacy encoding for the minority language doesn't match the legacy encoding for the majority language of the country and 100% of the target audience of the site uses a browser localization that matches the language of the site. For example, it's *imaginable* that there exists a Russian-language windows-1251-encoded (but undeclared) site under .ee that's currently always browsed with Russian browser localizations. More realistically, minority-language sites whose encoding doesn't match the dominant encoding of the country probably can't rely on their audience using a particular browser localization, and are probably more aware than most about encoding issues and already declare their encoding, so I'm not particularly worried about this scenario being a serious problem. And sites can always fix things by declaring their encoding.

* This could cause some breakage when unlabeled non-windows-1252 sites are hosted under a foreign TLD because the TLD looks cool (e.g. .io). However, this is a relatively new phenomenon, so one might hope that there's less content authored according to legacy practices involved.

* This probably lowers the incentive to declare the legacy encoding a little.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
[whatwg] Add an attribute for opting out of synchronous CSSOM access
For the problem statement, please see http://lists.w3.org/Archives/Public/www-style/2013Jan/0434.html

For the solution, please see http://lists.w3.org/Archives/Public/www-style/2013Jan/0457.html

For the CSS WG thinking that this is an HTML issue, please see http://lists.w3.org/Archives/Public/www-style/2013Mar/0688.html

(FWIW, I think this is a CSS issue that requires an HTML attribute to be minted.)

Please add an attribute to <link> that:
* opts an external style sheet out of synchronous CSSOM access
* makes the sheet not load and not defer the load event if its media query cannot match in the UA even after zooming or invoking a print function
* makes the sheet load with low priority and not defer the load event if its media query does not match at the time the link element is inserted into the document but might match later (e.g. if it's a print style sheet).

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
[whatwg] Requiring the Encoding Standard preferred name is too strict for no good reason
In various places that deal with encoding labels, the HTML spec now requires authors to use the name of the encoding from the Encoding Standard, which means using the preferred name rather than an alias. Compared to the previous reference to the IANA registry, some names that work in all browsers but are no longer preferred names are now errors, such as iso-8859-1 and tis-620.

Making broadly-supported names that were previously preferred names according to IANA be errors does not appear to provide any utility to Web authors who use validators. Please relax the requirement so that at least previously-preferred names are not errors.

zcorpan suggested (http://krijnhoetmer.nl/irc-logs/whatwg/20130325#l-920) allowing non-preferred names for non-UTF-8 encodings. I'm not familiar with the level of browser support for all of the non-preferred aliases, but I could accept zcorpan's suggestion.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
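To illustrate the distinction, here is a sketch of the kind of label table a validator consults. The dict is a tiny illustrative subset of the Encoding Standard's label table (which maps many labels per encoding), not the full thing:

```python
# Tiny illustrative subset of the Encoding Standard's label table.
# Each label (always matched case-insensitively) maps to the
# encoding's one preferred name.
LABELS = {
    "utf-8": "UTF-8",
    "iso-8859-1": "windows-1252",  # formerly an IANA preferred name
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
    "tis-620": "windows-874",      # formerly an IANA preferred name
    "windows-874": "windows-874",
}

def is_preferred_name(label):
    """True only if the label is exactly the Encoding Standard name,
    which is what the HTML spec currently demands of authors."""
    name = LABELS.get(label.strip().lower())
    return name is not None and label == name
```

Under the current spec text, a validator applying this rule flags iso-8859-1 and tis-620 as errors even though every browser maps them to windows-1252 and windows-874 respectively.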
Re: [whatwg] menu and friends
On Wed, Jan 9, 2013 at 10:17 PM, Ian Hickson i...@hixie.ch wrote:
> Optimising for the short-term shim author's experience rather than the long-term HTML authoring experience seems backwards to me.

After input from a couple of other Gecko developers, I withdraw my objection to menuitem being void.

> > As for command behavior in the parser, all major browsers have shipped releases with command as void, so we won't be able to reliably introduce a non-void element called command in the future anyway. Therefore, I don't see value in removing the voidness of command from parsing or serialization.
> The element doesn't exist, so there's no value in having it. We can easily introduce a non-void command in ten years if we need to, since by then the current parsers will be gone.

Even if we accept, for the sake of the argument, that the current parsers will be gone in 10 years, it is incorrect to consider only parsers. Considering serializers is also relevant. The voidness of command has already propagated to various places, including serializer specs like http://www.w3.org/TR/xslt-xquery-serialization-30/ . (No doubt the XSLT folks will be super-happy when we tell them that the list of void elements has changed again.)

At any point in the future, it is more likely that picking a new element name for a newly-minted non-void element will cause less (maybe only an epsilon less, but still less) hassle than trying to re-introduce command as non-void. Why behave as if finite-length strings were in short supply? Why not treat command as a burned name just like legend, and pick something different the next time you need something of the same theme when interpreted as an English word?

What makes an element exist for you? Evidently, basefont and bgsound exist enough to get special parsing and serialization treatment. Are multiple interoperable parsing and serialization implementations not enough of an existence, so that you want to see deployment in existing content, too? Did you measure the non-deployment of command on the Web, or are we just assuming it hasn't been used in the wild? Even if only a few authors have put command in head, changing parsing to make command break out of head is bad.

What do we really gain, except for test case churn, makework in code and potential breakage, from changing command as opposed to treating it as a used-up identifier and minting a new identifier if a non-void element with a command-like name is needed in the future?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: [whatwg] We should not throw DOM Consistency and Infoset compatibility under the bus
On Fri, Jan 11, 2013 at 10:00 PM, Ian Hickson i...@hixie.ch wrote:
> On Fri, 11 Jan 2013, Henri Sivonen wrote:
> > I understand that supporting XML alongside HTML is mainly a burden for browser vendors and I understand that XML currently doesn't get much love from browser vendors.
> Not just browser vendors. Authors rarely if ever use XML for HTML either.

When you say "use XML", do you mean serving content using an XML content type? I'm talking about serving text/html but using XML machinery to generate it (with a text/html-aware serializer at the end of the process).

> > Still, I think that as long as browsers support XHTML, we'd be worse off with the DOM-and-above parts of the HTML and XML implementations diverging.
> Sure, but if on the long term, or even medium term, they don't continue to support XHTML, this is no longer a problem.

But if they do continue to support XHTML, introducing divergence will be a problem and, moreover, a problem that may become unfixable. (That we were able to converge on the namespace was narrow enough a success. It broke Facebook!)

> Anyway, I'm not suggesting that they diverge beyond the syntax (which is already a lost cause). All I've concretely proposed is syntax for binding Web components in text/html; I haven't described how this should be represented in the DOM, for instance. If we define <foo/bar> as being a text/html syntactic shorthand for <foo xml:component="bar">, or <foo xmlcomponent="bar">, in much the same way as we say that <svg> is a shorthand for <svg xmlns="http://www.w3.org/2000/svg">, then the DOM remains the same for both syntaxes, and (as far as I can tell) we're fine.

I didn't realize you were suggesting that HTML parsers in browsers turn <bar/foo> into <bar xml:component="foo"> in the DOM. How is xml:component="foo" better than is="foo"? Why not <bar foo="">, which is what <bar/foo> parses into now? (I can think of some reasons against, but I'd like to hear your reasons.)

> > The idea to stick a slash into the local name of an element in order to bind Web Components is much worse.
> I don't propose to change the element's local name. <select/map> has tagName "select" in my proposal.

Oh. That was not at all clear.

> > Please, let's not make that mistake.
> What do you propose to resolve this problem then?

Let's focus on the requirements before proposing solutions.

> Some of the constraints are:
> - The binding has to be done at element creation time
> - The binding has to be immutable during element lifetime
> - The syntax must not make authors think the binding is mutable (hence why the <select is="map"> proposal was abandoned)

"Was abandoned"? Already "abandoned"? Really? How does xml:component="map" suggest mutability less than is="map"? Would it be terrible to make attempts to mutate the 'is' attribute throw, thereby teaching authors who actually try to mutate it that it's not mutable?

> - The syntax must be as terse as possible
> - The syntax has to convey the element's public semantics (a specified HTML tag name) in the document markup, for legacy UAs and future non-supporting UAs like spiders.

Some additional requirements:

- It must be possible to generate the syntax using a serializer that exposes (only) the SAX2 ContentHandler interface to an XML system and generates text/html in response to calls to the methods of the ContentHandler interface, where the XML system may enforce that the calls to the ContentHandler represent a well-formed XML document (i.e. would produce a well-formed XML doc if fed into an XML serializer). The syntax must round-trip if the piece of software feeding the serializer is an HTML parser that produces SAX2 output in a way that's consistent with the way the parsing spec produces DOM output. (This is a concrete way to express "must be producible with Infoset-oriented systems without having a different Infoset mapping than the one implied by the DOM mapping in browsers". As noted, dealing with template already bends this requirement, but in a reasonably straightforward way.)

- It must be possible to generate the syntax with XSLT. (Remember, we already have <!DOCTYPE html SYSTEM "about:legacy-compat">, because this is important enough a case.)

Adding these requirements to your list of requirements may make the union of requirements internally contradictory. However, I think we should have a proper discussion of how to reconcile contradictory requirements instead of just conveniently trimming the list of requirements to fit your proposed solution. (For example, it could be that one of my requirements turns out to be more important than one of yours.)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
[whatwg] We should not throw DOM Consistency and Infoset compatibility under the bus
as HTML fits into the XML data model. I think it would be a mistake to change HTML in such a way that it would no longer fit into the XML data model *as implemented*, and thereby limit the range of existing software that could be used outside browsers for working with HTML, just because XML in browsers is no longer in vogue. Please, let's not make that mistake.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: [whatwg] menu and friends
On Sat, Dec 29, 2012 at 3:23 AM, Ian Hickson i...@hixie.ch wrote:
> * menuitem is void (requires parser changes).
> * command is entirely gone. (Actually, I renamed command to menuitem and made it so it's only allowed in menu.)

Did you actually make these changes to the parsing algorithm? It seems to me that you didn't, and I'm happy that you didn't.

Currently, menuitem is non-void in Firefox. It was initially designed to be void, but that never shipped, and the non-voidness is, AFAIK, considered intentional. For one thing, being non-void makes the element parser-neutral and, therefore, easier to polyfill in menuitem-unaware browsers.

As for command behavior in the parser, all major browsers have shipped releases with command as void, so we won't be able to reliably introduce a non-void element called command in the future anyway. Therefore, I don't see value in removing the voidness of command from parsing or serialization.

Could you, please, revert the serializing algorithm to treat command as void and menuitem as non-void?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: [whatwg] Question on Limits in Adaption Agency Algorithm
On Sat, Dec 8, 2012 at 11:05 PM, Ian Hickson i...@hixie.ch wrote:
> the order between abc and xyz is reversed in the tree. Does anyone have any preference for how this is fixed?

Does it need to be fixed? That is, is it breaking real sites?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: [whatwg] [mimesniff] Sniffing archives
On Tue, Dec 4, 2012 at 9:40 AM, Adam Barth w...@adambarth.com wrote:
> Also, some user agents treat downloads of ZIP archives differently than other sorts of downloads (e.g., they might offer to unzip them).

Which user agents? For this use case, merely sniffing for the zip magic number is inadequate, because you really don't want to offer to unzip EPUB, ODF, OOXML, XPS, InDesign, etc. files.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: [whatwg] Loading and executing script as quickly as possible using multipart/mixed
On Tue, Dec 4, 2012 at 4:15 AM, Kyle Simpson get...@gmail.com wrote:
> One suggestion is to add a state to the readyState mechanism like "chunkReady", where the event fires and includes in its event object properties the numeric index, the //@sourceURL, the separator identifier, or otherwise some sort of identifier by which the author can tell which chunk executed.

If the script author needs to manually designate the chunk boundaries, can't the script author insert a call to a function before each boundary? That is, why is it necessary for the UA to generate events?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: [whatwg] [mimesniff] Handling container formats like Ogg
On Tue, Nov 27, 2012 at 12:59 AM, Gordon P. Hemsley gphems...@gmail.com wrote:
> Container formats like Ogg can be used to store many different audio and video formats, all of which can be identified generically as application/ogg. Determining which individual format to use (which can be identified interchangeably as the slightly-less-generic audio/ogg or video/ogg, or using a 'codecs' parameter, or using a dedicated media type) is much more complex, because they all use the same "OggS" signature. It would require actually attempting to parse the Ogg container to determine which audio or video format it is using (perhaps not dissimilar to what is done for MP4 video and what might have to be done with MP3 files without ID3 tags). Would this be something UAs would prefer to handle in their Ogg library, or should I spec it as part of sniffing?

What would be the use case for handling it as part of the sniffing layer?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: [whatwg] main element parsing behaviour
On Wed, Nov 7, 2012 at 2:42 PM, Simon Pieters sim...@opera.com wrote:
> I think we shouldn't put the parsing algorithm on a pedestal while not giving the same treatment to the default UA style sheet or other requirements related to an element that have to be implemented.

The difference between the parsing algorithm and the UA stylesheet is that authors can put display: block; in the author stylesheet during the transition.

That said, the example jgraham showed to me on IRC convinced me that if main is introduced to the platform, it makes sense to make it parse like article. :-( (I'm not a fan of the consequences of the "feature" of making </p> optional. Too bad that feature is ancient and it's too late to undo it.)

I guess I'll focus on objecting to new void elements and especially to new children of head.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: [whatwg] maincontent element spec updated and supporting data provided
On Wed, Oct 17, 2012 at 3:03 AM, Steve Faulkner faulkner.st...@gmail.com wrote:
> I have updated the maincontent spec [1] and would appreciate any feedback (including, but not limited to, implementers).

<bikeshed>A single-word element name would be more consistent with other HTML element names. "content" would be rather generic, so I think "main" would be the better option.</bikeshed>

It would probably make sense to add main { display: block; } to the UA stylesheet.

If Hixie had added this element in the same batch as section, article and aside, he would have made the parsing algorithm similarly sensitive to this element. However, I'm inclined to advise against changes to the parsing algorithm at this stage (you have none; I am mainly writing this for Hixie), since it would move us further from a stable state for the parsing algorithm and, if the main element is used in a conforming way, it won't have a p element preceding it anyway.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: [whatwg] Archive API - proposal
On Tue, Aug 14, 2012 at 11:20 PM, Glenn Maynard gl...@zewt.org wrote:
> On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini b...@mozilla.com wrote:
> > // The getFilenames handler receives a list of DOMString:
> > var handle = this.reader.getFile(this.result[i]);
> This interface is problematic. Since ZIP files don't have a standard encoding, filenames in ZIPs are often garbage. This API requires that filenames round-trip uniquely, or else files aren't accessible at all.

Indeed, in the case of zip files, the file names themselves are dangerous as handles that get passed back and forth, so it seems like a good idea to be able to extract the contents of a file inside the archive without having to address the file by name.

As for the filenames, after an off-list discussion, I think the best solution is that UTF-8 is tried first, but the ArchiveReader constructor takes an optional second argument that names a character encoding from the Encoding Standard. This will be known as the fallback encoding. If no fallback encoding is provided by the caller of the constructor, windows-1252 is set as the fallback encoding. When the ArchiveReader processes a filename from the zip archive, it first tests whether the byte string is a valid UTF-8 string. If it is, the byte string is interpreted as UTF-8 when converting to UTF-16. If the filename is not a valid UTF-8 string, it is decoded into UTF-16 using the fallback encoding.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
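The proposed rule can be sketched in a few lines. The function name below is made up for illustration; only the try-UTF-8-then-fallback behavior comes from the proposal:

```python
def decode_zip_filename(raw, fallback="windows-1252"):
    """Decode a ZIP entry's filename bytes: as UTF-8 if the bytes form
    a valid UTF-8 sequence, otherwise using the fallback encoding.
    (Function name invented for this sketch.)"""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Legacy archive: decode with the caller-supplied fallback.
        return raw.decode(fallback)
```

Because every byte sequence decodes successfully under windows-1252, this scheme always yields some string, and valid UTF-8 names are never misinterpreted by the fallback path.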
Re: [whatwg] [selectors4] drag-and-drop pseudo-classes
On Aug 14, 2012 10:54 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
> On Tue, Aug 14, 2012 at 12:13 PM, Ryosuke Niwa rn...@webkit.org wrote:
> > Yeah, and that's not compatible with how drag and drop are implemented on the Web.
> I know. You'll notice that I didn't suggest we somehow change to that. ^_^ However, other languages might want this kind of model,

Other languages?
Re: [whatwg] Was is considered to use JSON-LD instead of creating application/microdata+json?
On Fri, Aug 10, 2012 at 1:39 PM, Markus Lanthaler markus.lantha...@gmx.net wrote:
> > > Well, I would say there are several advantages. First of all, JSON-LD is more flexible and expressive.
> > More flexible and expressive than what?
> Than application/microdata+json.

That's a problem right there. It means that JSON-LD requires more consumer complexity than application/microdata+json.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: [whatwg] Features for responsive Web design
On Fri, Aug 10, 2012 at 11:54 AM, Florian Rivoal flori...@opera.com wrote:
> I wasn't debating whether or not shipping a device with a 1.5 pixel ratio is the best decision, but answering: "Is there a good reason to believe that will be something other than a power of two?" The fact that it has happened seems a pretty good reason to believe that it may happen.

These are different questions: Will someone ship a browser/device combination whose device pixel ratio is something other than 1 or 2? Will Web authors bother to supply bitmaps with sampling factors other than 1 and 2?

As a data point worth considering, for desktop apps on OS X, Apple makes developers supply bitmap assets for 1x and 2x, and if the user chooses a ratio between 1 and 2, the screen is painted at 2x and the resulting bitmap is scaled down.

Another thing worth considering is whether anyone is ever really going to go over 2x, given that at normal viewing distances 2x is roughly enough to saturate the resolution of the human eye (hence the "retina" branding). Even for printing photos, 192 pixels per inch should result in very good quality, and for line art, authors should use SVG instead of bitmaps anyway.

If it indeed is the case that there are really only two realistic bitmap samplings for catering to differences in the viewing device's pixel density (ignoring art direction), it would make sense to have simply

<img src="1xsampling.jpg" hisrc="2xsampling.jpg" alt="Text alternative">

instead of an in-attribute microsyntax for the non-art-directed case.

Ian Hickson wrote:
> On Wed, 16 May 2012, Henri Sivonen wrote:
> > It seems to me that Media Queries are appropriate for the art-direction case and factors of the pixel dimensions of the image referred to by src= are appropriate for the pixel density case. I'm not convinced that it's a good idea to solve these two axes in the same syntax or solution. It seems to me that srcset= is bad for the art-direction case and picture is bad for the pixel density case.
> I don't really understand why.

They are conceptually very different: One is mere mipmapping and can be automatically generated. The other involves designer judgment and is conceptually similar to CSS design where authors use MQ. Also, having w and h refer to the browsing environment and x to the image in the same microsyntax continues to be highly confusing.

Ignoring implementation issues for a moment, I think it would be conceptually easier to disentangle these axes like this:

Non-art-directed:
<img src="1xsampling.jpg" hisrc="2xsampling.jpg" alt="Text alternative">

Art-directed:
<picture>
  <source src="1xsampling-cropped.jpg" hisrc="2xsampling-cropped.jpg" media="max-width: 480px">
  <img src="1xsampling.jpg" hisrc="2xsampling.jpg" alt="Text alternative">
</picture>

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
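For what it's worth, the UA-side selection logic for a two-asset src/hisrc scheme would be trivial. A sketch of that logic (in Python rather than a real UA implementation; hisrc is the proposed, non-existent attribute, and the function name is invented):

```python
def pick_image_source(attrs, device_pixel_ratio):
    """Pick between src and the hypothetical hisrc attribute.

    Following the OS X model mentioned above: any density above 1x
    uses the 2x asset (downscaling it if the ratio is between 1 and 2);
    exactly 1x, or a missing hisrc, uses the 1x asset.
    """
    if device_pixel_ratio > 1 and "hisrc" in attrs:
        return attrs["hisrc"]
    return attrs["src"]
```

For example, a device with a 1.5 pixel ratio would fetch the 2x asset and scale it down, so authors would never need to produce a 1.5x sampling.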
Re: [whatwg] register*Handler and Web Intents
On Fri, Aug 3, 2012 at 12:00 PM, James Graham jgra...@opera.com wrote:
> I agree with Henri that it is extremely worrying to allow aesthetic concerns to trump backward compatibility here.

Letting aesthetic concerns trump backward compat is indeed troubling. It's also troubling that this even needs to be debated, considering that we're supposed to have a common understanding of the design principles, and the design principles pretty clearly uphold backward compatibility over aesthetics.

> I would also advise strongly against using position in DOM to detect intents support; if you insist on adding a new void element I will strongly recommend that we add it to the parser asap to try and mitigate the above breakage, irrespective of what our plans are for the rest of the intent mechanism.

I think the compat story for new void elements is so bad that we shouldn't add new void elements. (source gets away with being a void element because the damage is limited by the </video> or </audio> end tag that comes soon enough after the source element.) I think we also shouldn't add new elements that don't imply body when appearing in head.

It's great that browsers have converged on the parsing algorithm. Let's not break what we've achieved to cater to aesthetics.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: [whatwg] alt= and the meta name=generator exception
On Wed, Aug 1, 2012 at 10:56 AM, Ian Hickson i...@hixie.ch wrote: After all, what's the point of using validation if you use a generator? People who are not the developer of the generator use validators to assess the quality of the markup generated by the generator. You would in effect be testing the generator, something that its vendor should have done. We should not be concerned about helping generator vendors to advertize their products as producing valid code (code that passes validation) when they in fact produce code that violates established good practices of HTML. Alice writes a generator that's logically cannot know the text alternative for an image file and, therefore, makes the generator output img without alt. Bob is shopping around for generators of the type Alice's generator happens to be or engaging in an Internet argument about which generator sucks and which generator rocks. So Bob feeds the output of Alice's generator to validator, sees an error message and proceeds to proclaim to the world that Alice's generator is bad, because it's output doesn't validate. Alice doesn't want Bob to proclaim to the world that her generator is bad. Educating Bob and everyone who listens to Bob about why the generator produces output that causes the validation error is too hard. The path of least resistance for Alice to make the problem go away is to change the output of the generator such that it doesn't result in an error message from a validator, so Alice makes the image have the attribute alt= which happens to result in the existence of the image being concealed from users of screen readers. Or, alternatively, Alice anticipates Bob's reaction and preemptively makes her generator output alt= before Bob ever gets to badmouth about the invalidity of the generator's output. Even if we wanted to position validators as tools for the people who write markup, we can't prevent other people from using validators to judge markup output by generator written by others. 
The crux of this problem is the tension between a validator as a tool for the person writing the markup and a validator being used to judge someone else's markup, and what people can most easily do to evade such judgment. We briefly brainstormed some ideas on #whatwg earlier tonight, and one name in particular that I think could work is the absurdly long img src=... generator-unable-to-provide-required-alt="". This has several key characteristics that I think are good:
- it's long, so people aren't going to want to type it out
- it's long, so it will stick out in copy-and-paste scenarios
- it's eminently searchable (long unique term) and so will likely lead to good documentation if it's adopted
- the generator part implies that it's for use by generators, and may discourage authors from using it
- the unable and required parts make it obvious that using this attribute is an act of last resort
While I agree with the sentiment the name of the attribute communicates, its length is enough of a problem to probably make it fail: 1) Like a namespace URL, it's too long to memorize correctly, so it's easier for the generator developer to type 'alt' than to copy and paste the long attribute name from somewhere. 2) It takes many more bytes than alt="", so it's easy to shy away from using it on imagined efficiency grounds. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
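To make the generator's dilemma concrete, here is a hypothetical sketch (not taken from any real generator; the function name is invented) of the two ways a generator could emit an image when it cannot know the text alternative, using the long attribute name proposed above:

```javascript
// Hypothetical sketch: alt="" conceals the image's existence from screen
// reader users, while the proposed long attribute explicitly flags that no
// text alternative was available to the generator.
function imgTag(src, alt) {
  if (alt != null) {
    return `<img src="${src}" alt="${alt}">`;
  }
  // The generator logically cannot know the text alternative:
  return `<img src="${src}" generator-unable-to-provide-required-alt="">`;
}

console.log(imgTag("photo.png", "A cat"));
console.log(imgTag("photo.png", null));
```

The length of the attribute is visible even in this toy example: typing it by hand is unattractive, which is exactly the property the proposal is counting on.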
Re: [whatwg] alt= and the meta name=generator exception
On Sat, Aug 4, 2012 at 9:08 AM, Michael[tm] Smith m...@w3.org wrote: Agreed. I support having some kind of trial period like what you describe, for a year or two or 18 months. If we do that I would prefer that the spec include some kind of note/warning making it clear that the attribute is experimental and may be dropped or changed significantly within the next two years based on analysis we get back during that time.

There's a non-trivial set of validator users who get very upset if the validator says that a document that previously produced no validation errors now produces validation errors--even if the new errors result from a bug fix. In my experience, handing out badges makes people more upset if the criteria behind the badge change, but even without badges, the sentiment seems to be there. Therefore, if you tell people that if they use a particular syntax their document might become invalid in the future, chances are that they will steer clear of the syntax when an easier alternative is available--just writing alt="". So adding a warning that the syntax is experimental is an almost certain way to affect the outcome of the experiment. On the other hand, not warning people and then changing what's valid is likely to make people unhappy. It seems to me that running an experiment like this will result in a failed experiment, unhappy people, or both. If an experiment on this topic were to be run, what would you measure and how would you interpret the measurements? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Linters should complain about missing accessible names for controls [Was: Re: alt= and the meta name=generator exception]
On Sat, Aug 4, 2012 at 10:32 PM, Benjamin Hawkes-Lewis bhawkesle...@googlemail.com wrote: Would it be possible to combine this with the linter complaining about all controls (links, buttons, form fields) that do not have markup yielding a non-empty accessible name without invoking repair techniques such as reading filenames from img @src attributes?

Given a well-defined algorithm for finding the accessible name for links, buttons and form fields, I think it would make sense for a validator to be able to complain when the algorithm results in an empty accessible name. Whether that should be a validity constraint or an optional additional check is a bit tricky, for the same reason why we allow empty paragraphs and empty lists: to let markup editors simultaneously guarantee the validity of their output and allow the user to save the document at any stage of editing. (Again, there's tension between different uses of validity: the sort of validity constraints you want to hold before and after each discrete editing operation and the constraints you want to hold when the document is done.)

http://www.w3.org/WAI.new/PF/aria/roles#namecalculation Spec writing that puts a point starting with "Authors MAY" under "The text alternative for a given node is computed as follows:" is sad-making. :-(

I realise the author requirements in the HTML spec seem to have gradually become very forgiving here, not really sure why. :( To avoid e.g. the insertion of an &nbsp; in each newly-created <p></p> in an editor to avoid violating a ban on empty paragraphs. Validity constraints have unintended consequences.

The cases where markup generators cannot provide a better control name than _nothing_ seem to me much rarer than the cases where markup generators cannot provide better text alternatives for photos etc - maybe even non-existent - and when hand-authoring, describing a control is even easier than coming up with a text equivalent for a graphic. Yeah.
In this case, the problem isn't non-interactive generators but interactive editors. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
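A greatly simplified sketch of the kind of linter check discussed above. This is nothing close to the real ARIA name computation algorithm, and it operates on a hypothetical plain-object tree rather than a real DOM; the names and object shape are invented for illustration:

```javascript
// Greatly simplified sketch (NOT the real ARIA accessible name algorithm):
// a linter-style check that flags controls whose computed name is empty.
// `node` is a hypothetical plain object: { attrs, text, children }.
function accessibleName(node) {
  const attrs = node.attrs || {};
  if (attrs["aria-label"]) return attrs["aria-label"].trim();
  if (attrs.alt != null) return attrs.alt.trim();
  const fromChildren = (node.children || []).map(accessibleName).join(" ").trim();
  return fromChildren || (node.text || "").trim();
}

function lacksAccessibleName(control) {
  return accessibleName(control) === "";
}
```

Under this sketch, a link containing only an alt="" image would be flagged, while a link with ordinary link text would not--which matches the intuition that controls, unlike decorative images, rarely have a legitimate reason to be nameless.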
Re: [whatwg] Load events fired during onload handlers
For what it's worth, I think the weirdness described in this thread is a good reason not to try to make DOMContentLoaded consistent with the load event for the sake of consistency. For one thing, the code that manages the weirdness of the load event lives in a different place compared to the code that fires DOMContentLoaded. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] register*Handler and Web Intents
On Thu, Jul 26, 2012 at 5:20 AM, Ian Hickson i...@hixie.ch wrote: Thus, I propose a parallel mechanism in the form of an empty element that goes in the head:

intent
  action=edit       intent action, e.g. open or edit; default share
  type=image/png    MIME type filter; default omitted, required if scheme omitted
  scheme=mailto     scheme filter; default omitted, required if type omitted
  href=             handler URL; default (current page)
  title=Foo         handler's user-visible name; required attribute
  disposition=      HandlerDisposition values; default overlay

This is a severe violation of the Degrade Gracefully design principle. Adopting your proposal would mean that pages that include the intent element in head would parse significantly differently in browsers that predate the HTML parsing algorithm or in browsers that implement it in its current form. I believe that having the intent element break the parser out of head in browsers that don't contain the parser changes you implicitly propose would cause a lot of grief to Web authors and would hinder the adoption of this feature. My concerns could be addressed in any of these three ways:
1) Rename intent to link
2) Rename intent to meta
3) Give intent an end tag and place it in body rather than head
I prefer solution #1. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] alt and title attribute exception
On Tue, Jul 31, 2012 at 12:18 PM, Philip Jägenstedt phil...@opera.com wrote: When this was last discussed in the HTML WG (January 2012) I opened a bug (MOBILE-275) for Opera Mobile to expose the title attribute in our long-click menu, arguing that one could not enjoy XKCD without it. I meant to report back to the HTML WG but forgot, so here it is. Unfortunately, the bug was rejected... quoting the project management: "Sure it is nice to have, but noone else has it so we will not put our effort into this."

Firefox for Android (at least on the Nightly channel) displays the content of the title attribute on XKCD comics (up to a length limit which can often be too limiting) upon tap and hold: http://hsivonen.iki.fi/screen/xkcd-firefox-for-android.png Not to suggest that XKCD's title usage is OK, but just to correct the "noone else" bit.

it seems unwise to recommend using the title attribute to convey important information. Indeed. In addition to image considerations, I think http://www.whatwg.org/specs/web-apps/current-work/#footnotes is bad advice. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] alt= and the meta name=generator exception
On Tue, Jul 24, 2012 at 10:58 PM, Jukka K. Korpela jkorp...@cs.tut.fi wrote: This is an improvement, but I think Edward O'Connor's points still apply. Indeed. The spec edit is a rather disappointing response. I think it would be better to keep the alt attribute always required but recommend that conformance checkers have an option of switching off errors related to this The big question is whether that would be enough to solve the problem of generators generating bogus alts in order to pass validation. I predict generator writers would want the generator output to pass validation with the default settings and, therefore, what you suggest wouldn't fix the problem that the spec is trying to fix. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Proposal for readyState behavior
On Tue, Jul 10, 2012 at 10:15 PM, Ian Hickson i...@hixie.ch wrote: Done. Thanks.

4) Whenever a transition to interactive is made, DOMContentLoaded must eventually get fired later if the document stays in a state where events can fire on it. Rationale: * This seems sensible for consistency with the common case. Currently, there are cases where Firefox fires DOMContentLoaded without a transition to interactive or transitions to interactive without ever firing DOMContentLoaded, but these cases are inconsistent with other browsers, so it's hard to believe they are well-considered compatibility features. Delta from the spec: Same as for point 3. Disagreed. IMHO DOMContentLoaded is equivalent to 'load', just a bit earlier (it's basically 'load' but before the scripts have run). In fact, I'd specifically define DOMContentLoaded as meaning the DOM content was completely loaded, which clearly can't happen if the parser aborted.

Could you please leave your sense of logic at the door instead of rocking the interop boat like this? Personally, I'm already spending so much time in this quagmire of trying to sort out events and readyStates with abnormal document loads that I have about zero interest in making Gecko not fire an event in a situation where Firefox, IE10 and Opera currently fire it. Furthermore, I think that in a situation like this, change is more harmful and more likely to break something than the sort of logic you offered is useful.

10) XSLT error pages don't count as aborts but instead as non-aborted loads of the error page. Rationale: * Makes parent pages less confused about the events they are waiting for. * Already true except for bugs in Firefox, which is the only browser with XSLT error pages. Delta from the spec: Make explicit in spec.
I haven't defined this because to define this I'd have to define a ton of infrastructure that explains how XSLT works in the first place, and I'm still waiting for the XSLT community to write the tests that demonstrate what the requirements should be: https://www.w3.org/Bugs/Public/show_bug.cgi?id=14689 I don't think you need to spec infrastructure to define a high-level expectation that loads with XSLT errors are supposed to finish as if they were successful loads rather than aborted loads. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Readiness of script-created documents
On Tue, Jun 12, 2012 at 1:46 AM, Ian Hickson i...@hixie.ch wrote: When a document is aborted the state is more or less left exactly as it was when it was aborted. This includes the readiness state. It also means no events fire (e.g. no 'load', 'unload', or 'error' events), a number of scripts just get abandoned without executing, appcache stuff gets abandoned, queued calls to window.print() get forgotten, etc. Aborting a document is a very heavy-handed measure. Documents are not expected to last long after they have been aborted, typically. Pages aren't expected to remain functional beyond that point.

That's not reality in all browsers right now, and I think it doesn't make sense to make it the reality. That is, there are already browsers that transition readyState to complete upon aborting the parser, and I think doing that makes sense (and I want to change Gecko to do that, too), because a non-complete readyState is a promise to fire a load event later. I think it's a bad idea to leave a document in the loading state when the browser engine knows that it won't fire a load event for the document. Basically, I think the platform should maximize the chances of the following code pattern causing doStuff to run once the document has completely loaded:

if (document.readyState == "complete") {
  setTimeout(doStuff, 0);
} else {
  document.addEventListener("load", doStuff);
}

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
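The pattern above can be restated as a reusable helper. This is a sketch: in a real page `doc` would be `document`, and the point is that the callback always runs asynchronously, whether the document has already finished loading or not:

```javascript
// Sketch of the race-free probe pattern from the post: if the document has
// already reached "complete", run the callback asynchronously anyway so the
// caller sees consistent (always-async) timing; otherwise wait for "load".
function runWhenComplete(doc, doStuff) {
  if (doc.readyState === "complete") {
    setTimeout(doStuff, 0);
  } else {
    doc.addEventListener("load", doStuff);
  }
}
```

The helper only does the right thing if the engine keeps its promise: a document left in the loading state that will never fire load leaves doStuff waiting forever, which is exactly the failure mode argued against above.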
Re: [whatwg] HTMLLinkElement.disabled and HTMLLinkElement.sheet behavior
On Thu, Jun 7, 2012 at 2:47 AM, Ian Hickson i...@hixie.ch wrote: On Fri, 27 Jan 2012, Boris Zbarsky wrote: On 1/27/12 1:30 AM, Ian Hickson wrote: On Wed, 5 Oct 2011, Henri Sivonen wrote: On Tue, Oct 4, 2011 at 9:54 PM, Boris Zbarsky bzbar...@mit.edu wrote: What Firefox does do is block execution of script tags (but not timeouts, callbacks, etc.!) if there are pending non-alternate parser-inserted stylesheet loads. This is necessary to make sure that scripts getting layout properties see the effect of those stylesheets. A side-effect is that a script coming after a link will never see the link in an unloaded state... unless there's a network error for the link or whatever. One exception: If an inline script comes from document.write(), it doesn't block on pending sheets. It runs right away. If it blocked on pending sheets, the point at which document.write() returns would depend on network performance, which I think would be worse than having document.written inline scripts that poke at styles fail depending on network performance. Note that this is not conforming. The spec does not currently define any such behaviour. Which part is not conforming? The exception for alternate sheets, the inline script inside document.write() thing, or something else? Unless I'm mistaken, nothing in the HTML spec does anything differently based on whether a script comes from document.write() or not. I think that's a spec bug, per the one exception above. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Various HTML element feedback
On Wed, Jun 6, 2012 at 2:53 AM, Ian Hickson i...@hixie.ch wrote: That might be realistic, especially as there is no significant semantic clarification in sight in general. This raises the question why we could not just return to the original design with some physical markup like i, b, and u together with span, which was added later. I think you'll find the original design of HTML isn't what you think it is (or at least, it's certainly not as presentational as you imply above), but that's neither here nor there. Is there a record of design between http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html and http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt ? So why not simply define i as recommended and describe var, cite, em, and dfn as deprecated but supported alternatives? What benefit does empty deprecation have? It's not like we can ever remove these elements altogether. What harm do they cause? The harm is the wasted time spent worrying about and debating which semantic alternative for italics to use. If we have to keep them, we are better served by embracing them and giving them renewed purpose and vigour, rather than being ashamed of them. I think we have to keep them, because trying to declare them invalid would cause people to do a lot of pointless work, too, but I think we could still be ashamed of them. Note that as it is specified, div can be used instead of p with basically no loss of semantics. (This is because the spec defines paragraph in a way that doesn't depend on p.) Is there any known example of a piece of software that needs to care about the concept of a paragraph and uses the rules given in the spec for determining what constitutes a paragraph? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Media queries, viewport dimensions, srcset and picture
On Wed, May 23, 2012 at 6:21 PM, Florian Rivoal flori...@opera.com wrote: On the other hand, I think that including 600w 400h in there is misguided. I agree. 1) simplify srcset to only accept the *x qualifier Is there a good reason to believe that * will be something other than a power of two? That is, could we just optimize the *x syntax away and specify that the first option is 1x, the second is 2x, the third is 4x, etc.? I believe the only way out is through an image format that: ... - is designed so that the browser can stop downloading half way through the file, if it determines it got sufficiently high resolution given the environment More to the point, the important characteristic is being able to stop downloading a *quarter* of the way through the file and get results that are as good as if the full-size file had been downsampled with both dimensions halved and that size had been sent as the full file. (I am not aware of a bitmap format suitable for photographs that has this characteristic. I am aware that JPEG 2000 does not have this characteristic. I believe interlaced PNGs have it, but they aren't suitable for photographs, due to the lossless compression.) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
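The quarter-way arithmetic above can be made concrete. The dimensions here are purely illustrative, not tied to any real image format:

```javascript
// Halving both dimensions of a bitmap leaves a quarter of the samples, so a
// byte-progressive format would ideally pack the half-size image into the
// first quarter of the bytes. Dimensions below are purely illustrative.
const width = 1024, height = 768;
const fullSamples = width * height;          // 786432 samples at 2x
const halfSizeSamples = (width / 2) * (height / 2); // 196608 samples at 1x
console.log(fullSamples / halfSizeSamples);  // → 4
```

This is also why the 1x/2x/4x power-of-two progression is attractive: each density step quadruples the sample count, so each lower-density version corresponds to roughly the first quarter of the next one's data.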
Re: [whatwg] Bandwidth media queries
On Wed, May 16, 2012 at 9:48 PM, Matthew Wilcox m...@matthewwilcox.com wrote: If you're a browser you are the software interpreting the instructions of a given language: CSS in this case. In addition to the problem that it's actually hard for browsers to know what the current bandwidth is, especially on mobile networks, some of these responsive design threads assume that the author knows best when to withdraw content or content quality due to low bandwidth. From the user perspective, it's not at all clear that users always prefer to get less content when they are on a slower connection. Personally, I expect to see full content on a slow connection if I wait long enough, but it's also annoying to have to wait for the whole page to load before the page is usable. The problem is that sometimes waiting is worth it and sometimes it isn't, and the author might not know when the user considers the wait to be worth it. Unfortunately, the way the load event works makes it hard to make pages that start working before images are fully loaded and then keep improving in image quality if the user chooses to wait. Also, some browsers intentionally limit their own ability to do incremental rendering both to get better throughput and to get better perceptual speed in cases where the overall page load is relatively fast. On very slow networks (GPRS or airline Wi-Fi), I think Opera Mini with *full* image quality provides the best experience: the page renders with its final layout and becomes interactive with images replaced with large areas of color that represent the average color occupying that area in the images. The images then become sharper over time. Thus, the user has the option to start interacting with the page right away if the user deems the images not worth the wait, or can choose to wait if the user expects the images to contain something important.
(This assumes, of course, that the user is not paying per byte even though the connection is slow, so that it's harmless from the user perspective to start loading data that the user might dismiss by navigating away from the page without waiting for the images to load in full.) Instead of giving web authors the tools to micro-manage what images get shown at what quality under various bandwidth conditions, I think it would be better to enable a load mode in traditional-architecture browsers (that is, not the Opera Mini thin-client architecture) that would allow early layout and load event firing and progressive image quality enhancement after the load event has fired and the page has its final layout (in the sense of box dimensions). I.e., have a mode where the load event fires as soon as everything except images has loaded and the dimensions of all image boxes are known to the CSS formatter (and PNG and JPEG progression is used 1990s-style after the load event has fired). -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Features for responsive Web design
On Wed, May 16, 2012 at 2:46 PM, Jeremy Keith jer...@adactio.com wrote: You're right. I was thinking that the values (Nh Nw Nx) described the *image* but in fact they describe (in the case of Nh and Nw) the viewport and (in the case of Nx) the pixel density of the screen/device. I suspect I won't be the only one to make that mistake. Indeed. I made the same mistake initially. What's currently in the spec is terribly counter-intuitive in this regard. I can see now how it does handle the art-direction case as well. I think it's a shame that it's a different syntax to media queries but on the plus side, if it maps directly to imgset in CSS, that's good. It seems to me that Media Queries are appropriate for the art-direction case and factors of the pixel dimensions of the image referred to by src= are appropriate for the pixel-density case. I'm not convinced that it's a good idea to solve these two axes in the same syntax or solution. It seems to me that srcset= is bad for the art-direction case and picture is bad for the pixel-density case. (I think the concept of dpi isn't appropriate for either case, FWIW. I think the number of horizontal and vertical bitmap samples doubled relative to the traditional src image works much better conceptually for Web authoring than making people do dpi math with an abstract baseline of 96 dpi. Anecdotal observation of trying to get family members to do dpi math for print publications suggests that it's hard to get educated people to do dpi math right even when an inch is a real inch and not an abstraction.) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
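A hypothetical sketch of what density-based selection amounts to in the simple case. The candidate shape and function name are invented for illustration, and this is not the spec's actual selection algorithm:

```javascript
// Hypothetical sketch of density-based candidate selection: choose the
// lowest-density candidate that still meets the device pixel ratio,
// falling back to the densest one available. Invented for illustration;
// not the algorithm any spec or browser actually uses.
function pickByDensity(candidates, devicePixelRatio) {
  const sorted = [...candidates].sort((a, b) => a.density - b.density);
  return sorted.find(c => c.density >= devicePixelRatio) ||
         sorted[sorted.length - 1];
}
```

With candidates at 1x and 2x, a device pixel ratio of 1.5 would select the 2x image under this sketch. Note how naturally the selection falls out of the factors-relative-to-src model, with no 96 dpi baseline arithmetic anywhere.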
Re: [whatwg] IBM864 mapping of Encoding Standard
On Tue, Apr 24, 2012 at 6:31 AM, Makoto Kato m_k...@ga2.so-net.ne.jp wrote: (2012/04/20 17:09), Anne van Kesteren wrote: Does that mean you want to remove the encoding from Gecko? That would work for me. It is currently not supported by Opera either. Alternatively mapping 0xA7 to U+20AC works for me too, but I don't want it to tinker with the ASCII range. Except to OS/2 and AIX, I think that this encoding is unnecessary since most browsers aren't supported. Does the OS/2 port need it for interfacing with the system APIs? If the OS/2 port needs it for interfacing with the system APIs, can we stop exposing the encoding to the Web and can we stop building the IBM864 encoder/decoder on non-OS/2 platforms? I think it's a bad idea to vary the supported set of Web-exposed encodings by operating system. IIRC, some old Mac encodings that are still relevant for dealing with legacy fonts were hidden from Web content and UTF-7 was made mail-only. If OS/2 doesn't need it for system APIs, can we just remove the IBM864 support altogether? Is the AIX port still relevant? I thought 3.6 was the last version ported to AIX. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Encoding Sniffing
On Sat, Apr 21, 2012 at 1:21 PM, Anne van Kesteren ann...@opera.com wrote: This morning I looked into what it would take to define Encoding Sniffing. http://wiki.whatwg.org/wiki/Encoding#Sniffing has links as to what I looked at (minus Opera internal). As far as I can tell Gecko has the most comprehensive approach and should not be too hard to define (though writing it all out correctly and clearly will be some work).

The Gecko notes aren't quite right:
* The detector chosen from the UI is used for HTML and plain text when loading those in a browsing context from HTTP GET or from a non-http URL. (Not used for POST responses. Not used for XHR.)
* The default for the UI setting depends on the locale. Most locales default to no detector at all. Only zh-TW defaults to the Universal detector. (I'm not sure why, but I think this is a bug of *some* kind. Perhaps the localizer wanted to detect both Traditional and Simplified Chinese encodings and we don't have a detector configuration for Traditional+Simplified.) Other locales that default to having a detector enabled default to a locale-specific detector (e.g. Japanese or Ukrainian).
* The Universal detector is used regardless of UI setting or locale when using the FileReader to read a local file as text. (I'm personally very unhappy about this sort of use of heuristics in a new feature.)
* The Universal detector isn't really universal. In particular, it misdetects Central European encodings like ISO-8859-2. (I'm personally unhappy that we expose the Universal detector in the UI and thereby bait people to enable it.)
* Regardless of detector setting, when loading HTML or plain text in a browsing context, Basic Latin encoded as UTF-16BE or UTF-16LE is detected. This detection is not performed by FileReader.

I have some questions though: 1) Is this something we want to define and eventually implement the same way? I think yes in principle. In practice, it might be hard to get this done. E.g.
in the case of Gecko, we'd need someone who has no higher-priority work than rewriting chardet in compliance with the hypothetical spec. I don't want to enable heuristic detection for all HTML page loads. Yet, it seems that we can't get rid of it for e.g. the Japanese context. (It's so sad that the situation is the worst in places that have multiple encodings and, therefore, logically should be more aware of the need to declare which one is in use. Sigh.) I think it is bad that the Web-exposed behavior of the browser depends on the UI locale of the browser. I think it would be a worthwhile research project to find out whether it would be feasible to trigger language-specific heuristic detection on a per-TLD basis instead of on a per-UI-locale basis (e.g. enabling the Japanese detector for all pages loaded from .jp and the Russian detector for all pages loaded from .ru regardless of UI locale, and requiring .com Japanese or Russian sites to get their charset act together, or maybe having a short list of popular special cases that don't use a country TLD but don't declare the encoding, either). 2) Does this need to apply outside HTML? For JavaScript it is forbidden per the HTML standard at the moment. CSS and XML do not allow it either. Is it used for decoding text/plain at the moment? Detection is used for text/plain in Gecko when it would be used for text/html. I think detection shouldn't be used for anything except plain text and HTML being loaded into a browsing context, considering that we've managed this far without it (well, except for FileReader). (Note that when not declaring the encoding on their own, JavaScript and CSS inherit the encoding of the HTML document that references them.) 3) Is there a limit to how many bytes we should look at? In Gecko, the Basic Latin encoded as UTF-16BE or UTF-16LE check is run on the first 1024 bytes. For the other heuristic detections, there is no limit, and changing the encoding potentially causes renavigation to the page.
During a previous Firefox development cycle, there was a limit of 1024 bytes (no renavigation!), but it was removed in order to support the Japanese Planet Debian (site fixed since then) and other unspecified but rumored Japanese sites.

On Sun, Apr 22, 2012 at 2:11 AM, Silvia Pfeiffer silviapfeiff...@gmail.com wrote: We've had some discussion on the usefulness of this in WebVTT - mostly just in relation with HTML, though I am sure that stand-alone video players that decode WebVTT would find it useful, too. WebVTT is a new format with no legacy. Instead of letting it become infected with heuristic detection, we should go the other direction and hardwire it as UTF-8 like we did with app cache manifests and JSON-in-XHR. No one should be creating new content in encodings other than UTF-8. Those who can't be bothered to use The Encoding deserve REPLACEMENT CHARACTERs. Heuristic detection is for unlabeled legacy content. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
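The UTF-16 Basic Latin check described earlier in this post can be sketched roughly like this. This is an illustrative approximation of the idea (one byte of each pair zero, the other printable ASCII, within the first 1024 bytes), not Gecko's actual code:

```javascript
// Rough sketch of detecting Basic Latin encoded as UTF-16BE/LE within the
// first 1024 bytes: in each byte pair, one byte is zero and the other is
// printable ASCII (or tab/CR/LF). Approximates the idea described above;
// NOT Gecko's actual algorithm.
function detectUtf16BasicLatin(bytes, limit = 1024) {
  const sample = Array.from(bytes.slice(0, limit));
  if (sample.length < 2) return null;
  const evens = sample.filter((_, i) => i % 2 === 0);
  const odds = sample.filter((_, i) => i % 2 === 1);
  const asciiIsh = bs =>
    bs.every(b => (b >= 0x20 && b < 0x7f) || b === 0x09 || b === 0x0a || b === 0x0d);
  if (evens.every(b => b === 0) && asciiIsh(odds)) return "UTF-16BE";
  if (odds.every(b => b === 0) && asciiIsh(evens)) return "UTF-16LE";
  return null;
}
```

Because the check only looks at a bounded prefix, it avoids the renavigation problem that the unbounded detectors have: the decision is final before any content past the first kilobyte is processed.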
Re: [whatwg] Readiness of script-created documents
On Mon, Jun 20, 2011 at 3:10 PM, Jonas Sicking jo...@sicking.cc wrote: On Mon, Jun 20, 2011 at 4:26 AM, Henri Sivonen hsivo...@iki.fi wrote: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1039 It says "complete" in Firefox, "loading" in Chrome and Opera and "uninitialized" in IE. The spec requires "complete". readyState is originally an IE API. Why doesn't the spec require "uninitialized"? (The implementation in Gecko is so recent that it's quite possible that Gecko followed the spec and the spec just made stuff up, as opposed to the spec following Gecko.) "complete" seems like the most useful and consistent value, which would seem like a good reason to require that. Why don't aborted documents reach "complete" in Gecko? It seems weird to have aborted documents stay in the "loading" state when they are not, in fact, loading. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
[whatwg] Proposal for readyState behavior
IE can omit interactive:
http://hsivonen.iki.fi/test/moz/readystate/document-open.html
load can be synchronous in Chrome and IE:
http://hsivonen.iki.fi/test/moz/readystate/document-open.html
Firefox forgets DOMContentLoaded for XSLT:
http://hsivonen.iki.fi/test/moz/readystate/xslt.html
Firefox skips interactive but not DOMContentLoaded when aborting:
http://hsivonen.iki.fi/test/moz/readystate/window-stop.html
Documents aborted by window.location reach complete in Opera:
http://hsivonen.iki.fi/test/moz/readystate/window-location.html
Defer scripts are executed at the wrong time in Firefox:
http://hsivonen.iki.fi/test/moz/readystate/defer-script.html
-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] DOMContentLoaded, load and current document readiness
On Tue, Jan 10, 2012 at 2:10 AM, Ian Hickson i...@hixie.ch wrote: On Tue, 31 May 2011, Henri Sivonen wrote: Recently, there was discussion about changing media element state in the same task that fires the event about the state change so that scripts that probe the state can make non-racy conclusions about whether a certain event has fired already. Currently, there seems to be no correct non-racy way to write code that probes a document to determine if DOMContentLoaded or load has fired and runs code immediately if the event of interest has fired or adds a listener to wait for the event if the event hasn't fired. Are there compat or other reasons why we couldn't or shouldn't make it so that the same task that fires DOMContentLoaded changes the readyState to interactive and the same task that fires load changes readyState to complete? Fixed for 'load'. I don't see a good way to fix this for 'DOMContentLoaded', unfortunately. It turns out that Firefox has accidentally been running defer scripts after DOMContentLoaded. I haven't seen bug reports about this. Embracing this bug might offer a way to always keep the readystatechange to interactive in the same task that fires DOMContentLoaded. See http://hsivonen.iki.fi/test/moz/readystate/defer-script.html -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] readyState transitions when aborting a document
On Thu, Apr 19, 2012 at 2:43 PM, Henri Sivonen hsivo...@iki.fi wrote: * Is there a way to abort a document load in IE without causing immediate navigation away from the document? IE doesn't support window.stop(). Yes: document.execCommand("Stop"). * Does Web compatibility ever require a transition from loading to complete without an intermediate interactive state? (Both Chrome and Firefox as shipped make such transitions, but those might be bugs.) I have no evidence to say anything for sure here, but I doubt Web compat requires transitions from loading to complete. What actually happens varies a lot. * Should the aborted documents stay in the loading state forever like the spec says or should they reach the complete state eventually when the event loop spins? Gecko and WebKit disagree. * Should window.stop() really not abort the parser like the spec seems to suggest? Looks like Opera is alone with the non-aborting behavior. The spec is wrong. * Should reaching complete always involve firing load? Not in WebKit. * Should reaching interactive always involve firing DOMContentLoaded? Probably. * Does anyone have test cases for this stuff? Demos: http://hsivonen.iki.fi/test/moz/readystate/ -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
[whatwg] readyState transitions when aborting a document
I've been trying to make document.readyState transitions less broken in Gecko. (The transitions are very sad as of Firefox 13 in pretty much all but the most trivial cases.) I'm having a particularly hard time figuring out what the right thing to do is when it comes to aborting document loads. Unfortunately, I don't trust the spec to describe the Web-compatible truth. * Is there a way to abort a document load in IE without causing immediate navigation away from the document? IE doesn't support window.stop(). * Does Web compatibility ever require a transition from loading to complete without an intermediate interactive state? (Both Chrome and Firefox as shipped make such transitions, but those might be bugs.) * Should the aborted documents stay in the loading state forever like the spec says or should they reach the complete state eventually when the event loop spins? * Should window.stop() really not abort the parser like the spec seems to suggest? * Should reaching complete always involve firing load? * Should reaching interactive always involve firing DOMContentLoaded? * Does anyone have test cases for this stuff? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Default encoding to UTF-8?
On Tue, Apr 3, 2012 at 10:08 PM, Anne van Kesteren ann...@opera.com wrote: I didn't mean a prescan. I meant proceeding with the real parse and switching decoders in midstream. This would have the complication of also having to change the encoding the document object reports to JavaScript in some cases. On IRC (#whatwg) zcorpan pointed out this would break URLs where entities are used to encode non-ASCII code points in the query component. Good point. So it's not worthwhile to add magic here. It's better that authors declare that they are using UTF-8. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Readiness of script-created documents
On Mon, Apr 2, 2012 at 11:29 AM, Jonas Sicking jo...@sicking.cc wrote: Everyone returning the same thing isn't the only goal. First of all what's the purpose of all browsers doing the same thing if that same thing isn't useful? No one is worse off and stuff works even if an author somewhere relies on a crazy edge case behavior. Second, you are assuming that people are actually aware of this edge case and account for it. Here it seems just as likely to me that generic code paths would result in buggy pages given IE's behavior, and correct behavior given the spec's behavior. Third, if no-one is hitting this edge case, which also seems quite plausible here, then its having a while longer without interoperability won't really matter no matter what we do, and doing the most useful thing seems like the best long-term goal. On the other hand, for cases no one is hitting, it's probably not worthwhile to spend time trying to get the behavior to change from what was initially introduced. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Default encoding to UTF-8?
On Wed, Jan 4, 2012 at 12:34 AM, Leif Halvard Silli xn--mlform-...@xn--mlform-iua.no wrote: I mean the performance impact of reloading the page or, alternatively, the loss of incremental rendering.) A solution that would border on reasonable would be decoding as US-ASCII up to the first non-ASCII byte Thus possibly prescan of more than 1024 bytes? I didn't mean a prescan. I meant proceeding with the real parse and switching decoders in midstream. This would have the complication of also having to change the encoding the document object reports to JavaScript in some cases. and then deciding between UTF-8 and the locale-specific legacy encoding by examining the first non-ASCII byte and up to 3 bytes after it to see if they form a valid UTF-8 byte sequence. Except for the specifics, that sounds like more or less the idea I tried to state. Maybe it could be made into a bug in Mozilla? It's not clear that this is actually worth implementing or spending time on at this stage. However, there is one thing that should be added: The parser should default to UTF-8 even if it does not detect any UTF-8-ish non-ASCII. That would break form submissions. But trying to gain more statistical confidence about UTF-8ness than that would be bad for performance (either due to stalling stream processing or due to reloading). So here you say that it is better to start to present early, and eventually reload [I think] if during the presentation the encoding choice shows itself to be wrong, than it would be to investigate too much and be absolutely certain before starting to present the page. I didn't intend to suggest reloading. Adding autodetection wouldn't actually force authors to use UTF-8, so the problem Faruk stated at the start of the thread (authors not using UTF-8 throughout systems that process user input) wouldn't be solved.
If we take that logic to its end, then it would not make sense for the validator to display an error when a page contains a form without being UTF-8 encoded, either. Because, after all, the backend/whatever could be non-UTF-8 based. The only way to solve that problem on those systems, would be to send form content as character entities. (However, then too the form based page should still be UTF-8 in the first place, in order to be able to take any content.) Presumably, when an author reacts to an error message, (s)he not only fixes the page but also the back end. When a browser makes encoding guesses, it obviously cannot fix the back end. [ Original letter continued: ] Apart from UTF-16, Chrome seems quite aggressive w.r.t. encoding detection. So it might still be a competitive advantage. It would be interesting to know what exactly Chrome does. Maybe someone who knows the code could enlighten us? +1 (But their approach looks similar to the 'border on sane' approach you presented. Except that they seek to detect also non-UTF-8.) I'm slightly disappointed but not surprised that this thread hasn't gained a message explaining what Chrome does exactly. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
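[Editorial note: the "border on reasonable" heuristic from the exchange above, examining the first non-ASCII byte and up to 3 bytes after it, can be sketched roughly as follows. The function name and the simplified lead-byte ranges are mine; a real implementation would also check the tighter continuation ranges required after 0xE0, 0xED, 0xF0 and 0xF4.]

```javascript
// Rough sketch: does the byte at index i start a valid UTF-8 sequence?
function looksLikeUtf8At(bytes, i) {
  const b = bytes[i];
  let len;
  if (b >= 0xC2 && b <= 0xDF) len = 2;      // two-byte lead
  else if (b >= 0xE0 && b <= 0xEF) len = 3; // three-byte lead
  else if (b >= 0xF0 && b <= 0xF4) len = 4; // four-byte lead
  else return false;                        // not a plausible UTF-8 lead byte
  for (let k = 1; k < len; k++) {
    const c = bytes[i + k];
    if (c === undefined || (c & 0xC0) !== 0x80) return false; // missing continuation byte
  }
  return true;
}
```

A lone 0xE9 (windows-1252 "é" followed by ASCII) fails the check, while the UTF-8 pair 0xC3 0xA9 passes; that asymmetry is the whole signal the heuristic relies on.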
Re: [whatwg] Readiness of script-created documents
On Fri, Mar 30, 2012 at 8:26 PM, Jonas Sicking jo...@sicking.cc wrote: On Friday, March 30, 2012, Henri Sivonen wrote: On Fri, Jan 13, 2012 at 2:26 AM, Ian Hickson i...@hixie.ch wrote: Jonas is correct. Since there was no interop here I figured we might as well go with what made sense. I'm somewhat unhappy about fixing IE-introduced APIs to make sense like this. The implementation in Gecko isn't particularly good. When trying to make it better, I discovered that doing what IE did would have led to simpler code. That's not a particularly strong argument. The question is what's better for authors. Gratuitously changing features introduced by IE does not help authors one day have to support the old IE behavior for years. Either authors don't use the API in the uninteroperable situation or they will have to deal with different browsers returning different things. The easiest path to get to the point where all browsers in use return the same thing would have been for others to do what IE did. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Readiness of script-created documents
On Mon, Apr 2, 2012 at 10:12 AM, Henri Sivonen hsivo...@iki.fi wrote: Gratuitously changing features introduced by IE does not help authors one day have to ...when they have to... -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Character-encoding-related threads
On Thu, Dec 1, 2011 at 1:28 AM, Faruk Ates faruka...@me.com wrote: We like to think that “every web developer is surely building things in UTF-8 nowadays” but this is far from true. I still frequently break websites and webapps simply by entering my name (Faruk Ateş). Firefox 12 whines to the error console when submitting a form using an encoding that cannot represent all of Unicode. Hopefully, after Firefox 12 has been released, this will help Web authors who actually test their sites with the error console open to locate forms that can corrupt user input. On Wed, 7 Dec 2011, Henri Sivonen wrote: I believe I was implementing exactly what the spec said at the time I implemented that behavior of Validator.nu. I'm particularly convinced that I was following the spec, because I think it's not the optimal behavior. I think pages that don't declare their encoding should always be non-conforming even if they only contain ASCII bytes, because that way templates created by English-oriented (or lorem ipsum -oriented) authors would be caught as non-conforming before non-ASCII text gets filled into them later. Hixie disagreed. I think it puts an undue burden on authors who are just writing small files with only ASCII. 7-bit clean ASCII is still the second-most used encoding on the Web (after UTF-8), so I don't think it's a small thing. http://googleblog.blogspot.com/2012/02/unicode-over-60-percent-of-web.html I still think that allowing ASCII-only pages to omit the encoding declaration is the wrong call. I agree with Simon's point about the doctype and reliance on quirks. Firefox Nightly (14 if all goes well) whines to the error console when the encoding hasn't been declared and about a bunch of other encoding declaration-related bad conditions. It also warns about ASCII-only pages, because I didn't want to burn cycles detecting whether a page is ASCII-only and because I think it's the wrong call not to whine about ASCII-only templates that might get non-ASCII content later.
However, I suppressed the message about the lack of an encoding declaration for different-origin frames, because it is so common for ad iframes that contain only images or flash objects to lack an encoding declaration that not suppressing the message would have made the error console too noisy. It's cheaper to detect whether the message is about to be emitted for a different-origin frame than to detect whether it's about to be emitted for an ASCII-only page. Besides, authors generally are powerless to fix the technical flaws of different-origin embeds. On Mon, 19 Dec 2011, Henri Sivonen wrote: Hmm. The HTML spec isn't too clear about when alias resolution happens, so I (incorrectly, I now think) mapped only UTF-16, UTF-16BE and UTF-16LE (ASCII-case-insensitive) to UTF-8 in meta without considering aliases at that point. Hixie, was alias resolution supposed to happen first? In Firefox, alias resolution happens after, so meta charset=iso-10646-ucs-2 is ignored per the non-ASCII superset rule. Assuming you mean for cases where the spec says things like If encoding is a UTF-16 encoding, then change the value of encoding to UTF-8, then any alias of UTF-16, UTF-16LE, and UTF-16BE (there aren't any registered currently, but Unicode might need to be one) would be considered a match. ... Currently, iso-10646-ucs-2 is neither an alias for UTF-16 nor an encoding that is overridden in any way. It's its own encoding. That's not reality in Gecko. I hope the above is clear. Let me know if you think the spec is vague on the matter. Evidently, it's too vague, because I read the spec and implemented something different from what you meant. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
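[Editorial note: the ordering Hixie describes, alias resolution first, then the UTF-16-in-meta override, would look something like this sketch. The label table is a tiny illustrative subset, not the full registry; it includes "unicode", which the message above says might need to become an alias (the Encoding Standard later did map it to UTF-16LE).]

```javascript
// Resolve the label to an encoding first, *then* apply the rule that a
// UTF-16 label in <meta> is treated as UTF-8. Illustrative subset only.
const labels = {
  "utf-16": "UTF-16LE",
  "utf-16le": "UTF-16LE",
  "utf-16be": "UTF-16BE",
  "unicode": "UTF-16LE",
  "utf-8": "UTF-8",
  "windows-1252": "windows-1252"
};
function encodingFromMeta(label) {
  const enc = labels[label.trim().toLowerCase()]; // alias resolution happens here
  if (enc === undefined) return null;             // unknown label: ignore the meta
  if (enc === "UTF-16LE" || enc === "UTF-16BE") return "UTF-8"; // UTF-16 override
  return enc;
}
```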
Re: [whatwg] window.location aborting the parser and subsequent document.writes
On Tue, Feb 14, 2012 at 2:43 AM, Ian Hickson i...@hixie.ch wrote: On Thu, 5 Jan 2012, Henri Sivonen wrote: Consider https://bug98654.bugzilla.mozilla.org/attachment.cgi?id=77369 with the popup blocker disabled. Chrome, Opera and IE open a new window/tab and load the Mozilla front page into it. Firefox used to but doesn't anymore. As far as I can tell, Firefox behaves according to the spec: Setting window.location aborts the parser synchronously and the first subsequent document.write() then implies a call to document.open(), which aborts the navigation started by window.location. Per spec, aborting the parser doesn't cause document.write() to imply a call to document.open(). Specifically, it leaves the insertion point in a state that is defined, but with the parser no longer active, and discarding any future data added to it. That an aborted parser keeps having a defined insertion point was non-obvious. Thanks. Fixed in Gecko on trunk. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Client side value for language preference
On Thu, Mar 29, 2012 at 10:02 PM, Matthew Nuzum n...@bearfruit.org wrote: Some browsers have gotten smarter and now send the first value from the user's language preference, which is definitely an improvement. I suspect this was done in order to preserve backwards compatibility, so much of the useful information is left out. ... navigator.language.preference = [{lang:'en-gb', weight: 0.7},{lang: 'en-us', weight: 0.7},{lang:'en', weight: 0.3}]; Is there a reason to believe that this client-side solution would be used significantly considering that the HTTP header has not been used that much? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] RWD Heaven: if browsers reported device capabilities in a request header (Boris Zbarsky)
On Mon, Feb 6, 2012 at 9:24 PM, Irakli Nadareishvili ira...@gmail.com wrote: if you don't mind me saying it, I am afraid you may be missing the point of this request. In Responsive Web Design, device capabilities are used in a high-level fashion to determine a class of the device: smartphone, tablet, desktop. Firefox (at least from version 12 up), Opera Mobile and Safari already expose this information. Firefox for tablets includes the substring Tablet in the UA string and Firefox for phones includes the substring Mobile in the UA string. If neither Tablet nor Mobile is present in the UA string, the browser is running on a desktop. In the case of Opera (excluding Mini), the indicators are Tablet and Mobi (and desktop otherwise). In the case of Safari, if the substring iPad is present, it's a tablet. Otherwise, if the substring Mobile is present, it's a phone form factor. Otherwise, it's desktop (or a non-Safari browser spoofing as Safari). IE differentiates between desktop and the phone form factor as well: the mobile form factor includes the substring IEMobile. Unfortunately, the Android stock browser on Android tablets does not include a clear tablet indicator. So you get something like
/**
 * Returns desktop, tablet or phone.
 * Some 7" tablets get reported as phones. Android netbooks likely get reported as tablets.
 * Touch input not guaranteed on phones (Opera Mobile on keypad Symbian, for example) and tablets (non-Fire Kindle)!
 */
function formFactor() {
  var ua = navigator.userAgent;
  if (ua.indexOf("Tablet") > -1) {
    // Opera Mobile on tablets, Firefox on tablets, Playbook stock browser
    return "tablet";
  }
  if (ua.indexOf("iPad") > -1) {
    // Safari on tablets
    return "tablet";
  }
  if (ua.indexOf("Mobi") > -1) {
    // Opera Mobile on phones, Firefox on phones, Safari on phones (and same-sized iPod Touch),
    // IE on phones, Android stock on phones, Chrome on phones, N9 stock, Dolfin on Bada
    return "phone";
  }
  if (ua.indexOf("Opera Mini") > -1) {
    // Opera Mini (could be on a tablet, though); let's hope Opera puts Tablet in the Mini UA on tablets
    return "phone";
  }
  if (ua.indexOf("Symbian") > -1) {
    // S60 browser (predates Mobile Safari and does not say Mobile)
    return "phone";
  }
  if (ua.indexOf("Android") > -1 && ua.indexOf("Safari") > -1) {
    // Android stock on tablet or Chrome on Android tablet
    return "tablet";
  }
  if (ua.indexOf("Kindle") > -1) {
    // Various Kindles; not all touch!
    return "tablet";
  }
  if (ua.indexOf("Silk-Accelerated") > -1) {
    // Kindle Fire in Silk mode
    return "tablet";
  }
  return "desktop";
}
Seems like the coarse form factor data is pretty much already in the UA strings. Things could be improved by Opera Mini, Safari, Amazon's browsers and Google's browsers saying Tablet when on tablet. Symbian is dead, so no hope for its stock browser starting to say Mobi. The inferences you may want to make from the form factor data may well be wrong. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Client side value for language preference
On Fri, Mar 30, 2012 at 5:08 PM, Matthew Nuzum n...@bearfruit.org wrote: For example, maybe a site can't afford translation but a small library could be included that formats dates and numbers based on a user's language preference. No more wondering if 2/3/12 is in March or in February. The reader doesn't know that the site tries to be smart about dates (but not smart enough to just use ISO dates), so scrambling the order of date components not to match the convention of the language of the page is probably worse than using the convention that's congruent with the language of the page. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Readiness of script-created documents
On Fri, Jan 13, 2012 at 2:26 AM, Ian Hickson i...@hixie.ch wrote: Jonas is correct. Since there was no interop here I figured we might as well go with what made sense. I'm somewhat unhappy about fixing IE-introduced APIs to make sense like this. The implementation in Gecko isn't particularly good. When trying to make it better, I discovered that doing what IE did would have led to simpler code. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Wed, Mar 14, 2012 at 12:49 AM, Jonas Sicking jo...@sicking.cc wrote: Something that has come up a couple of times with content authors lately has been the desire to convert an ArrayBuffer (or part thereof) into a decoded string. Similarly being able to encode a string into an ArrayBuffer (or part thereof). Something as simple as DOMString decode(ArrayBufferView source, DOMString encoding); ArrayBufferView encode(DOMString source, DOMString encoding, [optional] ArrayBufferView destination); It saddens me that this allows non-UTF-8 encodings. However, since use cases for non-UTF-8 encodings were mentioned in this thread, I suggest that the set of supported encodings be an enumerated set of encodings stated in a spec and browsers MUST NOT support other encodings. The set should probably be the set offered in the encoding popup at http://validator.nu/?charset or a subset thereof (containing at least UTF-8 of course). (That set was derived by researching the intersection of the encodings supported by browsers, Python and the JDK.) would go a very long way. Are you sure that it's not necessary to support streaming conversion? The suggested API design assumes you always have the entire data sequence in a single DOMString or ArrayBufferView. The question is where to stick these functions. Internationalization doesn't have an obvious object we can hang functions off of (unlike, for example crypto), and the above names are much too generic to turn into global functions. If we deem streaming conversion unnecessary, I'd put the methods on DOMString and ArrayBufferView. It would be terribly sad to let the schedules of various working groups affect the API design. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
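[Editorial note: the shape Jonas proposes above is close to what later shipped as the Encoding API (TextEncoder/TextDecoder), with streaming handled by a decoder option and encoding support restricted much as suggested here; TextEncoder ended up UTF-8-only. A minimal illustration:]

```javascript
// Encode a DOMString to bytes and decode it back (UTF-8).
const bytes = new TextEncoder().encode("Ateş");       // string -> Uint8Array
const text  = new TextDecoder("utf-8").decode(bytes); // ArrayBufferView -> string
console.log(text); // "Ateş"
```

Streaming decode is supported by passing `{stream: true}` to `TextDecoder.prototype.decode` for all chunks but the last, which answers the streaming question raised above.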
Re: [whatwg] RWD Heaven: if browsers reported device capabilities in a request header
On Tue, Feb 7, 2012 at 4:13 PM, Matthew Wilcox m...@matthewwilcox.com wrote: Ahhh, ok. I was not aware that SPDY is intended to suffer from the flaws inflicted by the dated mechanics of HTTP. Is it really different semantics though? I don't see how it's harmful to enable resource adaption over SPDY just because browser vendors have decided that HTTP is too expensive to do it? ... I'm sensing the SPDY/HTTP identical-semantics standpoint may be a philosophical thing rather than technical? Is it a philosophical or technical thing to suggest that it would be a bad idea for a server to send different style rules depending on whether the HTTP client requests /style.css with Accept-Encoding: gzip or not? SPDY is an autonegotiated upgrade, by design invisible to the next layer, to how HTTP requests and responses are compressed and mapped to TCP streams. Of course it would be *possible* to tie other side effects to this negotiation, but it doesn't mean it's sound design or a good idea. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] RWD Heaven: if browsers reported device capabilities in a request header
On Tue, Feb 7, 2012 at 11:17 PM, divya manian divya.man...@gmail.com wrote: This is the info I would love to see any time for my app to make the kind of decision it should: * connection speed: so I know how fast my resources can load, how quickly. * bandwidth caps: so I know I shouldn't be sending HD images. How do you know that I don't want to use my bandwidth quota to see your site fully if I chose to navigate to it? * battery time: network requests are a drain on battery life, if I know before hand, I can make sure the user gets information in time. Why should you drain my battery faster if the battery is more full? (For stuff like throttling down animation and XHR polling, the UA should probably prevent background tabs from draining battery even when the battery is near full regardless of whether the site/app is benevolently cooperative.) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] RWD Heaven: if browsers reported device capabilities in a request header
On Mon, Feb 6, 2012 at 5:52 PM, Matthew Wilcox m...@matthewwilcox.com wrote: Also, as indicated, with SPDY this is much much less of a problem than for HTTP. SPDY transfers the HTTP semantics more efficiently when supported. You aren't supposed to communicate different semantics depending on whether SPDY is enabled. That would be a layering violation. That is, SPDY is supposed to work as a drop-in replacement for the old way of putting HTTP semantics over IP. You aren't supposed to send different headers depending on whether SPDY is there or not. And the old HTTP is going to be around for a *long* time, so even if a bunch of important sites start supporting SPDY, if browsers send the same headers in all cases to avoid the layering violation, the long tail of plain old HTTP sites would be harmed by request size bloat. So I think SPDY will fix it is not a persuasive argument for allowing HTTP request bloat to cater to the bandwagon of the day. (Sorry if that seems offensive. You've worked on responsive images, so they evidently seem important to you, but in the long-term big picture, it's nowhere near proven that they aren't a fad of interest to a relatively small number of Web developers.) If there is evidence that responsive images aren't just a fad bandwagon and there's a long-term need to support them in the platform, I think supporting something like <picture> <source src="something.jpg" media="..."> <source src="other.jpg" media="..."> <img src="fallback.jpg"> </picture> would make more sense, since the added bytes to transfer this markup would affect sites that use this stuff instead of affecting each request to all sites that don't use this stuff. This would be more intermediary-friendly, too, by not involving the Vary header. The points Boris made about the device pixel size of the image changing after the page load still apply, though.
But still, engineering for sites varying the number of pixels they send for photos seems a bit premature when sites haven't yet adopted SVG for illustrations, diagrams, logos, icons, etc. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] add html-attribute for responsive images
On Tue, Feb 7, 2012 at 1:15 AM, Bjartur Thorlacius svartma...@gmail.com wrote: Why not use a media attribute of object? There's probably already a better answer to Why not use object for foo? in the archives of this list, but the short version is that it's nicer for implementations to have elements that support particular functionality when the node is created instead of having elements that change their nature substantially depending on attributes, network fetches, presence of plug-ins, etc., etc. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] should we add beforeload/afterload events to the web platform?
On Tue, Jan 17, 2012 at 6:29 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 1/17/12 7:49 AM, Henri Sivonen wrote: On Sun, Jan 15, 2012 at 11:23 PM, Boris Zbarskybzbar...@mit.edu wrote: Preventing _all_ loads for a document based on some declarative thing near the start of the document, on the other hand, should not be too bad. A page-wide disable optimizations flag could easily be cargo-culted into something harmful. Consider if the narrative becomes that setting such a flag is good for mobile or something. Who said anything about disable optimizations? I suggested a flag to prevent all subresource loads, not just speculative preloads. Basically a treat this as a data document flag. Oh I see. Sorry. If that plus a beforeprocess event addresses the majority of the web-facing use cases, we should consider adding that. So what are the Web-facing use cases? As in: What are people trying to accomplish with client-side transformations? Well, what mobify is apparently trying to accomplish is take an existing (not-mobile-optimized, or in other words typically ad-and-float-and-table-laden) page and modify it to look reasonable on a small screen. That includes not loading some of the stylesheets and various changes to the DOM, as far as I can tell. FWIW, I'm completely unsympathetic to this use case and I think we shouldn't put engineering effort into supporting this scenario. As far as the user is concerned, it would be much better for the site to get its act together on the server side and not send an ad-laden table page to anyone. It sucks to burn resources on the client side to fix things up using scripts provided by the same server that sends the broken stuff in the first place. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Augmenting HTML parser to recognize new elements
On Wed, Jan 18, 2012 at 8:19 PM, Dimitri Glazkov dglaz...@chromium.org wrote: A typical example would be specifying an insertion point (that's content element) as child of a table: <table> <content> <tr> ... </tr> </content> </table> Both shadow and template elements have similar use cases. This doesn't comply with the Degrade Gracefully design principle. Is this feature so important that it's reasonable to change table parsing (one of the annoying parts of the parsing algorithm) in a way that'd make the modified algorithm yield significantly different results than existing browsers? Have designs that don't require changes to table parsing been explored? What would be the sane way to document such changes to the HTML parser behavior? A change to the HTML spec proper *if* we decide that changes are a good idea. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] title/meta elements outside of head
On Thu, Jan 19, 2012 at 8:30 AM, Michael Day mike...@yeslogic.com wrote: What is the reason why title/meta elements are not always moved to the head, regardless of where they appear? They didn't need to be for compatibility, so we went with less magic. Also, being able to use meta and link as descendants of body is useful for Microdata and RDFa Lite without having to mint new void elements. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] should we add beforeload/afterload events to the web platform?
On Sun, Jan 15, 2012 at 11:23 PM, Boris Zbarsky bzbar...@mit.edu wrote: Preventing _all_ loads for a document based on some declarative thing near the start of the document, on the other hand, should not be too bad. A page-wide disable optimizations flag could easily be cargo-culted into something harmful. Consider if the narrative becomes that setting such a flag is good for mobile or something. A per-element disable optimizations attribute would be slightly less dangerous, since authors couldn't just set it once and forget it. If that plus a beforeprocess event addresses the majority of the web-facing use cases, we should consider adding that. So what are the Web-facing use cases? As in: What are people trying to accomplish with client-side transformations? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] should we add beforeload/afterload events to the web platform?
On Tue, Jan 10, 2012 at 7:48 AM, Tantek Çelik tan...@cs.stanford.edu wrote: 1. Performance. Reducing bandwidth use / HTTP requests, e.g. AdBlock extension[2] Extension use cases don't require an API exposed to Web content, though. Furthermore, IE9 has a built-in content blocking rule engine and Firefox has had a de facto dominant rule engine for years even though it has been shipped separately (AdBlock Plus). Maybe instead of exposing arbitrary programmability for content blocking, other browsers should follow IE9 and offer a built-in rule engine for content blocking instead of letting extensions run arbitrary JS to inspect every load. 2. Clientside transformations, e.g. Mobify[3] There's already an easier cross-browser way to deactivate an HTML page and use its source as input to a program: document.write("<plaintext style='display:none;'>"); (This gives you source to work with instead of a DOM, but you can explicitly parse the source to a DOM.) Anyway, I'd rather see mobile adaptations be based on CSS instead of everyone shipping a bunch of JS to the client to munge the page in ways that foil all optimizations that browsers do for regular page loads. As might be expected, there is at least one use-case for a complementary 'afterload' event: 1. Downloadable fonts - people who want to use custom fonts for drawing in the canvas element need to know when a font has loaded. 'afterload' seems like a good way to know that, since it happens as a side effect of actually using it and fonts don't have an explicit load API like images do.[4] It seems like fonts should have an API for listening when they become available, yes. Should 'beforeload'/'afterload' be explicitly specified and added to the web platform? I'm worried about the interaction with speculative loading. Right now, Gecko is more aggressive than WebKit about speculative loading. I don't want to make Gecko less aggressive about speculative loading in order to fire beforeload exactly at the points where WebKit fires them.
I'm even worried about exposing resource load decisions to the main thread at all. Right now in Gecko, the HTML parser sees the data on a non-main thread. Networking runs on another non-main thread. Even though right now speculative loads travel from the parser thread to networking library via the main thread, it would be unfortunate to constrain the design so that future versions of Gecko couldn't communicate speculative loads directly from the parser thread to the networking thread without waiting on the main-thread event loop in between. (In this kind of design, a built-in content blocking rule engine would be nicer than letting extensions be involved in non-main threads.) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
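[Editorial note: the built-in rule engine favored in the message above, as opposed to running extension JS on every load, amounts to something like this sketch. The patterns and function name are made up for illustration; real engines like AdBlock Plus use a richer filter syntax.]

```javascript
// Declarative rule check: a fixed set of patterns is consulted per URL,
// so no arbitrary extension JS needs to run on the loading thread.
const blockRules = [/\/ads\//, /doubleclick/]; // illustrative patterns only
function shouldBlock(url) {
  return blockRules.some(function (re) { return re.test(url); });
}
```

Because the rules are pure data, they could be evaluated on the parser or networking thread without bouncing every speculative load through the main-thread event loop, which is exactly the design freedom the message above wants to preserve.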
[whatwg] window.location aborting the parser and subsequent document.writes
Consider https://bug98654.bugzilla.mozilla.org/attachment.cgi?id=77369 with the popup blocker disabled. Chrome, Opera and IE open a new window/tab and load the Mozilla front page into it. Firefox used to but doesn't anymore. As far as I can tell, Firefox behaves according to the spec: Setting window.location aborts the parser synchronously and the first subsequent document.write() then implies a call to document.open(), which aborts the navigation started by window.location. Is there a mechanism in the spec that makes this work as in Chrome, Opera and IE and I'm failing to read the spec right? If not, what's the mechanism that causes Chrome and IE to load the Mozilla front page into the newly-opened window/tab in this case? Note that in this modified case http://hsivonen.iki.fi/test/moz/write-after-location.html (requires a click so that there's no need to adjust the popup blocker) the console says before and after but not later in Chrome and IE. Opera says before and after and then the opener script ends with a security error, because write is already a different-origin call, i.e. setting window.location has immediately made the document in the new window different-origin. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Default encoding to UTF-8?
On Thu, Dec 22, 2011 at 12:36 PM, Leif Halvard Silli l...@russisk.no wrote: It's unclear to me if you are talking about HTTP-level charset=UNICODE or charset=UNICODE in a meta. Is content labeled with charset=UNICODE BOMless? Charset=UNICODE in meta, as generated by MS tools (e.g. Office or IE), seems to usually be BOM-full. But there are still enough occurrences of pages without a BOM. I have found UTF-8 pages with the charset=unicode label in meta. But the few pages I found contained either a BOM or HTTP-level charset=utf-8. I have too little research material when it comes to UTF-8 pages with charset=unicode inside. Making 'unicode' an alias of UTF-16 or UTF-16LE would be useless for pages that have a BOM, because the BOM is already inspected before meta and if the HTTP-level charset is unrecognized, the BOM wins. Making 'unicode' an alias of UTF-16 or UTF-16LE would be useful for UTF-8-encoded pages that say charset=unicode in meta if alias resolution happens before UTF-16 labels are mapped to UTF-8. Making 'unicode' an alias for UTF-16 or UTF-16LE would be useless for pages that are (BOMless) UTF-16LE and that have charset=unicode in meta, because the meta prescan doesn't see UTF-16-encoded metas. Furthermore, it doesn't make sense to make the meta prescan look for UTF-16-encoded metas, because it would make sense to honor the value only if it matched a flavor of UTF-16 appropriate for the pattern of zero bytes in the file, so it would be more reliable and straightforward to just analyze the pattern of zero bytes without bothering to look for UTF-16-encoded metas. When the detector says UTF-8 - that is step 7 of the sniffing algorithm, no? http://dev.w3.org/html5/spec/parsing.html#determining-the-character-encoding Yes. 2) Start the parse assuming UTF-8 and reload as Windows-1252 if the detector says non-UTF-8. ... I think you are mistaken there: If parsers perform UTF-8 detection, then unlabelled pages will be detected, and no reparsing will happen. Not even increase. 
You at least need to explain this negative spiral theory better before I buy it ... Step 7 will *not* lead to reparsing unless the default encoding is WINDOWS-1252. If the default encoding is UTF-8, then step 7, when it detects UTF-8, means that parsing can continue uninterrupted. That would be what I labeled as option #2 above. What we will instead see is that those using legacy encodings must be more clever in labelling their pages, or else they won't be detected. Many pages that use legacy encodings are legacy pages that aren't actively maintained. Unmaintained pages aren't going to become more clever about labeling. I am a bit baffled here: It sounds like you say that there will be bad consequences if browsers become more reliable ... Becoming more reliable can be bad if the reliability comes at the cost of performance, which would be the case if the kind of heuristic detector that e.g. Firefox has was turned on for all locales. (I don't mean the performance impact of running a detector state machine. I mean the performance impact of reloading the page or, alternatively, the loss of incremental rendering.) A solution that would border on reasonable would be decoding as US-ASCII up to the first non-ASCII byte and then deciding between UTF-8 and the locale-specific legacy encoding by examining the first non-ASCII byte and up to 3 bytes after it to see if they form a valid UTF-8 byte sequence. But trying to gain more statistical confidence about UTF-8ness than that would be bad for performance (either due to stalling stream processing or due to reloading). Apart from UTF-16, Chrome seems quite aggressive w.r.t. encoding detection. So it might still be a competitive advantage. It would be interesting to know what exactly Chrome does. Maybe someone who knows the code could enlighten us? * Let's say that I *kept* ISO-8859-1 as default encoding, but instead enabled the Universal detector. The frame then works. 
* But if I make the frame page very short, 10 * the letter ø as content, then the Universal detector fails - in a test on my own computer, it guesses the page to be Cyrillic rather than Norwegian. * What's the problem? The Universal detector is too greedy - it tries to fix more problems than I have. I only want it to guess on UTF-8. And if it doesn't detect UTF-8, then it should fall back to the locale default (including falling back to the encoding of the parent frame). Wouldn't that be an idea? No. The current configuration works for Norwegian users already. For users from different silos, the ad might break, but ad breakage is less bad than spreading heuristic detection to more locales. Here I must disagree: Less bad for whom? For users, performance-wise. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
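The "bordering on reasonable" heuristic described in this thread - decode as US-ASCII until the first non-ASCII byte, then check whether that byte plus up to 3 following bytes form a valid UTF-8 sequence - can be sketched in Python. This is an illustrative sketch of the idea as described in the email, not any browser's actual code; the function name and the windows-1252 fallback are assumptions for the example:

```python
def guess_utf8_or_legacy(data: bytes, legacy: str = "windows-1252") -> str:
    """Decide between UTF-8 and a locale-specific legacy encoding by
    inspecting only the first non-ASCII byte and up to 3 bytes after it."""
    for i, b in enumerate(data):
        if b < 0x80:
            continue  # still in the ASCII prefix
        # Expected number of continuation bytes for a valid UTF-8 lead byte.
        if 0xC2 <= b <= 0xDF:
            need = 1
        elif 0xE0 <= b <= 0xEF:
            need = 2
        elif 0xF0 <= b <= 0xF4:
            need = 3
        else:
            return legacy  # not a valid UTF-8 lead byte
        tail = data[i + 1 : i + 1 + need]
        if len(tail) == need and all(0x80 <= t <= 0xBF for t in tail):
            return "utf-8"
        return legacy
    return "utf-8"  # pure ASCII decodes identically either way
```

For example, `guess_utf8_or_legacy("naïve".encode("utf-8"))` yields "utf-8", while the same text encoded as windows-1252 yields the legacy guess, because 0xEF there is followed by ASCII bytes rather than continuation bytes.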
Re: [whatwg] Default encoding to UTF-8?
On Tue, Jan 3, 2012 at 10:33 AM, Henri Sivonen hsivo...@iki.fi wrote: A solution that would border on reasonable would be decoding as US-ASCII up to the first non-ASCII byte and then deciding between UTF-8 and the locale-specific legacy encoding by examining the first non-ASCII byte and up to 3 bytes after it to see if they form a valid UTF-8 byte sequence. But trying to gain more statistical confidence about UTF-8ness than that would be bad for performance (either due to stalling stream processing or due to reloading). And it's worth noting that the above paragraph states a solution to the problem of how to make it possible to use UTF-8 without declaring it. Adding autodetection wouldn't actually force authors to use UTF-8, so the problem Faruk stated at the start of the thread (authors not using UTF-8 throughout systems that process user input) wouldn't be solved. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] [encoding] utf-16
On Fri, Dec 30, 2011 at 12:54 PM, Anne van Kesteren ann...@opera.com wrote: And why should there be UTF-16 sniffing? The reason why Gecko detects BOMless Basic Latin-only UTF-16 regardless of the heuristic detector mode is https://bugzilla.mozilla.org/show_bug.cgi?id=631751 It's quite possible that Firefox could have gotten away with not having this behavior. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] [encoding] utf-16
On Tue, Dec 27, 2011 at 4:52 PM, Anne van Kesteren ann...@opera.com wrote: I ran some utf-16 tests using 007A as input data, optionally preceded by FFFE or FEFF, and with utf-16, utf-16le, and utf-16be declared in the Content-Type header I suggest testing with zero, one, two and three BOMs. I'd expect Gecko to have a bug that causes it to remove *two* BOMs but not more than that. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
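The multiple-BOM concern above can be checked with Python's utf-16 codec as a stand-in decoder (this is only an illustration of the expected behavior, not a test of Gecko): a conforming decoder should consume at most one BOM, so a second FF FE in front of "z" survives as U+FEFF in the decoded text rather than being stripped.

```python
# Little-endian "z" preceded by one and by two BOMs (FF FE).
one_bom = b"\xff\xfe\x7a\x00"
two_boms = b"\xff\xfe\xff\xfe\x7a\x00"

# Only the first FF FE is a byte-order mark; the second decodes as U+FEFF.
assert one_bom.decode("utf-16") == "z"
assert two_boms.decode("utf-16") == "\ufeffz"
```

A browser bug of the kind suspected above would instead strip both BOMs, producing "z" in the second case as well.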
Re: [whatwg] Another bug in the HTML parsing spec?
On Tue, Oct 18, 2011 at 3:47 AM, Ian Hickson i...@hixie.ch wrote: 2) I can't get all of the parser tests from html5lib to pass with this algorithm as it is currently written. In particular, there are 5 tests in testdata/tree-construction/tests9.dat of this basic form: <!DOCTYPE html><body><table><math><mi>foo</mi></math></table> As the spec is written, the mi tag is a text integration point, so the foo text token is handled like regular content, not like foreign content. Oh, my, yeah, that's all kinds of wrong. The text node should be handled as if it was in the in body mode, not as if it was in table. I'll have to study this closer. I think this broke when we moved away from using an insertion mode for foreign content. Henri, do you know how Gecko gets this right currently? The tree builder in Gecko always uses an accumulation buffer that gets flushed when the tree builder sees an end tag token or a start tag token. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
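The accumulation-buffer approach mentioned above can be illustrated with a toy sketch (the class and method names are invented for this example; Gecko's actual tree builder is C++ and far more involved): character tokens pile up in a buffer, and any start or end tag token flushes the buffered text as a single text node before the tag itself is processed.

```python
class TreeBuilderSketch:
    """Toy illustration of a character accumulation buffer."""

    def __init__(self):
        self.buffer = []    # pending character tokens
        self.flushed = []   # text nodes inserted into the tree (simplified)

    def characters(self, text):
        # Character tokens only accumulate; nothing is inserted yet.
        self.buffer.append(text)

    def tag(self, name):
        # Any tag token (start or end) flushes the pending text first.
        if self.buffer:
            self.flushed.append("".join(self.buffer))
            self.buffer.clear()
        # ...then the tag token itself would be processed here.
```

Because the flush happens on every tag token regardless of insertion mode, the "foo" text in the mi case above gets inserted as text before the end tags are handled, which is why the table-specific pending-character machinery never gets a chance to mishandle it.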
Re: [whatwg] document.write(\r): the spec doesn't say how to handle it.
On Wed, Dec 14, 2011 at 2:00 AM, Ian Hickson i...@hixie.ch wrote: I can remove the text one at a time, if you like. Would that be satisfactory? Or I guess I could change the spec to say that the parser should process the characters, rather than the tokenizer, since really it's the whole shebang that needs to be involved (stream preprocessor and everything). Any opinions on what the right text is here? I'd like the CRLF preprocessing to be defined as an eager stateful operation so that there's one bit of state: last was CR. Then, input is handled as follows: If the input character is CR, set last was CR to true and emit LF. If the input character is LF and last was CR is true, don't emit anything and set last was CR to false. If the input character is LF and last was CR is false, emit LF. Else set last was CR to false and emit the input character. Where emit feeds into the tokenizer. By eager, I mean that the operation described above doesn't buffer. I.e. the first case emits an LF upon seeing a CR without waiting for an LF also to appear in the input. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
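The eager stateful CRLF preprocessing described in the email above maps directly onto a small generator (a sketch of the proposed spec text, not any shipping implementation):

```python
def crlf_preprocess(chars):
    """Eager CR/LF normalization with one bit of state: last_was_cr.
    CR emits LF immediately; an LF right after a CR is swallowed;
    anything else resets the flag and passes through."""
    last_was_cr = False
    for ch in chars:
        if ch == "\r":
            last_was_cr = True
            yield "\n"          # eager: emit LF without waiting for an LF
        elif ch == "\n":
            if not last_was_cr:
                yield "\n"      # lone LF passes through
            last_was_cr = False  # LF after CR is swallowed
        else:
            last_was_cr = False
            yield ch
```

Here each yielded character would feed the tokenizer; note that CRLF, CR, and LF all come out as a single LF, and the CR case never buffers.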
Re: [whatwg] Default encoding to UTF-8?
View > Character Encoding > Auto-Detect > Off? Anyway: I agree that the encoding menus could be simpler/clearer. I think the most counter-intuitive thing is to use the word auto-detect about the heuristic detection - see what I said above about behaves automatic even when auto-detect is disabled. Opera's default setting is called Automatic selection. So it is all automatic ... Yeah, automatic means different things in different browsers. As for heuristic detection based on the bytes of the page, the only heuristic that can't be disabled is the heuristic for detecting BOMless UTF-16 that encodes Basic Latin only. (Some Indian bank was believed to have been giving that sort of files to their customers and it worked in pre-HTML5 browsers that silently discarded all zero bytes prior to tokenization.) The Cyrillic and CJK detection heuristics can be turned on and off by the user. I always wondered what the Universal detection meant. Is that simply the UTF-8 detection? Universal means that it runs all the detectors that Firefox supports in parallel, so the possible guessing space isn't constrained by locale. The other modes constrain the guessing space to a locale. For example, the Japanese detector won't give a Chinese or Cyrillic encoding as its guess. So let's say that you tell your Welsh localizer: Please switch to WINDOWS-1252 as the default, and then instead I'll allow you to enable this brand new UTF-8 detection. Would that make sense? Not really. I think we shouldn't spread heuristic detection to any locale that doesn't already have it. Within an origin, Firefox considers the parent frame and the previous document in the navigation history as sources of encoding guesses. That behavior is not user-configurable to my knowledge. W.r.t. iframes, the big Norwegian newspaper Dagbladet.no is declared ISO-8859-1 encoded and it includes at least one ad iframe that is undeclared ISO-8859-1 encoded. 
* If I change the default encoding of Firefox to UTF-8, then the main page works but that ad fails, encoding-wise. Yes, because the ad is different-origin, so it doesn't inherit the encoding from the parent page. * But if I enable the Universal encoding detector, the ad does not fail. * Let's say that I *kept* ISO-8859-1 as default encoding, but instead enabled the Universal detector. The frame then works. * But if I make the frame page very short, 10 * the letter ø as content, then the Universal detector fails - in a test on my own computer, it guesses the page to be Cyrillic rather than Norwegian. * What's the problem? The Universal detector is too greedy - it tries to fix more problems than I have. I only want it to guess on UTF-8. And if it doesn't detect UTF-8, then it should fall back to the locale default (including falling back to the encoding of the parent frame). Wouldn't that be an idea? No. The current configuration works for Norwegian users already. For users from different silos, the ad might break, but ad breakage is less bad than spreading heuristic detection to more locales. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
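The one heuristic that can't be disabled - detecting BOMless UTF-16 that encodes Basic Latin only - amounts to looking at the pattern of zero bytes: in such a file every other byte is zero, and which position the zeros occupy reveals the byte order. The following is a hypothetical sketch of that idea (function name and return values invented for this example; this is not Gecko's actual code):

```python
def sniff_bomless_basic_latin_utf16(data: bytes):
    """Guess BOMless UTF-16 encoding Basic Latin only from the
    pattern of zero bytes; return None when the pattern doesn't match."""
    if len(data) < 2 or len(data) % 2:
        return None
    evens = data[0::2]  # bytes at even offsets
    odds = data[1::2]   # bytes at odd offsets
    # Little-endian: low (ASCII) byte first, zero high byte second.
    if all(b == 0 for b in odds) and all(0 < b < 0x80 for b in evens):
        return "utf-16le"
    # Big-endian: zero high byte first, low (ASCII) byte second.
    if all(b == 0 for b in evens) and all(0 < b < 0x80 for b in odds):
        return "utf-16be"
    return None
```

A pre-HTML5 browser that silently discarded all zero bytes before tokenization would have handled such files "correctly" by accident, which is how the Indian-bank content mentioned above presumably came to exist.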
Re: [whatwg] Use of media queries to limit bandwidth/data transfer
On Fri, Dec 9, 2011 at 12:10 AM, James Graham jgra...@opera.com wrote: It's not clear that device-width and device-height should be encouraged since they don't tell you anything about how much content area is *actually* visible to the user. Why do media queries support querying the device dimensions? Shouldn't those be changed to be aliases for width and height? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>
On Tue, Dec 13, 2011 at 4:23 AM, Adam Barth w...@adambarth.com wrote: I'm trying to understand how the HTML parsing spec handles the following case: <!DOCTYPE html><body><table><math><mi>foo</mi></math></table> According to the html5lib test data, we should parse that as follows: | !DOCTYPE html | html | head | body | math math | math mi | foo | table The expectation of the test case makes sense. However, I'm not sure whether that's what the spec actually does. I think that's a spec bug: the net result is popping the stack of open elements without flushing out the pending table character tokens list. The reason why Gecko does what makes sense is that Gecko uses a text accumulation buffer for non-table cases, too, and any tag token flushes the buffer. (Not quite optimal for ignored tags, sure.) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Default encoding to UTF-8?
On Fri, Dec 9, 2011 at 12:33 AM, Leif Halvard Silli xn--mlform-...@xn--mlform-iua.no wrote: Henri Sivonen Tue Dec 6 23:45:11 PST 2011: These localizations are nevertheless live tests. If we want to move more firmly in the direction of UTF-8, one could ask users of those 'live tests' about their experience. Filed https://bugzilla.mozilla.org/show_bug.cgi?id=708995 (which means *other-language* pages when the language of the localization doesn't have a pre-UTF-8 legacy). Do you have any concrete examples? The example I had in mind was Welsh. And are there user complaints? Not that I know of, but I'm not part of a feedback loop if there even is a feedback loop here. The Serbian localization uses UTF-8. The Croatian one uses Win-1252, but only on Windows and Mac: on Linux it appears to use UTF-8, if I read the HG repository correctly. OS-dependent differences are *very* suspicious. :-( I think that defaulting to UTF-8 is always a bug, because at the time these localizations were launched, there should have been no unlabeled UTF-8 legacy, because up until these locales were launched, no browsers defaulted to UTF-8 (broadly speaking). I think defaulting to UTF-8 is harmful, because it makes it possible for locale-siloed unlabeled UTF-8 content to come into existence. The current legacy encodings nevertheless create siloed pages already. I'm also not sure that it would be a problem with such a UTF-8 silo: UTF-8 is possible to detect, for browsers - Chrome seems to perform more such detection than other browsers. While UTF-8 is possible to detect, I really don't want to take Firefox down the road where users who currently don't have to suffer page load restarts from heuristic detection have to start suffering them. (I think making incremental rendering any less incremental for locales that currently don't use a detector is not an acceptable solution for avoiding restarts. With English-language pages, the UTF-8ness might not be apparent from the first 1024 bytes.) 
In another message you suggested I 'lobby' against authoring tools. OK. But the browser is also an authoring tool. In what sense? So how can we have authors output UTF-8, by default, without changing the parsing default? Changing the default is an XML-like solution: creating breakage for users (who view legacy pages) in order to change author behavior. To the extent a browser is a tool Web authors use to test stuff, it's possible to add various whining to the console without breaking legacy sites for users. See https://bugzilla.mozilla.org/show_bug.cgi?id=672453 https://bugzilla.mozilla.org/show_bug.cgi?id=708620 Btw: In Firefox, in one sense it is impossible to disable automatic character detection: in Firefox, overriding the encoding only lasts until the next reload. A persistent setting for changing the fallback default is in the Advanced subdialog of the font prefs in the Content preference pane. It's rather counterintuitive that the persistent autodetection setting is in the same menu as the one-off override. As for heuristic detection based on the bytes of the page, the only heuristic that can't be disabled is the heuristic for detecting BOMless UTF-16 that encodes Basic Latin only. (Some Indian bank was believed to have been giving that sort of files to their customers and it worked in pre-HTML5 browsers that silently discarded all zero bytes prior to tokenization.) The Cyrillic and CJK detection heuristics can be turned on and off by the user. Within an origin, Firefox considers the parent frame and the previous document in the navigation history as sources of encoding guesses. That behavior is not user-configurable to my knowledge. Firefox also remembers the encoding from previous visits as long as Firefox otherwise has the page in cache. So for testing, it's necessary to make Firefox forget about previous visits to the test case. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Default encoding to UTF-8?
On Tue, Dec 6, 2011 at 2:10 AM, Kornel Lesiński kor...@geekhood.net wrote: On Fri, 02 Dec 2011 15:50:31 -, Henri Sivonen hsivo...@iki.fi wrote: That compatibility mode already exists: It's the default mode--just like the quirks mode is the default for pages that don't have a doctype. You opt out of the quirks mode by saying <!DOCTYPE html>. You opt out of the encoding compatibility mode by saying <meta charset=utf-8>. Could <!DOCTYPE html> be an opt-in to default UTF-8 encoding? It would be nice to minimize number of declarations a page needs to include. I think that's a bad idea. We already have *three* backwards-compatible ways to opt into UTF-8. <!DOCTYPE html> isn't one of them. Moreover, I think it's a mistake to bundle a lot of unrelated things into one mode switch instead of having legacy-compatible defaults and having granular ways to opt into legacy-incompatible behaviors. (That is, I think, in retrospect, it's bad that we have a doctype-triggered standards mode with legacy-incompatible CSS defaults instead of having legacy-compatible CSS defaults and CSS properties for opting into different behaviors.) If you want to minimize the declarations, you can put the UTF-8 BOM followed by <!DOCTYPE html> at the start of the file. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
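The BOM-plus-doctype suggestion at the end of the email above can be made concrete with a few bytes (the document body here is invented for the illustration): the three bytes EF BB BF at the very start of the file declare UTF-8 with no meta element, and the doctype immediately after them opts out of quirks mode.

```python
# A minimal HTML file that opts into UTF-8 (via the BOM) and into
# standards mode (via the doctype) with no <meta charset> declaration.
doc = b"\xef\xbb\xbf<!DOCTYPE html><title>t</title><p>na\xc3\xafve"

assert doc.startswith("\ufeff".encode("utf-8"))  # EF BB BF is the UTF-8 BOM
assert doc[3:].decode("utf-8").startswith("<!DOCTYPE html>")
assert "naïve" in doc[3:].decode("utf-8")
```

The other two backwards-compatible ways to opt into UTF-8 mentioned in the thread are the HTTP Content-Type charset parameter and the meta declaration itself.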
Re: [whatwg] Default encoding to UTF-8?
On Mon, Dec 5, 2011 at 8:55 PM, Leif Halvard Silli xn--mlform-...@xn--mlform-iua.no wrote: When you say 'requires': Of course, HTML5 recommends that you declare the encoding (via HTTP/higher protocol, via the BOM 'sideshow' or via meta charset=UTF-8). I just now also discovered that Validator.nu issues an error message if it does not find any of those *and* the document contains non-ASCII. (I don't know, however, whether this error message is just something Henri added at his own discretion - it would be nice to have it literally in the spec too.) I believe I was implementing exactly what the spec said at the time I implemented that behavior of Validator.nu. I'm particularly convinced that I was following the spec, because I think it's not the optimal behavior. I think pages that don't declare their encoding should always be non-conforming even if they only contain ASCII bytes, because that way templates created by English-oriented (or lorem ipsum -oriented) authors would be caught as non-conforming before non-ASCII text gets filled into them later. Hixie disagreed. HTML5 says that validators *may* issue a warning if UTF-8 is *not* the encoding. But so far, validator.nu has not picked that up. Maybe it should. However, non-UTF-8 pages that label their encoding, that use one of the encodings that we won't be able to get rid of anyway and that don't contain forms aren't actively harmful. (I'd argue that they are *less* harmful than unlabeled UTF-8 pages.) Non-UTF-8 is harmful in form submission. It would be more focused to make the validator complain about labeled non-UTF-8 if the page contains a form. Also, it could be useful to make Firefox whine to the console when a form is submitted in non-UTF-8 and when an HTML page has no encoding label. (I'd much rather implement all these than implement breaking changes to how Firefox processes legacy content.) 
We should also lobby for authoring tools (as recommended by HTML5) to default their output to UTF-8 and make sure the encoding is declared. HTML5 already says: Authoring tools should default to using UTF-8 for newly-created documents. [RFC3629] http://dev.w3.org/html5/spec/semantics.html#charset I think focusing your efforts on lobbying authoring tool vendors to withhold the ability to save pages in non-UTF-8 encodings would be a better way to promote UTF-8 than lobbying browser vendors to change the defaults in ways that'd break locale-siloed Existing Content. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Default encoding to UTF-8?
On Fri, Dec 2, 2011 at 6:29 PM, Glenn Maynard gl...@zewt.org wrote: On Fri, Dec 2, 2011 at 10:46 AM, Henri Sivonen hsivo...@iki.fi wrote: Regarding your (and 16) remark, considering my personal happiness at work, I'd prioritize the eradication of UTF-16 as an interchange encoding much higher than eradicating ASCII-based non-UTF-8 encodings that all major browsers support. I think suggesting a solution to the encoding problem while implying that UTF-16 is not a problem isn't particularly appropriate. :-) ... I don't think I'd call it a bigger problem, though, since it's comparatively (even vanishingly) rare, where untagged legacy encodings are a widespread problem that gets worse every day we can't think of a way to curtail it. From an implementation perspective, UTF-16 has its own class of bugs that are unlike other encoding-related bugs, and fixing those bugs is particularly annoying because you know that UTF-16 is so rare that the fix has little actual utility. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Default encoding to UTF-8?
On Thu, Dec 1, 2011 at 1:28 AM, Faruk Ates faruka...@me.com wrote: My understanding is that all browsers* default to Western Latin (ISO-8859-1) encoding by default (for Western-world downloads/OSes) due to legacy content on the web. As has already been pointed out, the default varies by locale. But how relevant is that still today? It's relevant for supporting the long tail of existing content. The sad part is that the mechanisms that allow existing legacy content to work within each locale silo also make it possible for ill-informed or uncaring authors to develop more locale-siloed content (i.e. content that doesn't declare the encoding and, therefore, only works when the user's fallback encoding is the same as the author's). I'm wondering if it might not be good to start encouraging defaulting to UTF-8, and only fallback to Western Latin if it is detected that the content is very old / served by old infrastructure or servers, etc. And of course if the content is served with an explicit encoding of Western Latin. I think this would be a very bad idea. It would make debugging hard. Moreover, it would be the wrong heuristic, because well-maintained server infrastructure can host a lot of legacy content. Consider any shared hosting situation where the administrator of the server software isn't the content creator. We like to think that “every web developer is surely building things in UTF-8 nowadays” but this is far from true. I still frequently break websites and webapps simply by entering my name (Faruk Ateş). For things to work, the server-side component needs to deal with what gets sent to it. ASCII-oriented authors could still mishandle all non-ASCII even if Web browsers forced them to deal with UTF-8 by sending them UTF-8. Furthermore, your proposed solution wouldn't work for legacy software that correctly declares an encoding but declares a non-UTF-8 one. 
Sadly, getting sites to deal with your name properly requires the developer of each site to get a clue. :-( Just sending form submissions in UTF-8 isn't enough if the recipient can't deal. Compare with http://krijnhoetmer.nl/irc-logs/whatwg/20110906#l-392 Yes, I understand that that particular issue is something we ought to fix through evangelism, but I think that WHATWG/browser vendors can help with this while at the same time (rightly, smartly) making the case that the web of tomorrow should be a UTF-8 (and 16) based one, not a smorgasbord of different encodings. Anne has worked on speccing what exactly the smorgasbord should be. See http://wiki.whatwg.org/wiki/Web_Encodings . I think it's not realistic to drop encodings that are on the list of encodings you see in the encoding menu on http://validator.nu/?charset However, I think browsers should drop support for encodings that aren't already supported by all the major browsers, because such encodings only serve to enable browser-specific content and encoding proliferation. Regarding your (and 16) remark, considering my personal happiness at work, I'd prioritize the eradication of UTF-16 as an interchange encoding much higher than eradicating ASCII-based non-UTF-8 encodings that all major browsers support. I think suggesting a solution to the encoding problem while implying that UTF-16 is not a problem isn't particularly appropriate. :-) So hence my question whether any vendor has done any recent research in this. Mobile browsers seem to have followed desktop browsers in this; perhaps this topic was tested and researched in recent times as part of that, but I couldn't find any such data. The only real relevant thread of discussion around UTF-8 as a default was this one about Web Workers: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-September/023197.html …which basically suggested that everyone is hugely in favor of UTF-8 and making it a default wherever possible. So how 'bout it? 
I think in order to comply with the Support Existing Content design principle (even if it unfortunately means that support is siloed by locale) and in order to make plans that are game theoretically reasonable (not taking steps that make users migrate to browsers that haven't taken the steps), I think we shouldn't change the fallback encodings from what the HTML5 spec says when it comes to loading text/html or text/plain content into a browsing context. What's going on in this area, if anything? There's the effort to specify a set of encodings and their aliases for browsers to support. That's moving slowly, since Anne has other more important specs to work on. Other than that, there have been efforts to limit new features to UTF-8 only (consider scripts in Workers and App Cache manifests) and efforts to make new features not vary by locale-dependent defaults (consider HTML in XHR). Both these efforts have faced criticism, unfortunately. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Default encoding to UTF-8?
On Thu, Dec 1, 2011 at 8:29 PM, Brett Zamir bret...@yahoo.com wrote: How about a Compatibility Mode for the older non-UTF-8 character set approach, specific to page? That compatibility mode already exists: It's the default mode--just like the quirks mode is the default for pages that don't have a doctype. You opt out of the quirks mode by saying <!DOCTYPE html>. You opt out of the encoding compatibility mode by saying <meta charset=utf-8>. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] createContextualFragment in detached contexts
On Fri, Sep 30, 2011 at 7:56 PM, Erik Arvidsson a...@chromium.org wrote: On Fri, Sep 30, 2011 at 07:35, Henri Sivonen hsivo...@iki.fi wrote: On Fri, Sep 30, 2011 at 1:37 AM, Erik Arvidsson a...@chromium.org wrote: If the context object is in a detached state, then relax the parsing rules so that all elements are allowed at that level. The hand wavy explanation is that for every tag at the top level create a new element in the same way that ownerDocument.createElement would do it. I would prefer not to add a new magic mode to the parsing algorithm that'd differ from what innerHTML requires. So you want every js library to have to do this kind of workaround instead? This topic has migrated to public-webapps. My current thinking is http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/0818.html -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Allowing custom attributes on html or body in documents for media resources
On Thu, Nov 10, 2011 at 2:03 AM, Robert O'Callahan rob...@ocallahan.org wrote: http://www.whatwg.org/specs/web-apps/current-work/#read-media Can we allow the UA to add custom vendor-prefixed attributes to the html and/or body elements? Alternatively, a vendor-prefixed class? We want to be able to use a style sheet with rules matching custom attributes to indicate various situations (e.g., whether the document is a toplevel browsing context) to set the viewport background. Why can't non-prefixed attributes be minted for these use cases? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] document.write(\r): the spec doesn't say how to handle it.
On Thu, Nov 3, 2011 at 8:13 PM, David Flanagan dflana...@mozilla.com wrote: Each tokenizer state would have to add a rule for CR that said emit LF, save the current tokenizer state, and set the tokenizer state to after CR state. The Validator.nu/Gecko tokenizer returns a last input code unit processed was CR flag to the caller. If the tokenizer sees a CR, the tokenizer processes it and returns to the caller immediately with the flag set to true. The caller is responsible for checking if the next input code unit is an LF, skipping over it and calling the tokenizer again. This way, the tokenizer itself does not need to have the capability of skipping over a character, and the same capabilities that are normally used for dealing with arbitrary buffer boundaries and early returns after script end tags (or after timers, before the parser moved off the main thread) work here. The parser operates on UTF-16 code units, so a lone surrogate is emitted. The spec seems pretty unambiguous that it operates on codepoints The spec is empirically wrong. The wrongness has been reported. The spec tries to retrofit Unicode theoretical purity onto legacy where no purity existed. The tokenizer operates on UTF-16 code units. document.write() feeds UTF-16 code units to the tokenizer without lone surrogate preprocessing. Neither the tokenizer nor the tree builder does anything about lone surrogates. When consuming a byte stream, the converter that converts the (potentially unaligned and potentially foreign-byte-order) UTF-16-encoded byte stream into a stream of UTF-16 code units is responsible for treating unpaired surrogates as conversion errors. Sorry about not mentioning earlier that the problematic tests are also problematic in this sense. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] document.write(\r): the spec doesn't say how to handle it.
On Thu, Nov 3, 2011 at 1:57 AM, David Flanagan dflana...@mozilla.com wrote: Firefox, Chrome and Safari all seem to do the right thing: wait for the next character before tokenizing the CR. See http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1247 Firefox tokenizes the CR immediately, emits an LF and then skips over the next character if it is an LF. When I designed the solution Firefox uses, I believed it was more correct and more compatible with legacy than whatever the spec said at the time. Chrome seems to wait for the next character before tokenizing the CR. And I think this means that the description of document.write needs to be changed. All along, I've thought that having U+0000 and CRLF handling as a stream preprocessing step was bogus and that both should happen upon tokenization. So far, I've managed to convince Hixie about U+0000 handling. Similarly, what should the tokenizer do if the document.write emits half of a UTF-16 surrogate pair as the last character? The parser operates on UTF-16 code units, so a lone surrogate is emitted. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Signed XHTML
On Thu, Oct 20, 2011 at 9:57 PM, Martin Boßlet martin.boss...@googlemail.com wrote: Are there plans in this direction? Would functionality like this have a chance to be considered for the standard? The chances are extremely slim. XML signatures depend on XML canonicalization, which is notoriously difficult to implement correctly and suffers from interop problems, because unmatched sets of bugs in the canonicalization phase make signature verification fail. I think browser vendors would be reasonable if they resisted making XML signatures or canonicalization part of the platform. Moreover, most of the Web is HTML, so enthusiasm for XHTML-only features is likely very low these days. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] [fullscreen] Drop requestFullScreenWithKeys()?
On Wed, Oct 12, 2011 at 11:56 AM, Anne van Kesteren ann...@opera.com wrote: Given the way Mac OS handles full screen applications I wonder whether requestFullScreenWithKeys() is needed. A toolbar will always appear at the top if you locate your cursor there. Does the user realize that? Can the user do that if a mouse lock API is used also? Can a user without a pointing device do that? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] createContextualFragment in detached contexts
On Fri, Sep 30, 2011 at 1:37 AM, Erik Arvidsson a...@chromium.org wrote: If the context object is in a detached state, then relax the parsing rules so that all elements are allowed at that level. The hand wavy explanation is that for every tag at the top level create a new element in the same way that ownerDocument.createElement would do it. I would prefer not to add a new magic mode to the parsing algorithm that'd differ from what innerHTML requires. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] input type=barcode?
On Wed, 2011-08-03 at 17:21 +0200, Anne van Kesteren wrote: On Wed, 03 Aug 2011 16:52:03 +0200, Mikko Rantalainen mikko.rantalai...@peda.net wrote: What do you think? Implementing this seems rather complicated for such a niche use. It also seems better to let sites handle this by themselves so these physical codes can evolve more easily. I don't know how niche it is to actually own a dedicated USB barcode reader, but where I live, using at least one Web app that supports bar code reading (by having a text input, which requires that the bar code reader can emulate a keyboard) is as mainstream as Web app usage gets (banking). -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Why children of datalist elements are barred from constraint validation?
On Fri, 2011-07-29 at 15:20 -0700, Jonas Sicking wrote: On Fri, Jul 29, 2011 at 2:59 PM, Aryeh Gregor simetrical+...@gmail.com wrote: On Fri, Jul 29, 2011 at 5:51 PM, Jonas Sicking jo...@sicking.cc wrote: On Fri, Jul 29, 2011 at 9:43 AM, Ian Hickson i...@hixie.ch wrote: Looking specifically at datagrid's ability to fall back to select, I agree that it's not necessarily going to be widely used, but given that it's so simple to support and provides such a clean way to do fallback, I really don't see the harm in supporting it. I haven't looked at datagrid yet, so I can't comment. I think he meant datalist. datagrid was axed quite some time ago and hasn't made a reappearance that I know of. Ah, well, then it definitely seems like we should get rid of this feature. The harm is definitely there in that it's adding a feature without solving any problem. The current design solves the problem that the datalist feature needs to Degrade Gracefully (and preferably without having to import a script library). I think the solution is quite elegant and don't see a need to drop it. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] sic element
On Fri, 2011-07-29 at 22:39 +, Ian Hickson wrote: If it's ok if it's entirely ignored, then it's presentational, and not conveying any useful information. Presentational markup may convey useful information, for example that a quotation from printed matter contains an underlined word. HTML is the wrong language for this kind of thing. I disagree. From time to time, people want to take printed matter and publish it on the Web. In practice, the formats available are PDF and HTML. HTML works more nicely in browsers and for practical purposes works generally better when the person taking printed matter to the Web decides that the exact line breaks and the exact font aren't of importance. They may still consider it of importance to preserve bold, italic and underline and maybe even delegate that preservation to OCR software that has no clue about semantics. (Yes, bold, italic and underline are qualitatively different from line breaks and the exact font, even if you could broadly categorize them all as presentational matters.) I think it's not useful for the Web for you to decree that HTML is the wrong language for this kind of thing. There's really no opportunity to launch a new format precisely for that use case. Furthermore, in practice, HTML already works fine for this kind of thing. The technical solution is there already. You just decree it wrong as a matter of principle. When introducing new Web formats is prohibitively hard and expensive, I think it doesn't make sense to take the position that something that already works is the wrong language. I think you are confused as to the goals here. The presentational markup that was u, i, b, font, small, etc, is gone. I think the reason why Jukka and others seem to be confused about your goals is that your goals here are literally incredible from the point of view of other people. 
Even though you've told me f2f what you believe and I want to trust that you are sincere in your belief, I still have a really hard time believing that you believe what you say you believe about the definitions of b, i and u. When, after discussing this with you f2f, I still find your position incredible, I think it's not at all strange if other people, when reading the spec text, interpret your goals inaccurately, because your goals don't seem like plausible goals to them. If the word presentational carries too much negative baggage, I suggest defining b, i and u as typographic elements on visual media (and distinctive elements on other media) and adjusting the rhetoric that HTML is a semantic markup language to HTML being a mildly semantic markup language that also has common phrase-level typographic features. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Support for RDFa in HTML5
On Tue, 2011-08-02 at 13:55 +, aykut.sen...@bild.de wrote: I would like to know if these attributes will be part of HTML5 or is there another valid method to integrate RDFa into HTML5? Why do you need RDFa? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] date meta-tag invalid
On Tue, 2011-07-26 at 11:27 +, aykut.sen...@bild.de wrote: http://www.google.com/support/news_pub/bin/answer.py?answer=93994 See link above; Google says that they provide DC.date.issued, but this is also not part of the whatwg metaextensions list. It's part of the list now. I wonder what possessed the Google News team to use dc.date.issued instead of dc.issued or dcterms.issued. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] date meta-tag invalid
On Mon, 2011-07-18 at 13:59 +, aykut.sen...@bild.de wrote: i have asked one from the seo team and he says for example the freshness factor is important for google. Is there evidence of meta name=date content=... being part of Google's freshness factor? Is there public documentation explaining what meta name=date content=... means, what date format is expected in the content attribute and what software does something useful with it? is it possible to use the time-tag in the head instead (i mean invisible)? No, it's not. dc:created is also not in the Meta Extensions List, see: http://wiki.whatwg.org/wiki/MetaExtensions It simply hasn't been registered yet. Is there any evidence of consuming software that does something useful with dc:created? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Microdata feedback
On Thu, 2011-07-07 at 22:33 +, Ian Hickson wrote: The JSON algorithm now ends the crawl when it hits a loop, and replaces the offending duplicate item with the string ERROR. The RDF algorithm preserves the loops, since doing so is possible with RDF. Turns out the algorithm almost did this already; looks like it was an oversight. It seems to me that this approach creates an incentive for people who want to do RDFesque things to publish deliberately non-conforming microdata content that works the way they want for RDF-based consumers but breaks for non-RDF consumers. If such content abounds and non-RDF consumers are forced to support loopiness by extending the JSON conversion algorithm in ad hoc ways, part of the benefit of microdata over RDFa (treeness) is destroyed, and the benefit of being well-defined would be destroyed, too, for non-RDF consumption cases. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
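The loop-ending behavior under discussion (replacing the offending duplicate item with the string ERROR) can be illustrated with a toy crawl. This is only a sketch of the idea under simplified assumptions (items modeled as plain objects), not the spec's actual conversion algorithm:

```javascript
// Toy microdata-to-JSON crawl: an "item" is an object whose `properties`
// map names to arrays of values; a value is a string or another item.
// `ancestors` tracks the items on the current crawl chain, so a cycle is
// cut off and the duplicate item is replaced with the string "ERROR",
// while mere sharing (a DAG) is still serialized normally.
function itemToJSON(item, ancestors = new Set()) {
  if (ancestors.has(item)) return "ERROR"; // loop detected: end this branch
  ancestors.add(item);
  const json = { properties: {} };
  for (const [name, values] of Object.entries(item.properties)) {
    json.properties[name] = values.map(v =>
      typeof v === "object" && v !== null ? itemToJSON(v, ancestors) : v
    );
  }
  ancestors.delete(item); // leaving the chain: siblings may still reference it
  return json;
}
```

The point of the complaint above is visible in the sketch: the tree-shaped output is only well-defined because cycles are cut, so consumers that tolerate loopiness anyway would be re-deriving an RDF-style graph model ad hoc.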
[whatwg] Readiness of script-created documents
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1039 It says complete in Firefox, loading in Chrome and Opera and uninitialized in IE. The spec requires complete. readyState is originally an IE API. Why doesn't the spec require uninitialized? (The implementation in Gecko is so recent that it's quite possible that Gecko followed the spec and the spec just made stuff up as opposed to the spec following Gecko.) -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
[whatwg] Linking to the HTML accessibility API mapping draft
It's generally safe to assume that the WHATWG spec doesn't suppress useful information even though W3C publications occasionally might. However, in the case of http://dev.w3.org/html5/html-api-map/overview.html the WHATWG spec suppresses the document from the Recommended Reading section. This reduces one's ability to trust that if one reads the WHATWG spec, information isn't suppressed. While I realize that the API mapping document isn't even nearly done yet, is it so incorrect that it's more useful not to let people know that it exists than to link to it alongside the Polyglot guide (which, I imagine, isn't recommended reading in the sense of recommending that the Polyglot guide be followed for Web authoring)? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] window.cipher HTML crypto API draft spec
On Tue, 2011-05-24 at 07:48 -0700, David Dahl wrote: Consider for example a DropBox-style service that has a browser-based UI but that has a design where content is encrypted on the client side so that the service provider is unable to decrypt the data. In this case, it would make sense to be able to implement a file download by having a plain a href to an encrypted file and have the browser automatically decrypt it. Likewise, a service that allows the transmission of encrypted images should be implementable by having img src point directly to an encrypted file. I think someone was asking about that kind of functionality during my presentation at Mozilla. Again, this would be a pretty advanced complement to this API - I would love to see something like that spec'd and implemented as well. My main worry is that if the two ways of doing crypto don't appear at the same time for Web authors to use, the Web will shift in an unfortunately hashbangy direction. I suggest adding a Content-Encoding type that tells the HTTP stack that the payload of a HTTP response is encrypted and needs to be decrypted using a key previously initialized using the JS API. cool. I'll look into that. Thanks. On the other hand, it seems that letting Web apps generate per-user key pairs and letting Web apps discover if the user possesses the private key that decrypts a particular message is a privacy problem. Someone who wishes to surveil Web users could use private keys as supercookies, since the generated private key is most probably going to be unique to a user. Currently, my implementation requires the enduser to open a file from the file system in order to view the contents of the private key. It is only accessible to privileged code - content has no access to it whatsoever. I didn't expect content to have access to the key bits per se. 
I expected Web content-provided JS to be able to encrypt and decrypt stuff with a key it has asked the browser to generate (if the user has authorized the origin to use the crypto API). The ability to decrypt or encrypt a message with a particular private key is proof of possession of that key, so users in possession of a particular key could be tracked. This could be mitigated by granting the crypto permissions to a pair of origins: the origin of the top level frame combined with the origin that wants to access the API. This way iframed Web bugs couldn't track the user across sites after having once obtained a crypto permission for their origin. See http://www.w3.org/2010/api-privacy-ws/papers/privacy-ws-24.pdf Currently, it is unfortunate that choosing to use a webmail client effectively prevents a person from using encrypted email. To allow people to use end-to-end encrypted email with webmail apps, it would be useful to support OpenPGP as an encryption format. (Obviously, a malicious webmail app could capture the decrypted messages on the browser and send them back to the server, but when the webmail app itself doesn't contain code like that, putting the decryption in the browser rather than putting it on the server would still probably be more subpoena-resistant and resistant against casual snooping by bored administrators.) I think with an API like this we might see a whole new breed of communications applications that can supplant email and webmail entirely. Maybe. But Google Wave flopped and email is still here. I think it would be good to design for the ability to plug into email's network effects instead of counting on a new breed of communication making email irrelevant. The public key discovery section shows a /meta end tag. I hope this is just a plain error and having content in a meta element isn't part of any design. The tag is unimportant as well - can you explain why you hope this will not use a meta tag? 
A meta tag can be used if there's no need for the meta element to have child nodes. You can't make a meta element have child nodes or an end tag. I could just as easily use addressbookentry You can't introduce an addressbookentry element as a child of the head element. The result would not degrade gracefully. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
[whatwg] Please link to a specific fragment id on the microformats.org wiki
It has been brought to my attention that linking to the microformats.org wiki from http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#other-link-types without a specific fragment id confuses people because the wiki page includes stuff that doesn't constitute keyword registrations for HTML(5) purposes. I believe changing the link to point to http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions would improve the usability of the registration procedure. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
[whatwg] DOMContentLoaded, load and current document readiness
Recently, there was discussion about changing media element state in the same task that fires the event about the state change so that scripts that probe the state can make non-racy conclusions about whether a certain event has fired already. Currently, there seems to be no correct non-racy way to write code that probes a document to determine if DOMContentLoaded or load has fired and runs code immediately if the event of interest has fired or adds a listener to wait for the event if the event hasn't fired. Are there compat or other reasons why we couldn't or shouldn't make it so that the same task that fires DOMContentLoaded changes the readyState to interactive and the same task that fires load changes readyState to complete? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
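The probing pattern in question looks roughly like this (a sketch with a hypothetical helper name; the point of the post is that this is only reliably non-racy if readyState is updated in the same task that fires the event):

```javascript
// Run `callback` once the document has reached DOMContentLoaded, whether
// or not the event has already fired. If readyState were updated in a
// different task than the one firing DOMContentLoaded, there would be a
// window where the check says "loading" but the event has already been
// dispatched, so the listener would never run.
function onInteractive(doc, callback) {
  if (doc.readyState === "interactive" || doc.readyState === "complete") {
    callback(); // the event has already fired; run immediately
  } else {
    doc.addEventListener("DOMContentLoaded", callback, { once: true });
  }
}
```

The same shape applies to the load event with a check for readyState === "complete" only.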
Re: [whatwg] CORS requests for image and video elements
On Tue, 2011-05-17 at 14:25 -0700, Kenneth Russell wrote: Unfortunately, experimentation indicates that it is not possible to simply send CORS' Origin header with every HTTP GET request for images; some servers do not behave properly when this is done. How do they behave? Which servers? Why? Has evangelism been attempted? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Full Screen API Feedback
On May 13, 2011, at 19:17, Eric Carlson wrote: I don't know of exploits in the wild, but I've read about proof-of-concept exploits that overwhelmed the user's attention visually so that the user didn't notice the Press ESC to exit full screen message. This allowed subsequent UI spoofing. (I was unable to find the citation for this.) Maybe you were thinking of this: http://www.bunnyhero.org/2008/05/10/scaring-people-with-fullscreen/. I'm not sure if that's the exact demo I have seen before, but it uses the same idea as the demo I've seen before. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/