[whatwg] Update on fallback encoding findings

2014-05-09 Thread Henri Sivonen
A while ago, Hixie pinged me on IRC to ask if there was any news about
the character encoding stuff. While there is no news yet about
guessing the fallback encoding from the TLD of the site, there is now
some news about guessing the fallback encoding from the locale.

Data for Firefox 25:
https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381393

Data for Firefox 26:
https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381394

Data for Firefox 27:
https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8420031

Data for Firefox 28:
https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8420032

Specific findings:
 1) Prior to Firefox 28, Traditional Chinese Firefox had a bug that
caused the fallback to be UTF-8. Changing the fallback to Big5 in
Firefox 28 reduced the usage of the Character Encoding menu. (Please
note, however, that Firefox's notion of Big5 does not yet comply with
the Encoding Standard notion of Big5.)

 2) Prior to Firefox 28, Thai Firefox had a bug that caused the
fallback to be windows-1252. Changing the fallback to windows-874 in
Firefox 28 reduced the usage of the Character Encoding menu.

There were also other locales that had their fallback corrected per
spec in Firefox 28. However, for those locales, the changes were
within the variation seen between releases previously.

I think the finding about Traditional Chinese supports the conclusion
that we should not fall back to UTF-8 everywhere. I think the finding
about Thai supports a conclusion that we should not fall back on
windows-1252 everywhere. However, the results being in the noise for
some locales that had their fallback changed suggests that the labeling
practice isn't uniform around the world and that some locales rely on
the fallback less than others.

Since locales using a non-Latin script are the leaders in Character
Encoding menu use even when there's only one dominant legacy encoding
within the locale, it seems that there is a continued tension between
the locale-specific fallback and fallback to windows-1252. Guessing
the fallback from the TLD is supposed to address this. I will report
findings once the TLD guessing has been on the release channel for six
weeks.

Also, the relatively high level of Character Encoding menu use for the
Korean locale continues to puzzle me. Judging merely from the
structure of the legacy encoding situation, with neither the legacy
content nor the neighboring locales being obviously different, one
should expect the situation with the Korean locale and the Hebrew
locale to be very similar. Yet, it is not.

Finally worth noting: Firefox is committing a willful violation of the
spec when it comes to Simplified Chinese: The spec says gb18030, but
Firefox uses gbk. Starting with Firefox 29, the gbk *decoder* will be
the same as the gb18030 decoder. However, because we've previously
seen problems with EUC-JP and Big5 when expanding the range of byte
sequences that an *encoder* can produce in form submission, we are
keeping the gbk encoder distinct from the gb18030 encoder, at least for now.
I'm willing to reconsider if another browser (that has high market
share in China) successfully starts using the gb18030 encoder for form
submissions for sites that declare gbk (or gb2312) or don't declare an
encoding.
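
For illustration, with an Encoding Standard-compliant TextDecoder (an
API that postdates most of this discussion and is used here only as a
sketch), unifying the decoders means that any byte stream decodes
identically under both labels; only the encoder side would differ:

// A valid four-byte gb18030 sequence; with the gbk decoder aliased to
// the gb18030 decoder, the label used in the declaration no longer
// matters for decoding.
var bytes = new Uint8Array([0x81, 0x30, 0x81, 0x30]);
var viaGbk = new TextDecoder("gbk").decode(bytes);
var viaGb18030 = new TextDecoder("gb18030").decode(bytes);
console.log(viaGbk === viaGb18030); // true once the decoders are unified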

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/


Re: [whatwg] Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization

2014-02-26 Thread Henri Sivonen
On Sat, Feb 8, 2014 at 12:37 AM, Ian Hickson i...@hixie.ch wrote:
 What have you learnt so far?

I've learned that I've misattributed the cause of the high frequency of
character encoding menu usage in the case of the Traditional Chinese
localization. We've been shipping with the wrong fallback encoding
(UTF-8) even after the fallback encoding was supposedly fixed (to
Big5). Shows what kind of a mess our previous mechanism for setting
the fallback encoding in a locale-dependent way was. The fallback
encoding for Traditional Chinese will change to Big5 for real in
Firefox 28.

I might have improved (hopefully; to be seen still) Firefox for the
wrong reason. Oops. :-)

Also, more baseline telemetry data (i.e. data without TLD-based
guessing) is now available. The last 3 weeks of Firefox 25 on the
release channel:
https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381393 . The
last 3 weeks of Firefox 26 on the release channel:
https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381394 . The
rows are grayed for locales with so little usage overall that even a
couple of sessions of encoding menu use pushes them up the list
percentage-wise. In both cases, the top entries in black
are Traditional Chinese and Thai, both of which have the wrong
fallback. Up next are CJK followed by the Cyrillic locales that have a
detector on by default (Russian and Ukrainian), which makes one wonder
if the detectors are doing more harm than good. Up next is Arabic,
which has the wrong fallback. (These wrong fallbacks are fixed in
Firefox 28. In Firefox 28, no locale falls back to UTF-8.)

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/


Re: [whatwg] Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization

2014-02-08 Thread Henri Sivonen
On Sat, Feb 8, 2014 at 12:37 AM, Ian Hickson i...@hixie.ch wrote:
 The correlation should be at least as high, as far as I can tell.

Logically, yes, for most parts of the world.

 Or maybe a 50%/50% experiment
 with that as the first 50% and the default coming from the TLD instead of
 the UI locale in the second 50%, with the corresponding instrumentation,
 to see how the results compare.

Mozilla doesn't have a proper A/B testing infrastructure yet. I expect
the A to be Firefox 29 on the release channel and B to be Firefox 30
on the release channel. So unless this gets backed out, I expect to
have data around the time of Firefox 31 going to release.

 Have you tried deploying this?

It is on Firefox trunk now. However, not all country TLDs are
participating. I figured it is better to leave unsure cases the way
they were. It doesn't make sense to put a lot of effort into
researching those before seeing if the general approach works for the
case that it was designed for, specifically Traditional Chinese. The
success metric I expect to be looking at is if the usage of the
character encoding menu in the Traditional Chinese localization of
Firefox falls to the same level as in other Firefox localizations in
general.

If this change turns out to be successful for Traditional Chinese,
then I think it will be worthwhile to research the unobvious cases.

The TLDs listed in
https://mxr.mozilla.org/mozilla-central/source/dom/encoding/nonparticipatingdomains.properties
do not participate at present (i.e. get a browser UI
localization-based guess like before). The TLDs listed in
https://mxr.mozilla.org/mozilla-central/source/dom/encoding/domainsfallbacks.properties
get the fallbacks listed in that file. All other TLDs map to
windows-1252.
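
As a rough sketch of the resulting lookup (the entries below are
illustrative placeholders, not the actual file contents, and a real
implementation has to consider more than just the last label of the
host name):

// Hypothetical in-memory copies of the two properties files.
var nonParticipatingTlds = { "example": true };
var tldFallbacks = { "tw": "Big5", "th": "windows-874" };

function fallbackEncodingForHost(host, localeFallback) {
  var tld = host.substring(host.lastIndexOf(".") + 1).toLowerCase();
  if (nonParticipatingTlds.hasOwnProperty(tld)) {
    return localeFallback;    // browser UI locale-based guess as before
  }
  if (tldFallbacks.hasOwnProperty(tld)) {
    return tldFallbacks[tld]; // TLD-specific legacy fallback
  }
  return "windows-1252";      // all other TLDs
}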

 What have you learnt so far?

It hasn't been an obvious and immediate disaster.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/


[whatwg] Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization

2013-12-19 Thread Henri Sivonen
UTF-8 is never guessed, so this feature doesn't give anyone who
uses UTF-8 a reason not to declare it. In that sense, this feature
doesn't interfere with the authoring methodology sites should,
ideally, be adhering to.

# How could this be harmful?

 * This could emphasize pre-existing brokenness (failure to declare
the encoding) of sites targeted at language minorities when the legacy
encoding for the minority language doesn't match the legacy encoding
for the majority language of the country and 100% of the target
audience of the site uses a browser localization that matches the
language of the site. For example, it's *imaginable* that there exists
a Russian-language windows-1251-encoded (but not declared) site under
.ee that's currently always browsed with Russian browser
localizations. More realistically, minority-language sites whose
encoding doesn't match the dominant encoding of the country probably
can't be relying on their audience using a particular browser
localization and are probably more aware than most about encoding
issues and already declare their encoding, so I'm not particularly
worried about this scenario being a serious problem. And sites can
always fix things by declaring their encoding.

 * This could cause some breakage when unlabeled non-windows-1252
sites are hosted under a foreign TLD, because the TLD looks cool (e.g.
.io). However, this is a relatively new phenomenon, so one might hope
that there's less content authored according to legacy practices
involved.

 * This probably lowers the incentive to declare the legacy encoding a little.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/


[whatwg] Add an attribute for opting out of synchronous CSSOM access

2013-04-05 Thread Henri Sivonen
For problem statement, please see
http://lists.w3.org/Archives/Public/www-style/2013Jan/0434.html

For solution, please see
http://lists.w3.org/Archives/Public/www-style/2013Jan/0457.html

For CSS WG thinking that this is an HTML issue, please see
http://lists.w3.org/Archives/Public/www-style/2013Mar/0688.html
(FWIW, I think this is a CSS issue that requires an HTML attribute to
be minted.)

Please add an attribute to link that:
 * opts an external style sheet out of synchronous CSSOM access
 * makes the sheet not load and not defer the load event if its media
query cannot match in the UA even after zooming or invoking a print
function
 * makes the sheet load with low priority and not defer the load event
if its media query does not match at the time the link element is
inserted into the document but might match later (e.g. if it's a print
style sheet).

--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


[whatwg] Requiring the Encoding Standard preferred name is too strict for no good reason

2013-03-26 Thread Henri Sivonen
In various places that deal with encoding labels, the HTML spec now
requires authors to use the name of the encoding from the Encoding
Standard, which means using the preferred name rather than an alias.

Compared to the previous reference to the IANA registry, some names
that work in all browsers but are no longer preferred names are now
errors, such as iso-8859-1 and tis-620. Making broadly-supported names
that were previously preferred names according to IANA now be errors
does not appear to provide any utility to Web authors who use
validators.

Please relax the requirement so that at least previously-preferred
names are not errors.

zcorpan suggested
(http://krijnhoetmer.nl/irc-logs/whatwg/20130325#l-920) allowing
non-preferred names for non-UTF-8 encodings. I'm not familiar with the
level of browser support for all of the non-preferred aliases, but I
could accept zcorpan's suggestion.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] menu and friends

2013-01-14 Thread Henri Sivonen
On Wed, Jan 9, 2013 at 10:17 PM, Ian Hickson i...@hixie.ch wrote:
 Optimising for the short-term shim author's experience rather than the
 long-term HTML authoring experience seems backwards to me.

After input from a couple of other Gecko developers, I withdraw my
objection to menuitem being void.

 As for command behavior in the parser, all major browsers have shipped
 releases with command as void, so we won't be able to reliably
 introduce a non-void element called command in the future anyway.
 Therefore, I don't see value in removing the voidness of command from
 parsing or serialization.

 The element doesn't exist, so there's no value in having it. We can easily
 introduce a non-void command in ten years if we need to, since by then
 the current parsers will be gone.

Even if we accept, for the sake of the argument, that the current
parsers will be gone in 10 years, it is incorrect to consider only
parsers. Considering serializers is also relevant. The voidness of
command has already propagated to various places—including
serializer specs like
http://www.w3.org/TR/xslt-xquery-serialization-30/ . (No doubt the
XSLT folks will be super-happy when we tell them that the list of void
elements has changed again.)

At any point in the future, it is more likely that picking a new
element name for a newly-minted non-void element will cause less
(maybe only an epsilon less but still less) hassle than trying to
re-introduce command as non-void. Why behave as if finite-length
strings were in short supply? Why not treat command as a burned name
just like legend and pick something different the next time you need
something of the same theme when interpreted as an English word?

What makes an element exist for you? Evidently, basefont and bgsound
exist enough to get special parsing and serialization treatment. Is
multiple interoperable parsing and serialization implementations not
enough of existence and you want to see deployment in existing
content, too? Did you measure the non-deployment of command on the
Web or are we just assuming it hasn't been used in the wild? Even if
only a few authors have put command in head, changing parsing to
make command break out of head is bad.

What do we really gain except for test case churn, makework in code
and potential breakage from changing command as opposed to treating
it as a used-up identifier and minting a new identifier if a non-void
element with a command-like name is needed in the future?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] We should not throw DOM Consistency and Infoset compatibility under the bus

2013-01-14 Thread Henri Sivonen
On Fri, Jan 11, 2013 at 10:00 PM, Ian Hickson i...@hixie.ch wrote:
 On Fri, 11 Jan 2013, Henri Sivonen wrote:
 I understand that supporting XML alongside HTML is mainly a burden for
 browser vendors and I understand that XML currently doesn't get much
 love from browser vendors.

 Not just browser vendors. Authors rarely if ever use XML for HTML either.

When you say use XML, do you mean serving content using an XML content type?

I'm talking about serving text/html but using XML machinery to
generate it (with a text/html-aware serializer at the end of the
process).

 Still, I think that as long as browsers continue to support XHTML, we'd be worse
 off with the DOM-and-above parts of the HTML and XML implementations
 diverging.

 Sure, but if on the long term, or even medium term, they don't continue to
 support XHTML, this is no longer a problem.

But if they do continue to support XHTML, introducing divergence will
be a problem and, moreover, a problem that may become unfixable. (That
we were able to converge on the namespace was narrow enough a success.
It broke Facebook!)

 Anyway, I'm not suggesting that they diverge beyond the syntax (which is
 already a lost cause). All I've concretely proposed is syntax for binding
 Web components in text/html; I haven't described how this should be
 represented in the DOM, for instance. If we define foo/bar as being a
 text/html syntactic shorthand for foo xml:component=bar, or foo
 xmlcomponent=bar, in much the same way as we say that svg is a
 shorthand for svg xmlns=http://www.w3.org/2000/svg;, then the DOM
 remains the same for both syntaxes, and (as far as I can tell) we're fine.

I didn't realize you were suggesting that HTML parsers in browsers
turned bar/foo into bar xml:component=foo in the DOM. How is
xml:component=foo better than is=foo? Why not bar foo=, which
is what bar/foo parses into now? (I can think of some reasons
against, but I'd like to hear your reasons.)

 The idea to stick a slash into the local name of an element in order to
 bind Web Components is much worse.

 I don't propose to change the element's local name. select/map has
 tagName select in my proposal.

Oh. That was not at all clear.

 Please, let's not make that mistake.

 What do you propose to resolve this problem then?

Let's focus on the requirements before proposing solutions.

 Some of the constraints are:

  - The binding has to be done at element creation time
  - The binding has to be immutable during element lifetime
  - The syntax must not make authors think the binding is mutable
(hence why the select is=map proposal was abandoned)

“Was abandoned”? Already “abandoned”? Really?

How does xml:component=map suggest mutability less than is=map?

Would it be terrible to make attempts to mutate the 'is' attribute
throw thereby teaching authors who actually try to mutate it that it's
not mutable?

  - The syntax must be as terse as possible
  - The syntax has to convey the element's public semantics (a
specified HTML tag name) in the document markup, for legacy UAs
and future non-supporting UAs like spiders.

- It must be possible to generate the syntax using a serializer that
exposes (only) the SAX2 ContentHandler interface to an XML system and
generates text/html in response to calls to the methods of the
ContentHandler interface and the XML system may enforce the calls to
ContentHandler representing a well-formed XML document (i.e. would
produce a well-formed XML doc if fed into an XML serializer). The
syntax must round-trip if the piece of software feeding the serializer
is an HTML parser that produces SAX2 output in a way that's consistent
with the way the parsing spec produces DOM output. (This is a concrete
way to express “must be producible with Infoset-oriented systems
without having a different Infoset mapping than the one implied by the
DOM mapping in browsers”. As noted, dealing with template already
bends this requirement but in a reasonably straightforward way.)
- It must be possible to generate the syntax with XSLT. (Remember, we
already have <!DOCTYPE html SYSTEM "about:legacy-compat">, because
this is important enough a case.)

Adding these requirements to your list of requirements may make the
union of requirements internally contradictory. However, I think we
should have a proper discussion of how to reconcile contradictory
requirements instead of just conveniently trimming the list of
requirements to fit your proposed solution. (For example, it could be
that one of my requirements turns out to be more important than one of
yours.)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


[whatwg] We should not throw DOM Consistency and Infoset compatibility under the bus

2013-01-11 Thread Henri Sivonen
 as HTML fits
into the XML data model.

I think it would be a mistake to change HTML in such a way that it
would no longer fit into the XML data model *as implemented* and
thereby limit the range of existing software that could be used
outside browsers for working with HTML just because XML in browsers is
no longer in vogue. Please, let's not make that mistake.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] menu and friends

2013-01-08 Thread Henri Sivonen
On Sat, Dec 29, 2012 at 3:23 AM, Ian Hickson i...@hixie.ch wrote:
 * menuitem is void (requires parser changes).

 * command is entirely gone. (Actually, I renamed command to menuitem
 and made it so it's only allowed in menu.)

Did you actually make these changes to the parsing algorithm? It seems
to me that you didn't, and I'm happy that you didn't.

Currently, menuitem is non-void in Firefox. It was initially designed
to be void but that never shipped and the non-voidness is, AFAIK,
considered intentional. For one thing, being non-void makes the
element parser-neutral and, therefore, easier to polyfill in
menuitem-unaware browsers.

As for command behavior in the parser, all major browsers have
shipped releases with command as void, so we won't be able to
reliably introduce a non-void element called command in the future
anyway. Therefore, I don't see value in removing the voidness of
command from parsing or serialization.

Could you, please, revert the serializing algorithm to treat command
as void and menuitem as non-void?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2012-12-12 Thread Henri Sivonen
On Sat, Dec 8, 2012 at 11:05 PM, Ian Hickson i...@hixie.ch wrote:
 the order between abc and
 xyz is reversed in the tree.

 Does anyone have any preference for how this is fixed?

Does it need to be fixed? That is, is it breaking real sites?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] [mimesniff] Sniffing archives

2012-12-04 Thread Henri Sivonen
On Tue, Dec 4, 2012 at 9:40 AM, Adam Barth w...@adambarth.com wrote:
 Also, some user agents treat downloads of
 ZIP archives differently than other sorts of download (e.g., they
 might offer to unzip them).

Which user agents? For this use case, merely sniffing for the zip
magic number is inadequate, because you really don’t want to offer to
unzip EPUB, ODF, OOXML, XPS, InDesign, etc. files.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Loading and executing script as quickly as possible using multipart/mixed

2012-12-04 Thread Henri Sivonen
On Tue, Dec 4, 2012 at 4:15 AM, Kyle Simpson get...@gmail.com wrote:
 One suggestion is to add a state to the readyState mechanism like 
 chunkReady, where the event fires and includes in its event object 
 properties the numeric index, the //@sourceURL, the separator identifier, or 
 otherwise some sort of identifier for which the author can tell which chunk 
 executed.

If the script author needs to manually designate the chunk boundaries,
can’t the script author insert a call to a function before each
boundary? That is, why is it necessary for the UA to generate events?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] [mimesniff] Handling container formats like Ogg

2012-11-27 Thread Henri Sivonen
On Tue, Nov 27, 2012 at 12:59 AM, Gordon P. Hemsley gphems...@gmail.com wrote:
 Container formats like Ogg can be used to store many different audio
 and video formats, all of which can be identified generically as
 application/ogg. Determining which individual format to use (which
 can be identified interchangeably as the slightly-less-generic
 audio/ogg or video/ogg, or using a 'codecs' parameter, or using a
 dedicated media type) is much more complex, because they all use the
 same OggS signature. It would require actually attempting to parse
 the Ogg container to determine which audio or video format it is using
 (perhaps not unsimilar to what is done for MP4 video and what might
 have to be done with MP3 files without ID3 tags).

 Would this be something UAs would prefer to handle in their Ogg
 library, or should I spec it as part of sniffing?

What would be the use case for handling it as part of the sniffing layer?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] main element parsing behaviour

2012-11-07 Thread Henri Sivonen
On Wed, Nov 7, 2012 at 2:42 PM, Simon Pieters sim...@opera.com wrote:
 I think we shouldn't put the parsing algorithm on a pedestal while not
 giving the same treatment to the default UA style sheet or other
 requirements related to an element that have to be implemented.

The difference between the parsing algorithm and the UA stylesheet is
that authors can put display: block; in the author stylesheet during
the transition.

That said, the example jgraham showed to me on IRC convinced me that
if main is introduced to the platform, it makes sense to make it
parse like article. :-( (I’m not a fan of the consequences of the
“feature” of making </p> optional. Too bad that feature is ancient and
it’s too late to undo it.)

I guess I’ll focus on objecting to new void elements and especially to
new children of head.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] maincontent element spec updated and supporting data provided

2012-10-18 Thread Henri Sivonen
On Wed, Oct 17, 2012 at 3:03 AM, Steve Faulkner
faulkner.st...@gmail.com wrote:
 I have updated the maincontent spec [1] and would appreciate any feedback
 (including, but not limited to implementers).

<bikeshed>A single-word element name would be more consistent with
other HTML element names. content would be rather generic, so I
think main would be the better option.</bikeshed>

It would probably make sense to add
main { display: block; }
to the UA stylesheet.

If Hixie had added this element in the same batch as section,
article and aside, he would have made the parsing algorithm
similarly sensitive to this element. However, I'm inclined to advise
against changes to the parsing algorithm at this stage (you have none;
I am mainly writing this for Hixie), since it would move us further
from a stable state for the parsing algorithm and, if the main
element is used in a conforming way, it won't have a p element
preceding it anyway.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Archive API - proposal

2012-08-15 Thread Henri Sivonen
On Tue, Aug 14, 2012 at 11:20 PM, Glenn Maynard gl...@zewt.org wrote:
 On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini b...@mozilla.com wrote:

 // The getFilenames handler receives a list of DOMString:
 var handle = this.reader.getFile(this.result[i]);

 This interface is problematic.  Since ZIP files don't have a standard
 encoding, filenames in ZIPs are often garbage.  This API requires that
 filenames round-trip uniquely, or else files aren't accessible at all.

Indeed, in the case of zip files, file names themselves are dangerous
as handles that get passed back and forth, so it seems like a
good idea to be able to extract the contents of a file inside the
archive without having to address the file by name.

As for the filenames, after an off-list discussion, I think the best
solution is that UTF-8 is tried first but the ArchiveReader
constructor takes an optional second argument that names a character
encoding from the Encoding Standard. This will be known as the
fallback encoding. If no fallback encoding is provided by the caller
of the constructor, Windows-1252 is set as the fallback encoding.
When the ArchiveReader processes a filename from the zip archive, it
first tests if the byte string is a valid UTF-8 string. If it is, the
byte string is interpreted as UTF-8 when converting to UTF-16. If the
filename is not a valid UTF-8 string, it is decoded into UTF-16 using
the fallback encoding.
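
A minimal sketch of that decoding step, expressed with the TextDecoder
API purely for illustration (the function name and shape are made up
here; they are not the proposed ArchiveReader API):

function decodeZipFilename(bytes, fallbackLabel) {
  // Try UTF-8 first; with fatal: true, decode() throws on invalid input.
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    // Not valid UTF-8: decode using the caller-supplied fallback,
    // defaulting to windows-1252.
    return new TextDecoder(fallbackLabel || "windows-1252").decode(bytes);
  }
}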

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] [selectors4] drag-and-drop pseudo-classes

2012-08-15 Thread Henri Sivonen
On Aug 14, 2012 10:54 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:

 On Tue, Aug 14, 2012 at 12:13 PM, Ryosuke Niwa rn...@webkit.org wrote:
  Yeah, and that's not compatible with how drag and drop are implemented
on
  the Web.

 I know.  You'll notice that I didn't suggest we somehow change to
 that.  ^_^  However, other languages might want this kind of model,

Other languages?


Re: [whatwg] Was is considered to use JSON-LD instead of creating application/microdata+json?

2012-08-14 Thread Henri Sivonen
On Fri, Aug 10, 2012 at 1:39 PM, Markus Lanthaler
markus.lantha...@gmx.net wrote:
  Well, I would say there are several advantages. First of all, JSON-LD
 is
  more flexible and expressive.

 More flexible and expressive than what?

 Than application/microdata+json.

That's a problem right there. It means that JSON-LD requires more
consumer complexity than application/microdata+json.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Features for responsive Web design

2012-08-13 Thread Henri Sivonen
On Fri, Aug 10, 2012 at 11:54 AM, Florian Rivoal flori...@opera.com wrote:
 I wasn't debating whether or not shipping a device with a 1.5 pixel
 ratio is the best decision, but answering: Is there a good reason
 to believe that will be something other than a power of two?

 The fact that it has happened seems a pretty good reason to believe
 that it may happen.

These are different questions:
Will someone ship a browser/device combination whose device pixel
ratio is something other than 1 or 2?
Will Web authors bother to supply bitmaps with sampling factors other
than 1 and 2?

As a data point worth considering, for desktop apps on OS X Apple
makes developers supply bitmap assets for 1x and 2x and if the user
chooses a ratio between 1 and 2, the screen is painted at 2x and the
resulting bitmap is scaled down.

Another thing worth considering is whether anyone is ever really going to
go over 2x, given that at normal viewing distances 2x is roughly
enough to saturate the resolution of the human eye (hence the retina
branding). Even for printing photos, 192 pixels per inch should result
in very good quality, and for line art, authors should use SVG instead
of bitmaps anyway.

If it indeed is the case that there are really only two realistic
bitmap samplings for catering to differences in viewing device pixel
density (ignoring art direction), it would make sense to have simply
<img src="1xsampling.jpg" hisrc="2xsampling.jpg" alt="Text
alternative"> instead of an in-attribute microsyntax for the
non-art-directed case.

Ian Hickson wrote:
On Wed, 16 May 2012, Henri Sivonen wrote:

 It seems to me that Media Queries are appropriate for the art-direction
 case and factors of the pixel dimensions of the image referred to by
 src= are appropriate for the pixel density case.

 I'm not convinced that it's a good idea to solve these two axes in the
 same syntax or solution. It seems to me that srcset= is bad for the
 art-direction case and picture is bad for the pixel density case.

 I don't really understand why

They are conceptually very different: One is mere mipmapping and can
be automatically generated. The other involves designer judgment and
is conceptually similar to CSS design where authors use MQ. Also,
having w and h refer to the browsing environment and x to the image in
the same microsyntax continues to be highly confusing.

Ignoring implementation issues for a moment, I think it would be
conceptually easier to disentangle these axes like this:

Non-art directed:
<img src="1xsampling.jpg" hisrc="2xsampling.jpg" alt="Text alternative">

Art directed:
<picture>
<source src="1xsampling-cropped.jpg" hisrc="2xsampling-cropped.jpg"
media="max-width: 480px">
<img src="1xsampling.jpg" hisrc="2xsampling.jpg" alt="Text alternative">
</picture>

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] register*Handler and Web Intents

2012-08-06 Thread Henri Sivonen
On Fri, Aug 3, 2012 at 12:00 PM, James Graham jgra...@opera.com wrote:
 I agree with Henri that it is
 extremely worrying to allow aesthetic concerns to trump backward
 compatibility here.

Letting aesthetic concerns trump backward compat is indeed troubling.
It's also troubling that this even needs to be debated, considering
that we're supposed to have a common understanding of the design
principles and the design principles pretty clearly uphold backward
compatibility over aesthetics.

 I would also advise strongly against using position in DOM to detect intents
 support; if you insist on adding a new void element I will strongly
 recommend that we add it to the parser asap to try and mitigate the above
 breakage, irrespective of whether our plans for the rest of the intent
 mechanism.

I think the compat story for new void elements is so bad that we
shouldn't add new void elements. (source gets away with being a void
element, because the damage is limited by the </video> or </audio> end
tag that comes soon enough after source.) I think we also shouldn't
add new elements that don't imply body when appearing in head.

It's great that browsers have converged on the parsing algorithm.
Let's not break what we've achieved to cater to aesthetics.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] alt= and the meta name=generator exception

2012-08-05 Thread Henri Sivonen
On Wed, Aug 1, 2012 at 10:56 AM, Ian Hickson i...@hixie.ch wrote:
 After all, what's the point of using validation if you use a generator?

People who are not the developer of the generator use validators to
assess the quality of the markup generated by the generator.

 You would in effect be testing the generator, something that its vendor
 should have done. We should not be concerned about helping generator
 vendors to advertize their products as producing valid code (code that
 passes validation) when they in fact produce code that violates
 established good practices of HTML.

Alice writes a generator that logically cannot know the text
alternative for an image file and, therefore, makes the generator
output <img> without alt. Bob is shopping around for generators of the
type Alice's generator happens to be, or is engaging in an Internet
argument about which generator sucks and which generator rocks. So Bob
feeds the output of Alice's generator to a validator, sees an error
message and proceeds to proclaim to the world that Alice's generator
is bad, because its output doesn't validate. Alice doesn't want Bob
to proclaim to the world that her generator is bad. Educating Bob and
everyone who listens to Bob about why the generator produces output
that causes the validation error is too hard. The path of least
resistance for Alice to make the problem go away is to change the
output of the generator such that it doesn't result in an error
message from a validator, so Alice makes the image have the attribute
alt="", which happens to result in the existence of the image being
concealed from users of screen readers.

Or, alternatively, Alice anticipates Bob's reaction and preemptively
makes her generator output alt="" before Bob ever gets to badmouth
the invalidity of the generator's output.

Even if we wanted to position validators as tools for the people who
write markup, we can't prevent other people from using validators to
judge markup output by generators written by others. The crux of this
problem is the tension between a validator as a tool for the person
writing the markup and a validator being used to judge someone else's
markup and what people can most easily do to evade such judgment.

 We briefly brainstormed some ideas on #whatwg earlier tonight, and one
 name in particular that I think could work is the absurdly long

    <img src="..." generator-unable-to-provide-required-alt="">

 This has several key characteristics that I think are good:

  - it's long, so people aren't going to want to type it out
  - it's long, so it will stick out in copy-and-paste scenarios
  - it's eminently searchable (long unique term) and so will likely lead
to good documentation if it's adopted
  - the generator part implies that it's for use by generators, and may
discourage authors from using it
  - the unable and required parts make it obvious that using this
attribute is an act of last resort

While I agree with the sentiment the name of the attribute
communicates, its length is enough of a problem to probably make it
fail:
1) Like a namespace URL, it's too long to memorize correctly, so it's
easier for the generator developer to type 'alt' than to copy and paste
the long attribute name from somewhere.
2) It takes so many more bytes than alt=, so it's easy to shy away
from using it on imagined efficiency grounds.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] alt= and the meta name=generator exception

2012-08-05 Thread Henri Sivonen
On Sat, Aug 4, 2012 at 9:08 AM, Michael[tm] Smith m...@w3.org wrote:
 Agreed. I support having some kind of trial period like what you
 describe, or a year or two or 18 months. If we do that I would prefer that
 the spec include some kind of note/warning making it clear that the
 attribute is experimental and may be dropped or changed significantly
 within the next two years based on analysis we get back during that time.

There's a non-trivial set of validator users who get very upset if the
validator says that a document that previously produced no
validation errors now produces validation errors--even if the new
errors result from a bug fix. In my experience, handing out badges
makes people more upset if the criteria behind the badge changes, but
even without badges, it seems to me that the sentiment is there.

Therefore, if you tell people that if they use a particular syntax
their document might become invalid in the future, chances are that
they will steer clear of the syntax when an easier alternative is
available--just writing alt=. So adding a warning that the syntax is
experimental is an almost certain way to affect the outcome of the
experiment. On the other hand, not warning people and then changing
what's valid is likely to make people unhappy.

It seems to me that running an experiment like this will either result
in a failed experiment, unhappy people or both.

If an experiment on this topic were to be run, what would you measure
and how would you interpret the measurements?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Linters should complain about missing accessible names for controls [Was: Re: alt= and the meta name=generator exception]

2012-08-05 Thread Henri Sivonen
On Sat, Aug 4, 2012 at 10:32 PM, Benjamin Hawkes-Lewis
bhawkesle...@googlemail.com wrote:
 Would it be possible to combine this with the linter complaining when
 controls (links, buttons, form fields) lack markup that yields a
 non-empty accessible name without invoking repair techniques such as
 reading filenames from img @src attributes?

Given a well-defined algorithm for finding the accessible name for
links, buttons and form fields, I think it would make sense for a
validator to be able to complain when the algorithm results in an
empty accessible name. Whether that should be a validity constraint or
an optional additional check is a bit tricky, for the same reason why
we allow empty paragraphs and empty lists: to let markup editors
simultaneously guarantee the validity of their output and to allow the
user to save the document at any stage of editing.

(Again, there's tension between different uses of validity: the sort
of validity constraints you want to hold before and after each
discrete editing operation and constraints you want to hold when the
document is done.)
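
For illustration only, a drastically simplified sketch of such a check
(nothing like the full name computation defined at the link below;
"control" is assumed to be a link or button element):

function hasEmptyAccessibleName(control) {
  // Grossly simplified: a real computation also looks at label
  // elements, alt text on child images, title, roles, etc.
  var labelledby = control.getAttribute("aria-labelledby");
  if (labelledby) {
    var doc = control.ownerDocument;
    var text = labelledby.split(/\s+/).map(function (id) {
      var el = doc.getElementById(id);
      return el ? el.textContent : "";
    }).join(" ");
    if (text.trim()) {
      return false;
    }
  }
  var label = control.getAttribute("aria-label");
  if (label && label.trim()) {
    return false;
  }
  return control.textContent.trim() === "";
}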

 http://www.w3.org/WAI.new/PF/aria/roles#namecalculation

Spec writing that puts a point starting with “Authors MAY” under “The
text alternative for a given node is computed as follows:” is
sad-making. :-(

 I realise the author requirements in the HTML spec seem to have
 gradually become very forgiving here, not really sure why. :(

To avoid e.g. the insertion of an &nbsp; in each newly-created <p></p>
in an editor to avoid violating a ban on empty paragraphs. Validity
constraints have unintended consequences.

 The cases where markup generators cannot provide a better control name
 than _nothing_ seem to me much rarer than the cases where markup
 generators cannot provide better text alternatives for photos etc -
 maybe even non-existent - and when hand-authoring describing a control
 is even easier than coming up with a text equivalent for a graphic.

Yeah. In this case, the problem isn't non-interactive generators but
interactive editors.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Load events fired during onload handlers

2012-08-02 Thread Henri Sivonen
For what it's worth, I think the weirdness described in this thread is
a good reason not to try to make DOMContentLoaded consistent with the
load event for the sake of consistency. For one thing, the code that
manages the weirdness of the load event lives in a different place
compared to the code that fires DOMContentLoaded.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [whatwg] register*Handler and Web Intents

2012-08-02 Thread Henri Sivonen
On Thu, Jul 26, 2012 at 5:20 AM, Ian Hickson i...@hixie.ch wrote:
 Thus, I propose a parallel mechanism in the form of an empty
 element that goes in the head:

   <intent
     action="edit"      Intent action, e.g. "open" or "edit", default "share"
     type="image/png"   MIME type filter, default omitted, required if scheme omitted
     scheme="mailto"    Scheme filter, default omitted, required if type omitted
     href=""            Handler URL, default "" (current page)
     title="Foo"        Handler user-visible name, required attribute
     disposition=""     Handler disposition values, default "overlay"
   >

This is a severe violation of the Degrade Gracefully design principle.
Adopting your proposal would mean that pages that include the intent
element in head would parse significantly differently in browsers that
predate the HTML parsing algorithm or in browsers that implement it in
its current form. I believe that having the intent element break the
parser out of head in browsers that don't contain the parser
differences you implicitly propose would cause a lot of grief to Web
authors and would hinder the adoption of this feature.

My concerns could be addressed in any of these three ways:
1) Rename intent to link
2) Rename intent to meta
3) Make intent have an end tag and make it placed in body rather than head

I prefer solution #1.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] alt and title attribute exception

2012-08-01 Thread Henri Sivonen
On Tue, Jul 31, 2012 at 12:18 PM, Philip Jägenstedt phil...@opera.com wrote:
 When this was last discussed in the HTML WG (January 2012) I opened a bug
 (MOBILE-275) for Opera Mobile to expose the title attribute in our
 long-click menu, arguing that one could not enjoy XKCD without it. I meant
 to report back to the HTML WG but forgot, so here it is. Unfortunately, the
 bug was rejected... quoting the project management:

 Sure it is nice to have, but noone else has it so we will not put our
 effort into this

Firefox for Android (at least on the Nightly channel) displays the
content of the title attribute on XKCD comics (up to a length limit
which can often be too limiting) upon tap and hold:
http://hsivonen.iki.fi/screen/xkcd-firefox-for-android.png

Not to suggest that XKCD's title usage is OK but just to correct the
noone else bit.

 it seems unwise to recommend using the title attribute to convey important 
 information.

Indeed. In addition to image considerations, I think
http://www.whatwg.org/specs/web-apps/current-work/#footnotes is bad
advice.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] alt= and the meta name=generator exception

2012-07-25 Thread Henri Sivonen
On Tue, Jul 24, 2012 at 10:58 PM, Jukka K. Korpela jkorp...@cs.tut.fi wrote:
 This is an improvement, but I think Edward O'Connor's points still apply.

Indeed. The spec edit is a rather disappointing response.

 I think it would be better to keep the alt attribute always required but
 recommend that conformance checkers have an option of switching off errors
 related to this

The big question is whether that would be enough to solve the problem
of generators generating bogus alts in order to pass validation. I
predict generator writers would want the generator output to pass
validation with the default settings and, therefore, what you suggest
wouldn't fix the problem that the spec is trying to fix.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Proposal for readyState behavior

2012-07-16 Thread Henri Sivonen
On Tue, Jul 10, 2012 at 10:15 PM, Ian Hickson i...@hixie.ch wrote:
 Done.

Thanks.

  4) Whenever a transition to interactive is made, DOMContentLoaded
 must eventually get fired later if the document stays in a state where
 events can fire on it.
  Rationale:
* This seems sensible for consistency with the common case.
 Currently, there are cases where Firefox fires DOMContentLoaded
 without a transition to interactive or transitions to interactive
 without ever firing DOMContentLoaded, but these cases are inconsistent
 with other browsers, so it's hard to believe they are well-considered
 compatibility features.
 Delta from the spec: Same as for point 3.

 Disagreed. IMHO DOMContentLoaded is equivalent to 'load', just a bit
 earlier (it's basically 'load' but before the scripts have run). In fact,
 I'd specifically define DOMContentLoaded as meaning the DOM content was
 completely loaded, which clearly can't happen if the parser aborted.

Could you please leave your sense of logic at the door instead of
rocking the interop boat like this? Personally, I'm already spending
way more than enough time in this quagmire of trying to sort out
events and readyStates for abnormal document loads, and I have about
zero interest in making Gecko not fire an event in a situation where
Firefox, IE10 and Opera currently fire it. Furthermore, I think that
in a situation like this, a change is more harmful and more likely to
break something than the sort of logic you offered is useful.

 10) XSLT error pages don't count as aborts but instead as non-aborted
 loads of the error page.
  Rationale:
    * Makes parent pages less confused about events they are waiting for.
* Already true except for bugs in Firefox which is the only
 browser with XSLT error pages.
 Delta from the spec: Make explicit in spec.

 I haven't defined this because to define this I'd have to define a ton of
 infrastructure that explains how XSLT works in the first place, and I'm
 still waiting for the XSLT community to write the tests that demonstrate
 what the requirements should be:

https://www.w3.org/Bugs/Public/show_bug.cgi?id=14689

I don't think you need to spec infrastructure to define a high-level
expectation that loads with XSLT errors are supposed to finish as if
they were successful loads rather than aborted loads.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Readiness of script-created documents

2012-06-15 Thread Henri Sivonen
On Tue, Jun 12, 2012 at 1:46 AM, Ian Hickson i...@hixie.ch wrote:
 When a document is aborted the state is more or less left exactly as it
 was when it was aborted. This includes the readiness state. It also means
 no events fire (e.g. no 'load', 'unload', or 'error' events), a number of
 scripts just get abandoned without executing, appcache stuff gets
 abandoned, queued calls to window.print() get forgotten, etc.

 Aborting a document is a very heavy-handed measure. Documents are not
 expected to last long after they have been aborted, typically. Pages
 aren't expected to remain functional beyond that point.

That's not reality in all browsers right now, and I think it doesn't
make sense to make that the reality. That is, there are already browsers
that transition readyState to complete upon aborting the parser, and
I think doing that makes sense (and I want to change Gecko to do that,
too), because a non-complete readyState is a promise to fire a
load event later.

I think it's a bad idea to leave a document in the loading state
when the browser engine knows that it won't fire a load event for
the document.

Basically, I think the platform should maximize the chances of the
following code pattern causing doStuff to run once the document has
completely loaded:
if (document.readyState == "complete") {
  setTimeout(doStuff, 0);
} else {
  document.addEventListener("load", doStuff);
}

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] HTMLLinkElement.disabled and HTMLLinkElement.sheet behavior

2012-06-07 Thread Henri Sivonen
On Thu, Jun 7, 2012 at 2:47 AM, Ian Hickson i...@hixie.ch wrote:
 On Fri, 27 Jan 2012, Boris Zbarsky wrote:
 On 1/27/12 1:30 AM, Ian Hickson wrote:
  On Wed, 5 Oct 2011, Henri Sivonen wrote:
 On Tue, Oct 4, 2011 at 9:54 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 What Firefox does do is block execution of <script> tags (but not
 timeouts, callbacks, etc!) if there are pending non-alternate
 parser-inserted stylesheet loads.  This is necessary to make sure
 that scripts getting layout properties see the effect of those
 stylesheets. A side-effect is that a <script> coming after a <link>
 will never see the link in an unloaded state... unless there's a
 network error for the <link> or whatever.
  
   One exception: If an inline script comes from document.write(), it
   doesn't block on pending sheets. It runs right away. If it blocked
   on pending sheets, the point at which document.write() returns would
   depend on network performance, which I think would be worse than
   having document.written inline scripts that poke at styles fail
   depending on network performance.
 
  Note that this is not conforming. The spec does not currently define
  any such behaviour.

 Which part is not conforming?  The exception for alternate sheets, the
 inline script inside document.write thing, or something else?

 Unless I'm mistaken, nothing in the HTML spec does anything differently
 based on whether a script comes from document.write() or not.

I think that's a spec bug per one exception above.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Various HTML element feedback

2012-06-06 Thread Henri Sivonen
On Wed, Jun 6, 2012 at 2:53 AM, Ian Hickson i...@hixie.ch wrote:
 That might be realistic, especially there is no significant semantic
 clarification in sight in general. This raises the question why we could
 not just return to the original design with some physical markup like
 i, b, and u together with span that was added later.

 I think you'll find the original design of HTML isn't what you think it
 is (or at least, it's certainly not as presentational as you imply above),
 but that's neither here nor there.

Is there a record of design between
http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html
and
http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
?
 So why not simply define i as recommended and describe var, cite,
 em, and dfn as deprecated but supported alternatives?

 What benefit does empty deprecation have? It's not like we can ever remove
 these elements altogether. What harm do they cause?

The harm is the wasted time spent worrying about and debating which
semantic alternative for italics to use.

 If we have to keep them, we are better served by embracing them and giving
 them renewed purpose and vigour, rather than being ashamed of them.

I think we have to keep them, because trying to declare them invalid
would cause people to do a lot of pointless work, too, but I think we
could still be ashamed of them.

 Note that as it is specified, div can be used instead of p with
 basically no loss of semantics. (This is because the spec defines
 paragraph in a way that doesn't depend on p.)

Is there any known example of a piece of software that needs to care
about the concept of paragraph and uses the rules given in the spec
for determining what constitutes paragraphs?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Media queries, viewport dimensions, srcset and picture

2012-06-06 Thread Henri Sivonen
On Wed, May 23, 2012 at 6:21 PM, Florian Rivoal flori...@opera.com wrote:
 On the other hand, I think that including 600w 400h in there is misguided.

I agree.

 1) simplyfy srcset to only accept the *x qualifier

Is there a good reason to believe that * will be something other than
a power of two?

That is, could we just optimize the *x syntax away and specify that
the first option is 1x, the second is 2x, the third is 4x, etc.?

 I believe the only way out is through an image format that:
...
 - is designed so that the browser can stop downloading half way through
 the file, if it determines it got sufficiently high resolution given the
 environment

More to the point, the important characteristic is being able to stop
downloading a *quarter* of the way through the file and get results that are as
good as if the full-size file had been down sampled with both
dimensions halved and that size had been sent as the full file. (I am
not aware of a bitmap format suitable for photographs that has this
characteristic. I am aware that JPEG 2000 does not have this
characteristic. I believe interlaced PNGs have that characteristic,
but they aren't suitable for photographs, due to the lossless
compression.)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Bandwidth media queries

2012-05-21 Thread Henri Sivonen
On Wed, May 16, 2012 at 9:48 PM, Matthew Wilcox m...@matthewwilcox.com wrote:
 If you're a browser you are the software interpreting the instructions
 of a given language: CSS in this case.

In addition to the problem that it's actually hard for browsers to
know what the current bandwidth is, especially on mobile networks, some
of these responsive design threads assume that the author knows best
when to withdraw content or reduce content quality due to low bandwidth.

From the user perspective, I think it's not at all clear that users
always prefer to get less content when they are on a slower
connection. Personally, I expect to see full content on a slow
connection if I wait for long enough, but it's also annoying to have
to wait for the whole page to load before the page is usable. The
problem is that sometimes waiting is worth it and sometimes it isn't
and the author might not know when the user considers the wait to be
worth it.

Unfortunately, the way the load event works makes it hard to make
pages so that they start working before images are fully loaded and
then keep improving in image quality if the user chooses to wait.
Also, some browsers intentionally limit their own ability to do
incremental rendering both to get better throughput and to get better
perceptual speed in cases where the overall page load is relatively
fast.

On very slow networks (GPRS or airline Wi-Fi) I think Opera Mini
with *full* image quality provides the best experience: the page
renders with its final layout and becomes interactive with images
replaced with large areas of color that represent the average color
of that area in the images. The images then become sharper over
time. Thus, the user has the option to start interacting with the page
right away if the user deems the image is not worth the wait or can
choose to wait if the user expects the images to contain something
important. (This assumes, of course, that the user is not paying per
byte even though the connection is slow, so that it's harmless from
the user perspective to start loading data that the user might dismiss
by navigating away from the page without waiting for the images to
load in full.)

Instead of giving web authors the tools to micro-manage what images
get shown in what quality under various bandwidth conditions, I think
it would be better to enable a load mode in traditional-architecture
(that is, not the Opera Mini thin client architecture) browsers that
would allow early layout and load event and progressive image quality
enhancement after the load event is fired and the page has its final
layout (in the sense of box dimensions). I.e. have a mode where the
load event fires as soon as the everything except images have loaded
and the dimensions of all image boxes are known to the CSS formatter
(and PNG and JPEG progression is used 1990s style after the load event
has fired).

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Features for responsive Web design

2012-05-16 Thread Henri Sivonen
On Wed, May 16, 2012 at 2:46 PM, Jeremy Keith jer...@adactio.com wrote:
 You're right. I was thinking that the values (Nh Nw Nx) described the *image* 
 but in fact they describe (in the case of Nh and Nw) the viewport and (in the 
 case of Nx) the pixel density of the screen/device.

 I suspect I won't be the only one to make that mistake.

Indeed. I made the same mistake initially. What's currently in the
spec is terribly counter-intuitive in this regard.

 I can see now how it does handle the art-direction case as well. I think it's 
 a shame that it's a different syntax to media queries but on the plus side, 
 if it maps directly to imgset in CSS, that's good.

It seems to me that Media Queries are appropriate for the
art-direction case and factors of the pixel dimensions of the image
referred to by src= are appropriate for the pixel density case.

I'm not convinced that it's a good idea to solve these two axes in the
same syntax or solution. It seems to me that srcset= is bad for the
art-direction case and picture is bad for the pixel density case.

(I think the concept of dpi isn't appropriate for either case, FWIW. I
think doubling the number of horizontal and vertical bitmap samples
relative to the traditional src image works much better conceptually
for Web authoring than making people do dpi math with an abstract
baseline of 96 dpi. Anecdotal observation of trying to get family
members to do dpi math for print publications suggests that it's hard
to get educated people to do dpi math right even when an inch is a real
inch and not an abstraction.)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] IBM864 mapping of Encoding Standard

2012-04-24 Thread Henri Sivonen
On Tue, Apr 24, 2012 at 6:31 AM, Makoto Kato m_k...@ga2.so-net.ne.jp wrote:
 (2012/04/20 17:09), Anne van Kesteren wrote:
 Does that mean you want to remove the encoding from Gecko? That would
 work for me. It is currently not supported by Opera either.
 Alternatively mapping 0xA7 to U+20AC works for me too, but I don't want
 it to tinker with the ASCII range.


 Except to OS/2 and AIX, I think that this encoding is unnecessary since most
 browsers aren't supported.

Does the OS/2 port need it for interfacing with the system APIs? If
the OS/2 port needs it for interfacing with the system APIs, can we
stop exposing the encoding to the Web and can we stop building the
IBM864 encoder/decoder on non-OS/2 platforms? I think it's a bad idea
to vary the supported set of Web-exposed encodings by operating
system.

IIRC, some old Mac encodings that are still relevant for dealing with
legacy fonts were hidden from Web content and UTF-7 was made
mail-only.

If OS/2 doesn't need it for system APIs, can we just remove the IBM864
support altogether?

Is the AIX port still relevant? I thought 3.6 was the last version
ported to AIX.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Encoding Sniffing

2012-04-23 Thread Henri Sivonen
On Sat, Apr 21, 2012 at 1:21 PM, Anne van Kesteren ann...@opera.com wrote:
 This morning I looked into what it would take to define Encoding Sniffing.
 http://wiki.whatwg.org/wiki/Encoding#Sniffing has links as to what I looked
 at (minus Opera internal). As far as I can tell Gecko has the most
 comprehensive approach and should not be too hard to define (though writing
 it all out correctly and clear will be some work).

The Gecko notes aren't quite right:
 * The detector chosen from the UI is used for HTML and plain text
when loading those in a browsing context from HTTP GET or from a
non-http URL. (Not used for POST responses. Not used for XHR.)
 * The default for the UI setting depends on the locale. Most locales
default to no detector at all. Only zh-TW defaults to the Universal
detector. (I'm not sure why, but I think this is a bug of *some* kind.
Perhaps the localizer wanted to detect both Traditional and Simplified
Chinese encodings and we don't have a detector configuration that
covers Traditional and Simplified together.) Other locales that default
to having a detector enabled default to a locale-specific detector
(e.g. Japanese or Ukrainian).
 * The Universal detector is used regardless of UI setting or locale
when using the FileReader to read a local file as text. (I'm
personally very unhappy about this sort of use of heuristics in a new
feature.)
 * The Universal detector isn't really universal. In particular, it
misdetects Central European encodings like ISO-8859-2. (I'm personally
unhappy that we expose the Universal detector in the UI and thereby
bait people to enable it.)
 * Regardless of detector setting, when loading HTML or plain text in
a browsing context, Basic Latin encoded as UTF-16BE or UTF-16LE is
detected. This detection is not performed by FileReader.

 I have some questions though:

 1) Is this something we want to define and eventually implement the same
 way?

I think yes in principle. In practice, it might be hard to get this
done. E.g. in the case of Gecko, we'd need someone who has no higher
priority work than rewriting chardet in compliance with the
hypothetical spec.

I don't want to enable heuristic detection for all HTML page loads.
Yet, it seems that we can't get rid of it for e.g. the Japanese
context. (It's so sad that the situation is the worst in places that
have multiple encodings and, therefore, logically should be more aware
of the need to declare which one is in use. Sigh.) I think it is bad
that the Web-exposed behavior of the browser depends on the UI locale
of the browser. I think it would be a worthwhile research project to
find out whether it would be feasible to trigger language-specific
heuristic detection on a per-TLD basis instead of on a per-UI-locale
basis (e.g. enabling the Japanese detector for all pages loaded from
.jp and the Russian detector for all pages loaded from .ru regardless
of UI locale, and requiring .com Japanese or Russian sites to get their
charset act together, or maybe having a short list of popular special
cases that don't use a country TLD but don't declare the encoding, either).
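
Roughly, the kind of mapping I have in mind would be something like the
following sketch (the table entries and names here are just
illustrative; nothing like this is implemented):

var DETECTOR_FOR_TLD = {
  // Illustrative entries only; a real mapping would need research.
  "jp": "japanese",
  "ru": "russian"
};

// Returns the name of the language-specific detector to enable for a
// host, or null to run no heuristic detection at all.
function detectorForHost(host) {
  var tld = host.substring(host.lastIndexOf(".") + 1).toLowerCase();
  return DETECTOR_FOR_TLD.hasOwnProperty(tld) ? DETECTOR_FOR_TLD[tld] : null;
}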

 2) Does this need to apply outside HTML? For JavaScript it forbidden per the
 HTML standard at the moment. CSS and XML do not allow it either. Is it used
 for decoding text/plain at the moment?

Detection is used for text/plain in Gecko when it would be used for text/html.

I think detection shouldn't be used for anything except plain text and
HTML being loaded into a browsing context, considering that we've managed
this far without it (well, except for FileReader).  (Note that when they
don't declare an encoding on their own, JavaScript and CSS inherit the
encoding of the HTML document that references them.)

 3) Is there a limit to how many bytes we should look at?

In Gecko, the Basic Latin encoded as UTF-16BE or UTF-16LE check is run
on the first 1024 bytes.  For the other heuristic detections, there is
no limit and changing the encoding potentially causes renavigation to
the page.  During the Firefox for development cycle, there was a limit
of 1024 bytes (no renavigation!), but it was removed in order to
support the Japanese Planet Debian (site fixed since then) and other
unspecified but rumored Japanese sites.

On Sun, Apr 22, 2012 at 2:11 AM, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:
 We've had some discussion on the usefulness of this in WebVTT - mostly
 just in relation with HTML, though I am sure that stand-alone video
 players that decode WebVTT would find it useful, too.

WebVTT is a new format with no legacy. Instead of letting it become
infected with heuristic detection, we should go the other direction
and hardwire it as UTF-8 like we did with app cache manifests and
JSON-in-XHR.  No one should be creating new content in encodings other
than UTF-8. Those who can't be bothered to use The Encoding deserve
REPLACEMENT CHARACTERs. Heuristic detection is for unlabeled legacy
content.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Readiness of script-created documents

2012-04-23 Thread Henri Sivonen
On Mon, Jun 20, 2011 at 3:10 PM, Jonas Sicking jo...@sicking.cc wrote:
 On Mon, Jun 20, 2011 at 4:26 AM, Henri Sivonen hsivo...@iki.fi wrote:
 http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1039

 It says "complete" in Firefox, "loading" in Chrome and Opera and
 "uninitialized" in IE. The spec requires "complete". readyState is
 originally an IE API. Why doesn't the spec require "uninitialized"?

 (The implementation in Gecko is so recent that it's quite possible that
 Gecko followed the spec and the spec just made stuff up as opposed to
 the spec following Gecko.)

 complete seems like the most useful and consistent value which would
 seem like a good reason to require that.

Why don't aborted documents reach "complete" in Gecko? It seems weird
to have aborted documents stay in the "loading" state when they are
not, in fact, loading.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


[whatwg] Proposal for readyState behavior

2012-04-23 Thread Henri Sivonen
IE can omit "interactive":
http://hsivonen.iki.fi/test/moz/readystate/document-open.html
"load" can be synchronous in Chrome and IE:
http://hsivonen.iki.fi/test/moz/readystate/document-open.html
Firefox forgets DOMContentLoaded for XSLT:
http://hsivonen.iki.fi/test/moz/readystate/xslt.html
Firefox skips "interactive" but not DOMContentLoaded when aborting:
http://hsivonen.iki.fi/test/moz/readystate/window-stop.html
Documents aborted by window.location reach "complete" in Opera:
http://hsivonen.iki.fi/test/moz/readystate/window-location.html
Defer scripts are executed at the wrong time in Firefox:
http://hsivonen.iki.fi/test/moz/readystate/defer-script.html

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] DOMContentLoaded, load and current document readiness

2012-04-20 Thread Henri Sivonen
On Tue, Jan 10, 2012 at 2:10 AM, Ian Hickson i...@hixie.ch wrote:
 On Tue, 31 May 2011, Henri Sivonen wrote:

 Recently, there was discussion about changing media element state in the
 same task that fires the event about the state change so that scripts
 that probe the state can make non-racy conclusions about whether a
 certain event has fired already.

 Currently, there seems to be no correct non-racy way to write code that
 probes a document to determine if DOMContentLoaded or load has fired and
 runs code immediately if the event of interest has fired or adds a
 listener to wait for the event if the event hasn't fired.

 Are there compat or other reasons why we couldn't or shouldn't make it
 so that the same task that fires DOMContentLoaded changes the readyState
 to interactive and the same task that fires load changes readyState to
 complete?

 Fixed for 'load'. I don't see a good way to fix this for
 'DOMContentLoaded', unfortunately.

It turns out that Firefox has accidentally been running defer scripts
after DOMContentLoaded. I haven't seen bug reports about this.
Embracing this bug might offer a way to always keep the
readystatechange to "interactive" in the same task that fires
DOMContentLoaded.

See http://hsivonen.iki.fi/test/moz/readystate/defer-script.html
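
(For context, the non-racy probing pattern that keeping the readyState
change in the same task enables is roughly the following sketch; the
function name is mine:)

function runWhenLoaded(callback) {
  if (document.readyState == "complete") {
    // "load" has already fired; since readyState changes in the same
    // task that fires the event, there is no window for a race here.
    callback();
  } else {
    window.addEventListener("load", callback, false);
  }
}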

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] readyState transitions when aborting a document

2012-04-20 Thread Henri Sivonen
On Thu, Apr 19, 2012 at 2:43 PM, Henri Sivonen hsivo...@iki.fi wrote:
  * Is there a way to abort a document load in IE without causing
 immediate navigation away from the document? IE doesn't support
 window.stop().

Yes. document.execCommand("Stop")

  * Does Web compatibility ever require a transition from loading to
 complete without an intermediate interactive state?  (Both chrome
 and Firefox as shipped make such transitions, but those might be
 bugs.)

I have no evidence to say anything sure here, but I doubt Web compat
requires transitions from loading to complete. What actually
happens varies a lot.

  * Should the aborted documents stay in the loading state forever
 like the spec says or should they reach the complete state
 eventually when the event loop spins?

Gecko and WebKit disagree.

  * Should window.stop() really not abort the parser like the spec
 seems to suggest?

Looks like Opera is alone with the non-aborting behavior. The spec is wrong.

  * Should reaching complete always involve firing load?

Not in WebKit.

  * Should reaching interactive always involve firing DOMContentLoaded?

Probably.

  * Does anyone have test cases for this stuff?

Demos: http://hsivonen.iki.fi/test/moz/readystate/

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


[whatwg] readyState transitions when aborting a document

2012-04-19 Thread Henri Sivonen
I've been trying to make document.readyState transitions less broken
in Gecko. (The transitions are very sad as of Firefox 13 in pretty
much all but the most trivial cases.)

I'm having a particularly hard time figuring out what the right thing
to do is when it comes to aborting document loads. Unfortunately, I
don't trust the spec to describe the Web-compatible truth.
 * Is there a way to abort a document load in IE without causing
immediate navigation away from the document? IE doesn't support
window.stop().
 * Does Web compatibility ever require a transition from loading to
complete without an intermediate interactive state?  (Both chrome
and Firefox as shipped make such transitions, but those might be
bugs.)
 * Should the aborted documents stay in the loading state forever
like the spec says or should they reach the complete state
eventually when the event loop spins?
 * Should window.stop() really not abort the parser like the spec
seems to suggest?
 * Should reaching complete always involve firing load?
 * Should reaching interactive always involve firing DOMContentLoaded?
 * Does anyone have test cases for this stuff?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Default encoding to UTF-8?

2012-04-04 Thread Henri Sivonen
On Tue, Apr 3, 2012 at 10:08 PM, Anne van Kesteren ann...@opera.com wrote:
 I didn't mean a prescan.  I meant proceeding with the real parse and
 switching decoders in midstream. This would have the complication of
 also having to change the encoding the document object reports to
 JavaScript in some cases.

 On IRC (#whatwg) zcorpan pointed out this would break URLs where entities
 are used to encode non-ASCII code points in the query component.

Good point.  So it's not worthwhile to add magic here.  It's better
that authors declare that they are using UTF-8.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Readiness of script-created documents

2012-04-03 Thread Henri Sivonen
On Mon, Apr 2, 2012 at 11:29 AM, Jonas Sicking jo...@sicking.cc wrote:
 Everyone returning the same thing isn't the only goal. First of all
 what's the purpose of all browsers doing the same thing if that same
 thing isn't useful?

No one is worse off and stuff works even if an author somewhere relies
on a crazy edge case behavior.

 Second, you are assuming that people are actually
 aware of this edge case and account for it. Here it seems just as
 likely to me that generic code paths would result in buggy pages given
 IEs behavior, and correct behavior given the specs behavior. Third, if
 no-one is hitting this edge case, which also seems quite plausible
 here, then it having a while longer without interoperability won't
 really matter what we do and doing the most useful thing seems like
 the best long-term goal.

On the other hand, for cases no one is hitting, it's probably not
worthwhile to spend time trying to get the behavior to change from
what was initially introduced.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Default encoding to UTF-8?

2012-04-03 Thread Henri Sivonen
On Wed, Jan 4, 2012 at 12:34 AM, Leif Halvard Silli
xn--mlform-...@xn--mlform-iua.no wrote:
 I mean the performance impact of reloading the page or,
 alternatively, the loss of incremental rendering.)

 A solution that would border on reasonable would be decoding as
 US-ASCII up to the first non-ASCII byte

 Thus possibly prescan of more than 1024 bytes?

I didn't mean a prescan.  I meant proceeding with the real parse and
switching decoders in midstream. This would have the complication of
also having to change the encoding the document object reports to
JavaScript in some cases.

 and then deciding between
 UTF-8 and the locale-specific legacy encoding by examining the first
 non-ASCII byte and up to 3 bytes after it to see if they form a valid
 UTF-8 byte sequence.

 Except for the specifics, that sounds like more or less the idea I
 tried to state. May be it could be made into a bug in Mozilla?

It's not clear that this is actually worth implementing or spending
time on at this stage.

 However, there is one thing that should be added: The parser should
 default to UTF-8 even if it does not detect any UTF-8-ish non-ASCII.

That would break form submissions.

 But trying to gain more statistical confidence
 about UTF-8ness than that would be bad for performance (either due to
 stalling stream processing or due to reloading).

 So here you say tthat it is better to start to present early, and
 eventually reload [I think] if during the presentation the encoding
 choice shows itself to be wrong, than it would be to investigate too
 much and be absolutely certain before starting to present the page.

I didn't intend to suggest reloading.

 Adding autodetection wouldn't actually force authors to use UTF-8, so
 the problem Faruk stated at the start of the thread (authors not using
 UTF-8 throughout systems that process user input) wouldn't be solved.

 If we take that logic to its end, then it would not make sense for the
 validator to display an error when a page contains a form without being
 UTF-8 encoded, either. Because, after all, the backend/whatever could
 be non-UTF-8 based. The only way to solve that problem on those
 systems, would be to send form content as character entities. (However,
 then too the form based page should still be UTF-8 in the first place,
 in order to be able to take any content.)

Presumably, when an author reacts to an error message, (s)he not only
fixes the page but also the back end.  When a browser makes encoding
guesses, it obviously cannot fix the back end.

 [ Original letter continued: ]
 Apart from UTF-16, Chrome seems quite aggressive w.r.t. encoding
 detection. So it might still be an competitive advantage.

 It would be interesting to know what exactly Chrome does. Maybe
 someone who knows the code could enlighten us?

 +1 (But their approach looks similar to the 'border on sane' approach
 you presented. Except that they seek to detect also non-UTF-8.)

I'm slightly disappointed but not surprised that this thread hasn't
gained a message explaining what Chrome does exactly.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Readiness of script-created documents

2012-04-02 Thread Henri Sivonen
On Fri, Mar 30, 2012 at 8:26 PM, Jonas Sicking jo...@sicking.cc wrote:
 On Friday, March 30, 2012, Henri Sivonen wrote:

 On Fri, Jan 13, 2012 at 2:26 AM, Ian Hickson i...@hixie.ch wrote:
  Jonas is correct. Since there was no interop here I figured we might as
  well go with what made sense.

 I'm somewhat unhappy about fixing IE-introduced APIs to make sense
 like this. The implementation in Gecko isn't particularly good. When
 trying to make it better, I discovered that doing what IE did would
 have lead to simpler code.


 That's not a particularly strong argument. The question is what's better for
 authors.

Gratuitously changing features introduced by IE does not help authors
one day have to support the old IE behavior for years.  Either authors
don't use the API in the uninteroperable situation or they will have
to deal with different browsers returning different things. The
easiest path to get to the point where all browsers in use return the
same thing would have been for others to do what IE did.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Readiness of script-created documents

2012-04-02 Thread Henri Sivonen
On Mon, Apr 2, 2012 at 10:12 AM, Henri Sivonen hsivo...@iki.fi wrote:
 Gratuitously changing features introduced by IE does not help authors
 one day have to

...when they have to...

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Character-encoding-related threads

2012-03-30 Thread Henri Sivonen
On Thu, Dec 1, 2011 at 1:28 AM, Faruk Ates faruka...@me.com wrote:
 We like to think that “every web developer is surely building things in UTF-8 
 nowadays” but this is far from true. I still frequently break websites and 
 webapps simply by entering my name (Faruk Ateş).

Firefox 12 whines to the error console when submitting a form using an
encoding that cannot represent all Unicode. Hopefully, after Firefox
12 has been released, this will help Web authors to actually test
their sites with the error console open and locate forms that can corrupt
user input.

 On Wed, 7 Dec 2011, Henri Sivonen wrote:

 I believe I was implementing exactly what the spec said at the time I
 implemented that behavior of Validator.nu. I'm particularly convinced
 that I was following the spec, because I think it's not the optimal
 behavior. I think pages that don't declare their encoding should always
 be non-conforming even if they only contain ASCII bytes, because that
 way templates created by English-oriented (or lorem ipsum -oriented)
 authors would be caught as non-conforming before non-ASCII text gets
 filled into them later. Hixie disagreed.

 I think it puts an undue burden on authors who are just writing small
 files with only ASCII. 7-bit clean ASCII is still the second-most used
 encoding on the Web (after UTF-8), so I don't think it's a small thing.

 http://googleblog.blogspot.com/2012/02/unicode-over-60-percent-of-web.html

I still think that allowing ASCII-only pages to omit the encoding
declaration is the wrong call. I agree with Simon's point about the
doctype and reliance on quirks.

Firefox Nightly (14 if all goes well) whines to the error console when
the encoding hasn't been declared and about a bunch of other encoding
declaration-related bad conditions. It also warns about ASCII-only
pages, because I didn't want to burn cycles detecting whether a page
is ASCII-only and because I think it's the wrong call not to whine
about ASCII-only templates that might get non-ASCII content later.
However, I suppressed the message about the lack of an encoding
declaration for different-origin frames, because it is so common for
ad iframes that contain only images or flash objects to lack an
encoding declaration that not suppressing the message would have made
the error console too noisy. It's cheaper to detect whether the
message is about to be emitted for a different-origin frame than to
detect whether it's about to be emitted for an ASCII-only page.
Besides, authors generally are powerless to fix the technical flaws of
different-origin embeds.

 On Mon, 19 Dec 2011, Henri Sivonen wrote:

 Hmm. The HTML spec isn't too clear about when alias resolution happens,
 to I (incorrectly, I now think) mapped only UTF-16, UTF-16BE and
 UTF-16LE (ASCII-case-insensitive) to UTF-8 in meta without considering
 aliases at that point. Hixie, was alias resolution supposed to happen
 first? In Firefox, alias resolution happen after, so meta
 charset=iso-10646-ucs-2 is ignored per the non-ASCII superset rule.

 Assuming you mean for cases where the spec says things like If encoding
 is a UTF-16 encoding, then change the value of encoding to UTF-8, then
 any alias of UTF-16, UTF-16LE, and UTF-16BE (there aren't any registered
 currently, but Unicode might need to be one) would be considered a
 match.
...
 Currently, iso-10646-ucs-2 is neither an alias for UTF-16 nor an
 encoding that is overridden in any way. It's its own encoding.

That's not reality in Gecko.

 I hope the above is clear. Let me know if you think the spec is vague on
 the matter.

Evidently, it's too vague, because I read the spec and implemented
something different from what you meant.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] window.location aborting the parser and subsequent document.writes

2012-03-30 Thread Henri Sivonen
On Tue, Feb 14, 2012 at 2:43 AM, Ian Hickson i...@hixie.ch wrote:
 On Thu, 5 Jan 2012, Henri Sivonen wrote:

 Consider https://bug98654.bugzilla.mozilla.org/attachment.cgi?id=77369
 with the popup blocker disabled.

 Chrome, Opera and IE open a new window/tab and load the Mozilla front
 page into it. Firefox used to but doesn't anymore.

 As far as I can tell, Firefox behaves according to the spec: Setting
 window.location aborts the parser synchronously and the first subsequent
 document.write() then implies a call to document.open(), which aborts
 the navigation started by window.location.

 Per spec, aborting the parser doesn't cause document.write() to imply a
 call to document.open(). Specifically, it leaves the insertion point in a
 state that is defined, but with the parser no longer active, and
 discarding any future data added to it.

That an aborted parser keeps having a defined insertion point was
non-obvious. Thanks. Fixed in Gecko on trunk.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Client side value for language preference

2012-03-30 Thread Henri Sivonen
On Thu, Mar 29, 2012 at 10:02 PM, Matthew Nuzum n...@bearfruit.org wrote:
 Some browsers have gotten smarter and now send the first value from
 the user's language preference, which is definitely an improvement. I
 suspect this was done in order to preserve backwards compatibility, so
 much of the useful information is left out.
...
 navigator.language.preference = [{lang:'en-gb', weight: 0.7},{lang:
 'en-us', weight: 0.7},{lang:'en', weight: 0.3}];

Is there a reason to believe that this client-side solution would be
used significantly considering that the HTTP header has not been used
that much?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] RWD Heaven: if browsers reported device capabilities in a request header (Boris Zbarsky)

2012-03-30 Thread Henri Sivonen
On Mon, Feb 6, 2012 at 9:24 PM, Irakli Nadareishvili ira...@gmail.com wrote:
 if you don't mind me saying it, I am afraid you may be missing the point of 
 this request. In Responsive Web Design, device capabilities are used in a 
 high-level fashion to determine a class of the device: smartphone, tablet, 
 desktop.

Firefox (at least from version 12 up), Opera Mobile and Safari already
expose this information.

Firefox for tablets includes the substring "Tablet" in the UA string
and Firefox for phones includes the substring "Mobile" in the UA
string.  If neither "Tablet" nor "Mobile" is present in the UA string,
the browser is running on a desktop.

In the case of Opera (excluding Mini), the indicators are "Tablet" and
"Mobi" (and desktop otherwise).

In the case of Safari, if the substring "iPad" is present, it's a
tablet.  Otherwise, if the substring "Mobile" is present, it's a phone
form factor.  Otherwise, it's desktop (or a non-Safari browser spoofing
as Safari).

IE differentiates between desktop and the phone form factor as well:
the mobile form factor includes the substring "IEMobile".

Unfortunately, the Android stock browser on Android tablets does not
include a clear tablet indicator.

So you get something like
/**
 * Returns "desktop", "tablet" or "phone"
 * Some 7" tablets get reported as phones. Android netbooks likely get
 * reported as tablets.
 * Touch input not guaranteed on phones (Opera Mobile on keypad
 * Symbian, for example) and tablets (non-Fire Kindle)!
 */
function formFactor() {
  var ua = navigator.userAgent;
  if (ua.indexOf("Tablet") > -1) {
    // Opera Mobile on tablets, Firefox on tablets, Playbook stock browser
    return "tablet";
  }
  if (ua.indexOf("iPad") > -1) {
    // Safari on tablets
    return "tablet";
  }
  if (ua.indexOf("Mobi") > -1) {
    // Opera Mobile on phones, Firefox on phones, Safari on phones
    // (and same-sized iPod Touch), IE on phones, Android stock on phones,
    // Chrome on phones, N9 stock, Dolfin on Bada
    return "phone";
  }
  if (ua.indexOf("Opera Mini") > -1) {
    // Opera Mini (could be on a tablet, though); let's hope Opera
    // puts "Tablet" in the Mini UA on tablets
    return "phone";
  }
  if (ua.indexOf("Symbian") > -1) {
    // S60 browser (predates Mobile Safari and does not say "Mobile")
    return "phone";
  }
  if (ua.indexOf("Android") > -1 && ua.indexOf("Safari") > -1) {
    // Android stock on tablet or Chrome on Android tablet
    return "tablet";
  }
  if (ua.indexOf("Kindle") > -1) {
    // Various Kindles; not all touch!
    return "tablet";
  }
  if (ua.indexOf("Silk-Accelerated") > -1) {
    // Kindle Fire in Silk mode
    return "tablet";
  }
  return "desktop";
}

Seems like the coarse form factor data is pretty much already in the
UA strings. Things could be improved by Opera Mini, Safari, Amazon's
browsers and Google's browsers saying "Tablet" when on tablet. Symbian
is dead, so no hope for its stock browser starting to say "Mobi".

The inferences you may want to make from the form factor data may well be wrong.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Client side value for language preference

2012-03-30 Thread Henri Sivonen
On Fri, Mar 30, 2012 at 5:08 PM, Matthew Nuzum n...@bearfruit.org wrote:
 For example, maybe a site can't afford translation but a small library
 could be included that formats dates and numbers based on a user's
 language preference. No more wondering if 2/3/12 is in March or in
 February.

The reader doesn't know that the site tries to be smart about dates
(but not smart enough to just use ISO dates), so scrambling the order
of date components not to match the convention of the language of the
page is probably worse than using the convention that's congruent with
the language of the page.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Readiness of script-created documents

2012-03-30 Thread Henri Sivonen
On Fri, Jan 13, 2012 at 2:26 AM, Ian Hickson i...@hixie.ch wrote:
 Jonas is correct. Since there was no interop here I figured we might as
 well go with what made sense.

I'm somewhat unhappy about fixing IE-introduced APIs to make sense
like this. The implementation in Gecko isn't particularly good. When
trying to make it better, I discovered that doing what IE did would
have lead to simpler code.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-19 Thread Henri Sivonen
On Wed, Mar 14, 2012 at 12:49 AM, Jonas Sicking jo...@sicking.cc wrote:
 Something that has come up a couple of times with content authors
 lately has been the desire to convert an ArrayBuffer (or part thereof)
 into a decoded string. Similarly being able to encode a string into an
 ArrayBuffer (or part thereof).

 Something as simple as

 DOMString decode(ArrayBufferView source, DOMString encoding);
 ArrayBufferView encode(DOMString source, DOMString encoding,
 [optional] ArrayBufferView destination);

It saddens me that this allows non-UTF-8 encodings. However, since use
cases for non-UTF-8 encodings were mentioned in this thread, I suggest
that the set of supported encodings be an enumerated set of encodings
stated in a spec and browsers MUST NOT support other encodings. The
set should probably be the set offered in the encoding popup at
http://validator.nu/?charset or a subset thereof (containing at least
UTF-8 of course). (That set was derived by researching the
intersection of the encodings supported by browsers, Python and the
JDK.)

 would go a very long way.

Are you sure that it's not necessary to support streaming conversion?
The suggested API design assumes you always have the entire data
sequence in a single DOMString or ArrayBufferView.

 The question is where to stick these
 functions. Internationalization doesn't have a obvious object we can
 hang functions off of (unlike, for example crypto), and the above
 names are much too generic to turn into global functions.

If we deem streaming conversion unnecessary, I'd put the methods on
DOMString and ArrayBufferView. It would be terribly sad to let the
schedules of various working groups affect the API design.
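
For concreteness, usage of the proposed pair would look roughly like
this (decode and encode are the names from the quoted proposal, not an
implemented API; where they should live is exactly the open question):

var bytes = new Uint8Array([0xc3, 0xa9]); // the letter "é" encoded as UTF-8
var text = decode(bytes, "utf-8");        // proposed: yields the string "é"
var buffer = encode(text, "utf-8");       // proposed: yields an ArrayBufferView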

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] RWD Heaven: if browsers reported device capabilities in a request header

2012-02-08 Thread Henri Sivonen
On Tue, Feb 7, 2012 at 4:13 PM, Matthew Wilcox m...@matthewwilcox.com wrote:
 Ahhh, ok. I was not aware that SPDY is intended to suffer from the flaws
 inflicted by the dated mechanics of HTTP. Is it really different semantics
 though? I don't see how it's harmful to enable resource adaption over SPDY
 just because browser vendors have decided that HTTP is too expensive to do
 it?
...
 I'm sensing the SPDY/HTTP identical-semantics standpoint may be a
 philosophical thing rather than technical?

Is it a philosophical or technical thing to suggest that it would be a
bad idea for a server to send different style rules depending on
whether the HTTP client requests /style.css with Accept-Encoding: gzip
or not?

SPDY is an autonegotiated upgrade, by design invisible to the next
layer, to how HTTP requests and responses are compressed and mapped to
TCP streams. Of course it would be *possible* to tie other side
effects to this negotiation, but that doesn't mean it's sound design or
a good idea.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] RWD Heaven: if browsers reported device capabilities in a request header

2012-02-08 Thread Henri Sivonen
On Tue, Feb 7, 2012 at 11:17 PM, divya manian divya.man...@gmail.com wrote:
 This is the info I would love to see any time for my app to make the
 kind of decision it should:
 * connection speed: so I know how fast my resources can load, how quickly.
 * bandwidth caps: so I know I shouldn't be sending HD images.

How do you know that I don't want to use my bandwidth quota to see
your site fully if I choose to navigate to it?

 * battery time: network requests are a drain on battery life, if I
 know before hand, I can make sure the user gets information in time.

Why should you drain my battery faster if the battery is more full?
(For stuff like throttling down animation and XHR polling, the UA
should probably prevent background tabs from draining battery even
when the battery is near full regardless of whether the site/app is
benevolently cooperative.)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] RWD Heaven: if browsers reported device capabilities in a request header

2012-02-07 Thread Henri Sivonen
On Mon, Feb 6, 2012 at 5:52 PM, Matthew Wilcox m...@matthewwilcox.com wrote:
 Also, as indicated, with SPDY this is much much less of a problem than for 
 HTTP.

SPDY transfers the HTTP semantics more efficiently when supported. You
aren't supposed to communicate different semantics depending on
whether SPDY is enabled. That would be a layering violation.

That is, SPDY is supposed to work as a drop-in replacement for the old
way of putting HTTP semantics over IP. You aren't supposed to send
different headers depending on whether SPDY is there or not.

And the old HTTP is going to be around for a *long* time, so even if a
bunch of important sites start supporting SPDY, if browsers send the
same headers in all cases to avoid the layering violation, the long
tail of plain old HTTP sites would be harmed by request size bloat.

So I think "SPDY will fix it" is not a persuasive argument for
allowing HTTP request bloat to cater to the bandwagon of the day.
(Sorry if that seems offensive. You've worked on responsive images, so
they evidently seem important to you, but in the long-term big
picture, it's nowhere near proven that they aren't a fad of interest
to a relative small number of Web developers.)

If there is evidence that responsive images aren't just a fad
bandwagon and there's a long-term need to support them in the
platform, I think supporting something like
<picture>
 <source src="something.jpg" media="...">
 <source src="other.jpg" media="...">
 <img src="fallback.jpg">
</picture>
would make more sense, since the bytes added to transfer this markup would
affect sites that use this stuff instead of affecting each request to
all sites that don't use this stuff. This would be more
intermediary-friendly, too, by not involving the Vary header.

The points Boris made about the device pixel size of the image
changing after the page load still apply, though.

But still, engineering for sites varying the number of pixels they
send for photos seems a bit premature when sites haven't yet adopted
SVG for illustrations, diagrams, logos, icons, etc.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] add html-attribute for responsive images

2012-02-07 Thread Henri Sivonen
On Tue, Feb 7, 2012 at 1:15 AM, Bjartur Thorlacius svartma...@gmail.com wrote:
 Why not use a media attribute of object?

There's probably already a better answer to Why not use object for
foo? in the archives of this list, but the short version is that it's
nicer for implementations to have elements that support particular
functionality when node is created instead of having elements that
change their nature substantially depending on attributes, network
fetches, presence of plug-ins, etc., etc.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] should we add beforeload/afterload events to the web platform?

2012-02-06 Thread Henri Sivonen
On Tue, Jan 17, 2012 at 6:29 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 1/17/12 7:49 AM, Henri Sivonen wrote:

 On Sun, Jan 15, 2012 at 11:23 PM, Boris Zbarskybzbar...@mit.edu  wrote:

  Preventing _all_ loads for a document based on
 some declarative thing near the start of the document, on the other hand,
 should not be too bad.


 A page-wide disable optimizations flag could easily be cargo-culted
 into something harmful. Consider if the narrative becomes that setting
 such a flag is good for mobile or something.

 Who said anything about disable optimizations?  I suggested a flag to
 prevent all subresource loads, not just speculative preloads.  Basically a
 treat this as a data document flag.

Oh I see. Sorry.

  If that plus a beforeprocess event addresses the
 majority of the web-facing use cases, we should consider adding that.

 So what are the Web-facing use cases? As in: What are people trying to
 accomplish with client-side transformations?

 Well, what mobify is apparently trying to accomplish is take an existing
 (not-mobile-optimized, or in other words typically
 ad-and-float-and-table-laden) page and modify it to look reasonable on a
 small screen.  That includes not loading some of the stylesheets and various
 changes to the DOM, as far as I can tell.

FWIW, I'm completely unsympathetic to this use case and I think we
shouldn't put engineering effort into supporting this scenario. As far
as the user is concerned, it would be much better for the site to get
its act together on the server side and not send an ad-laden table
page to anyone. It sucks to burn resources on the client side to fix
things up using scripts provided by the same server that sends the
broken stuff in the first place.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Augmenting HTML parser to recognize new elements

2012-01-20 Thread Henri Sivonen
On Wed, Jan 18, 2012 at 8:19 PM, Dimitri Glazkov dglaz...@chromium.org wrote:
 A typical example would be specifying an insertion point (that's
 content element) as child of a table:

 <table>
    <content>
        <tr>
            ...
        </tr>
    </content>
 </table>

 Both shadow and template elements have similar use cases.

This doesn't comply with the Degrade Gracefully design principle. Is
this feature so important that it's reasonable to change table parsing
(one of the annoying parts of the parsing algorithm) in a way that'd
make the modified algorithm yield significantly different results than
existing browsers? Have designs that don't require changes to table
parsing been explored?

 What would be the sane way to document such changes to the HTML parser
 behavior?

A change to the HTML spec proper *if* we decide that changes are a good idea.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] title/meta elements outside of head

2012-01-19 Thread Henri Sivonen
On Thu, Jan 19, 2012 at 8:30 AM, Michael Day mike...@yeslogic.com wrote:
 What is the reason why title/meta elements are not always moved to the head,
 regardless of where they appear?

They didn't need to be for compatibility, so we went with less magic.
Also, being able to use meta and link as descendants of body is
useful for Microdata and RDFa Lite without having to mint new void
elements.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] should we add beforeload/afterload events to the web platform?

2012-01-17 Thread Henri Sivonen
On Sun, Jan 15, 2012 at 11:23 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Preventing _all_ loads for a document based on
 some declarative thing near the start of the document, on the other hand,
 should not be too bad.

A page-wide disable optimizations flag could easily be cargo-culted
into something harmful. Consider if the narrative becomes that setting
such a flag is good for mobile or something.

A per-element disable optimizations attribute would be slightly less
dangerous, since authors couldn't just set it once and forget it.

 If that plus a beforeprocess event addresses the
 majority of the web-facing use cases, we should consider adding that.

So what are the Web-facing use cases? As in: What are people trying to
accomplish with client-side transformations?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] should we add beforeload/afterload events to the web platform?

2012-01-09 Thread Henri Sivonen
On Tue, Jan 10, 2012 at 7:48 AM, Tantek Çelik tan...@cs.stanford.edu wrote:
 1. Performance. Reducing bandwidth use / HTTP requests, e.g. AdBlock
 extension[2]

Extension use cases don't require an API exposed to Web content, though.

Furthermore, IE9 has a built-in content blocking rule engine, and Firefox
has had a de facto dominant rule engine for years even though it has been
shipped separately (AdBlock Plus). Maybe instead of exposing arbitrary
programmability for content blocking, other browsers should follow IE9
and offer a built-in rule engine for content blocking instead of
letting extensions run arbitrary JS to inspect every load.

 2. Clientside transformations, e.g. Mobify[3]

There's already an easier cross-browser way to deactivate an HTML
page and use its source as input to a program:
document.write("<plaintext style='display:none;'>"); (This gives you
source to work with instead of a DOM, but you can explicitly parse the
source to a DOM.)
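
Spelled out, that approach is roughly the following sketch (DOMParser
support for text/html was not universal at the time, so the reparsing
step may need an iframe or similar shim):

document.write("<plaintext style='display:none;'>");
window.addEventListener("load", function () {
  // The <plaintext> element has swallowed the rest of the page as text.
  var source = document.getElementsByTagName("plaintext")[0].textContent;
  // Explicitly parse the captured source into a DOM for transformation.
  var doc = new DOMParser().parseFromString(source, "text/html");
  // ... transform doc and insert the result into the visible document ...
}, false);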

Anyway, I'd rather see mobile adaptations be based on CSS instead of
everyone shipping a bunch of JS to the client to munge the page in ways
that foil all optimizations that browsers do for regular page loads.

 As might be expected, there is at least one use-case for a
 complementary 'afterload' event:

 1. Downloadable fonts - people who want to use custom fonts for
 drawing in the canvas element need to know when a font has loaded.
 'afterload' seems like a good way to know that, since it happens as a
 side effect of actually using it and fonts don't have an explicit load
 API like images do.[4]

It seems like fonts should have an API for listening when they become
available, yes.

 Should 'beforeload'/'afterload' be explicitly specified and added to
 the web platform?

I'm worried about the interaction with speculative loading. Right now,
Gecko is more aggressive than WebKit about speculative loading. I
don't want to make Gecko less aggressive about speculative loading in
order to fire beforeload exactly at the points where WebKit fires
them. I'm even worried about exposing resource load decisions to the
main thread at all. Right now in Gecko, the HTML parser sees the data
on a non-main thread. Networking runs on another non-main thread. Even
though right now speculative loads travel from the parser thread to
networking library via the main thread, it would be unfortunate to
constrain the design so that future versions of Gecko couldn't
communicate speculative loads directly from the parser thread to the
networking thread without waiting on the main-thread event loop in
between. (In this kind of design, a built-in content blocking rule
engine would be nicer than letting extensions be involved in non-main
threads.)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


[whatwg] window.location aborting the parser and subsequent document.writes

2012-01-05 Thread Henri Sivonen
Consider https://bug98654.bugzilla.mozilla.org/attachment.cgi?id=77369
with the popup blocker disabled.

Chrome, Opera and IE open a new window/tab and load the Mozilla front
page into it. Firefox used to but doesn't anymore.

As far as I can tell, Firefox behaves according to the spec: Setting
window.location aborts the parser synchronously and the first
subsequent document.write() then implies a call to document.open(),
which aborts the navigation started by window.location.

Is there a mechanism in the spec that makes this work as in Chrome,
Opera and IE and I'm failing to read the spec right? If not, what's
the mechanism that causes Chrome and IE to load the Mozilla front page
into the newly-opened window/tab in this case?

Note that in this modified case
http://hsivonen.iki.fi/test/moz/write-after-location.html (requires a
click so that there's no need to adjust the popup blocker) the console
says "before" and "after" but not "later" in Chrome and IE.

Opera says "before" and "after" and then the opener script ends with a
security error, because write is already a different-origin call, i.e.
setting window.location has immediately made the document in the new
window different-origin.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Default encoding to UTF-8?

2012-01-03 Thread Henri Sivonen
On Thu, Dec 22, 2011 at 12:36 PM, Leif Halvard Silli l...@russisk.no wrote:
 It's unclear to me if you are talking about HTTP-level charset=UNICODE
 or charset=UNICODE in a meta. Is content labeled with charset=UNICODE
 BOMless?

 Charset=UNICODE in meta, as generated by MS tools (Office or IE, eg.)
 seems to usually be BOM-full. But there are still enough occurrences
 of pages without BOM. I have found UTF-8 pages with the charset=unicode
 label in meta. But the few page I found contained either BOM or
 HTTP-level charset=utf-8. I have to little research material when it
 comes to UTF-8 pages with charset=unicode inside.

Making 'unicode' an alias of UTF-16 or UTF-16LE would be useless for
pages that have a BOM, because the BOM is already inspected before
meta and if HTTP-level charset is unrecognized, the BOM wins.

Making 'unicode' an alias of UTF-16 or UTF-16LE would be useful for
UTF-8-encoded pages that say charset=unicode in meta if alias
resolution happens before UTF-16 labels are mapped to UTF-8.

Making 'unicode' an alias for UTF-16 or UTF-16LE would be useless for
pages that are (BOMless) UTF-16LE and that have charset=unicode in
meta, because the meta prescan doesn't see UTF-16-encoded metas.
Furthermore, it doesn't make sense to make the meta prescan look for
UTF-16-encoded metas, because it would make sense to honor the value
only if it matched a flavor of UTF-16 appropriate for the pattern of
zero bytes in the file, so it would be more reliable and straightforward
to just analyze the pattern of zero bytes without bothering to
look for UTF-16-encoded metas.
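
For illustration, analyzing the pattern of zero bytes amounts to
something like this (a toy sketch of mine, not what any browser ships):

// Classify BOMless UTF-16 that encodes only Basic Latin by checking
// which half of each byte pair in a prefix of the stream is zero.
function sniffBasicLatinUtf16(bytes) {
  var pairs = 0, highZero = 0, lowZero = 0;
  for (var i = 0; i + 1 < bytes.length; i += 2) {
    pairs++;
    if (bytes[i] === 0 && bytes[i + 1] !== 0) highZero++; // 00 xx
    if (bytes[i + 1] === 0 && bytes[i] !== 0) lowZero++;  // xx 00
  }
  if (pairs === 0) return null;
  if (highZero === pairs) return "UTF-16BE";
  if (lowZero === pairs) return "UTF-16LE";
  return null; // doesn't look like BOMless Basic Latin UTF-16
}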

 When the detector says UTF-8 - that is step 7 of the sniffing algorith,
 no?
 http://dev.w3.org/html5/spec/parsing.html#determining-the-character-encoding

Yes.

  2) Start the parse assuming UTF-8 and reload as Windows-1252 if the
 detector says non-UTF-8.
...
 I think you are mistaken there: If parsers perform UTF-8 detection,
 then unlabelled pages will be detected, and no reparsing will happen.
 Not even increase. You at least need to explain this negative spiral
 theory better before I buy it ... Step 7 will *not* lead to reparsing
 unless the default encoding is WINDOWS-1252. If the default encoding is
 UTF-8, then step 7, when it detects UTF-8, then it means that parsing
 can continue uninterrupted.

That would be what I labeled as option #2 above.

 What we will instead see is that those using legacy encodings must be
 more clever in labelling their pages, or else they won't be detected.

Many pages that use legacy encodings are legacy pages that aren't
actively maintained. Unmaintained pages aren't going to become more
clever about labeling.

 I am a bitt baffled here: It sounds like you say that there will be bad
 consequences if browsers becomes more reliable ...

Becoming more reliable can be bad if the reliability comes at the cost
of performance, which would be the case if the kind of heuristic
detector that e.g. Firefox has was turned on for all locales. (I don't
mean the performance impact of running a detector state machine. I
mean the performance impact of reloading the page or, alternatively,
the loss of incremental rendering.)

A solution that would border on reasonable would be decoding as
US-ASCII up to the first non-ASCII byte and then deciding between
UTF-8 and the locale-specific legacy encoding by examining the first
non-ASCII byte and up to 3 bytes after it to see if they form a valid
UTF-8 byte sequence. But trying to gain more statistical confidence
about UTF-8ness than that would be bad for performance (either due to
stalling stream processing or due to reloading).
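
The check on the first non-ASCII byte would be along these lines (a
simplified sketch; it ignores the tighter constraints on the second
byte of some sequences):

// Given the bytes of the page and the index of the first non-ASCII
// byte, report whether that byte starts a plausible UTF-8 sequence.
function looksLikeUtf8At(bytes, i) {
  var lead = bytes[i], trail;
  if (lead >= 0xc2 && lead <= 0xdf) trail = 1;      // two-byte sequence
  else if (lead >= 0xe0 && lead <= 0xef) trail = 2; // three-byte sequence
  else if (lead >= 0xf0 && lead <= 0xf4) trail = 3; // four-byte sequence
  else return false;                                // not a valid UTF-8 lead
  for (var j = 1; j <= trail; j++) {
    var b = bytes[i + j];
    if (b === undefined || b < 0x80 || b > 0xbf) return false; // not a trail byte
  }
  return true;
}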

 Apart from UTF-16, Chrome seems quite aggressive w.r.t. encoding
 detection. So it might still be an competitive advantage.

It would be interesting to know what exactly Chrome does. Maybe
someone who knows the code could enlighten us?

 * Let's say that I *kept* ISO-8859-1 as default encoding, but instead
 enabled the Universal detector. The frame then works.
 * But if I make the frame page very short, 10 * the letter ø as
 content, then the Universal detector fails - on a test on my own
 computer, it guess the page to be Cyrillic rather than Norwegian.
 * What's the problem? The Universal detector is too greedy - it tries
 to fix more problems than I have. I only want it to guess on UTF-8.
 And if it doesn't detect UTF-8, then it should fall back to the locale
 default (including fall back to the encoding of the parent frame).

 Wouldn't that be an idea?

 No. The current configuration works for Norwegian users already. For
 users from different silos, the ad might break, but ad breakage is
 less bad than spreading heuristic detection to more locales.

 Here I must disagree: Less bad for whom?

For users performance-wise.

--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Default encoding to UTF-8?

2012-01-03 Thread Henri Sivonen
On Tue, Jan 3, 2012 at 10:33 AM, Henri Sivonen hsivo...@iki.fi wrote:
 A solution that would border on reasonable would be decoding as
 US-ASCII up to the first non-ASCII byte and then deciding between
 UTF-8 and the locale-specific legacy encoding by examining the first
 non-ASCII byte and up to 3 bytes after it to see if they form a valid
 UTF-8 byte sequence. But trying to gain more statistical confidence
 about UTF-8ness than that would be bad for performance (either due to
 stalling stream processing or due to reloading).

And it's worth noting that the above paragraph states a solution to
the problem that is: How to make it possible to use UTF-8 without
declaring it?

Adding autodetection wouldn't actually force authors to use UTF-8, so
the problem Faruk stated at the start of the thread (authors not using
UTF-8 throughout systems that process user input) wouldn't be solved.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] [encoding] utf-16

2012-01-02 Thread Henri Sivonen
On Fri, Dec 30, 2011 at 12:54 PM, Anne van Kesteren ann...@opera.com wrote:
 And why should there be UTF-16 sniffing?

The reason why Gecko detects BOMless Basic Latin-only UTF-16
regardless of the heuristic detector mode is
https://bugzilla.mozilla.org/show_bug.cgi?id=631751

It's quite possible that Firefox could have gotten away with not
having this behavior.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] [encoding] utf-16

2012-01-02 Thread Henri Sivonen
On Tue, Dec 27, 2011 at 4:52 PM, Anne van Kesteren ann...@opera.com wrote:
 I ran some utf-16 tests using 007A as input data, optionally preceded by
 FFFE or FEFF, and with utf-16, utf-16le, and utf-16be declared in the
 Content-Type header

I suggest testing with zero, one, two and three BOMs. I'd expect Gecko
to have a bug that causes it to remove *two* BOMs but not more than
that.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Another bug in the HTML parsing spec?

2011-12-20 Thread Henri Sivonen
On Tue, Oct 18, 2011 at 3:47 AM, Ian Hickson i...@hixie.ch wrote:

 2) I can't get all of the parser tests from html5lib to pass with this
 algorithm as it is currently written.  In particular, there are 5 tests in
 testdata/tree-construction/tests9.dat of this basic form:

 <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>

 As the spec is written, the <mi> tag is a text integration point, so the
 "foo" text token is handled like regular content, not like foreign content.

 Oh, my, yeah, that's all kinds of wrong. The text node should be handled
 as if it was in the in body mode, not as if it was in table. I'll have
 to study this closer.

 I think this broke when we moved away from using an insertion mode for
 foreign content.

 Henri, do you know how Gecko gets this right currently?

The tree builder in Gecko always uses an accumulation buffer that gets
flushed when the tree builder sees an end tag token or a start tag
token.
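
In toy form, the accumulate-and-flush idea looks roughly like this (an
illustration only; the treeBuilder object stands in for the real tree
construction code):

// Stand-in for the real tree builder.
var treeBuilder = {
  insertText: function (text) { /* append a text node at the insertion point */ },
  processTag: function (tag) { /* run the tree construction rules for the tag */ }
};

var charBuffer = [];

function onCharacterToken(ch) {
  charBuffer.push(ch); // text just accumulates
}

function onTagToken(tag) {
  if (charBuffer.length > 0) {
    // Flushing on the tag token means the buffered text gets inserted
    // while the text integration point is still the current node.
    treeBuilder.insertText(charBuffer.join(""));
    charBuffer.length = 0;
  }
  treeBuilder.processTag(tag);
}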

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] document.write(\r): the spec doesn't say how to handle it.

2011-12-19 Thread Henri Sivonen
On Wed, Dec 14, 2011 at 2:00 AM, Ian Hickson i...@hixie.ch wrote:
 I can remove the text one at a time, if you like. Would that be
 satisfactory? Or I guess I could change the spec to say that the parser
 should process the characters, rather than the tokenizer, since really
 it's the whole shebang that needs to be involved (stream preprocessor and
 everything). Any opinions on what the right text is here?

I'd like the CRLF preprocessing to be defined as an eager stateful
operation so that there's one bit of state: "last was CR". Then, input
is handled as follows:
If the input character is CR, set "last was CR" to true and emit LF.
If the input character is LF and "last was CR" is true, don't emit
anything and set "last was CR" to false.
If the input character is LF and "last was CR" is false, emit LF.
Else set "last was CR" to false and emit the input character.

Where emit feeds into the tokenizer. By eager, I mean that the
operation described above doesn't buffer. I.e. the first case emits an
LF upon seeing a CR without waiting for an LF also to appear in the
input.
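
As a sanity check on that wording, the eager operation is this little
state machine (a sketch; it returns the characters to hand to the
tokenizer instead of calling into one):

var lastWasCR = false;

function preprocess(ch) {
  if (ch === "\r") {
    lastWasCR = true;
    return ["\n"]; // emit LF eagerly, without waiting to see whether LF follows
  }
  if (ch === "\n") {
    if (lastWasCR) {
      lastWasCR = false;
      return []; // the LF was already emitted for the preceding CR
    }
    return ["\n"];
  }
  lastWasCR = false;
  return [ch];
}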

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Default encoding to UTF-8?

2011-12-19 Thread Henri Sivonen
 View-Character_Encoding-Auto-Detect-Off ? Anyway: I
 agree that the encoding menus could be simpler/clearer.

 I think the most counter-intuitive thing is to use the word
 auto-detect about the heuristic detection - see what I said above
 about behaves automatic even when auto-detect is disabled. Opera's
 default setting is called Automatic selection. So it is all
 automatic ...

Yeah, automatic means different things in different browsers.

 As for heuristic detection based on the bytes of the page, the only
 heuristic that can't be disabled is the heuristic for detecting
 BOMless UTF-16 that encodes Basic Latin only. (Some Indian bank was
 believed to have been giving that sort of files to their customers and
 it worked in pre-HTML5 browsers that silently discarded all zero
 bytes prior to tokenization.) The Cyrillic and CJK detection
 heuristics can be turned on and off by the user.

 I always wondered what the Universal detection meant. Is that simply
 the UTF-8 detection?

"Universal" means that it runs all the detectors that Firefox supports
in parallel, so the possible guessing space isn't constrained by locale.
The other modes constrain the guessing space to a locale. For example,
the Japanese detector won't give a Chinese or Cyrillic encoding as its
guess.

 So let's say that you tell your Welsh localizer that: Please switch to
 WINDOWS-1252 as the default, and then instead I'll allow you to enable
 this brand new UTF-8 detection. Would that make sense?

Not really. I think we shouldn't spread heuristic detection to any
locale that doesn't already have it.

 Within an origin, Firefox considers the parent frame and the previous
 document in the navigation history as sources of encoding guesses.
 That behavior is not user-configurable to my knowledge.

 W.r.t. iframe, then the big in Norway newspaper Dagbladet.no is
 declared ISO-8859-1 encoded and it includes a least one ads-iframe that
 is undeclared ISO-8859-1 encoded.

 * If I change the default encoding of Firefox to UTF-8, then the main
 page works but that ad fails, encoding wise.

Yes, because the ad is different-origin, so it doesn't inherit the
encoding from the parent page.

 * But if I enable the Universal encoding detector, the ad does not fail.

 * Let's say that I *kept* ISO-8859-1 as default encoding, but instead
 enabled the Universal detector. The frame then works.
 * But if I make the frame page very short, 10 * the letter ø as
 content, then the Universal detector fails - on a test on my own
 computer, it guess the page to be Cyrillic rather than Norwegian.
 * What's the problem? The Universal detector is too greedy - it tries
 to fix more problems than I have. I only want it to guess on UTF-8.
 And if it doesn't detect UTF-8, then it should fall back to the locale
 default (including fall back to the encoding of the parent frame).

 Wouldn't that be an idea?

No. The current configuration works for Norwegian users already. For
users from different silos, the ad might break, but ad breakage is
less bad than spreading heuristic detection to more locales.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Use of media queries to limit bandwidth/data transfer

2011-12-19 Thread Henri Sivonen
On Fri, Dec 9, 2011 at 12:10 AM, James Graham jgra...@opera.com wrote:
 It's not clear that device-width and device-height should be encouraged
 since they don't tell you anything about how much content area is *actually*
 visible to the user.

Why do media queries support querying the device dimensions? Shouldn't
those be changed to be aliases for width and height?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>

2011-12-15 Thread Henri Sivonen
On Tue, Dec 13, 2011 at 4:23 AM, Adam Barth w...@adambarth.com wrote:
 I'm trying to understand how the HTML parsing spec handles the following case:

 <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>

 According to the html5lib test data, we should parse that as follows:

 | !DOCTYPE html
 | html
 |   head
 |   body
 |     math math
 |       math mi
 |         foo
 |     table

The expectation of the test case makes sense.

 However, I'm not sure whether that's what the spec actually does.

I think that's a spec bug.

 The net result of which is popping the stack of open elements, but not
 flushing out the pending table character tokens list.

The reason why Gecko does what makes sense is that Gecko uses a text
accumulation buffer for non-table cases, too, and any tag token
flushes the buffer. (Not quite optimal for ignored tags, sure.)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Default encoding to UTF-8?

2011-12-09 Thread Henri Sivonen
On Fri, Dec 9, 2011 at 12:33 AM, Leif Halvard Silli
xn--mlform-...@xn--mlform-iua.no wrote:
 Henri Sivonen Tue Dec 6 23:45:11 PST 2011:
 These localizations are nevertheless live tests. If we want to move
 more firmly in the direction of UTF-8, one could ask users of those
 'live tests' about their experience.

Filed https://bugzilla.mozilla.org/show_bug.cgi?id=708995

 (which means
 *other-language* pages when the language of the localization doesn't
 have a pre-UTF-8 legacy).

 Do you have any concrete examples?

The example I had in mind was Welsh.

 And are there user complaints?

Not that I know of, but I'm not part of a feedback loop if there even
is a feedback loop here.

 The Serb localization uses UTF-8. The Croat uses Win-1252, but only on
 Windows and Mac: On Linux it appears to use UTF-8, if I read the HG
 repository correctly.

OS-dependent differences are *very* suspicious. :-(

 I think that defaulting to UTF-8 is always a bug, because at the time
 these localizations were launched, there should have been no unlabeled
 UTF-8 legacy, because up until these locales were launched, no
 browsers defaulted to UTF-8 (broadly speaking). I think defaulting to
 UTF-8 is harmful, because it makes it possible for locale-siloed
 unlabeled UTF-8 content come to existence

 The current legacy encodings nevertheless create siloed pages already.
 I'm also not sure that it would be a problem with such a UTF-8 silo:
 UTF-8 is possible to detect, for browsers - Chrome seems to perform
 more such detection than other browsers.

While UTF-8 is possible to detect, I really don't want to take Firefox
down the road where users who currently don't have to suffer page load
restarts from heuristic detection have to start suffering them. (I
think making incremental rendering any less incremental for locales
that currently don't use a detector is not an acceptable solution for
avoiding restarts. With English-language pages, the UTF-8ness might
not be apparent from the first 1024 bytes.)

 In another message you suggested I 'lobby' against authoring tools. OK.
 But the browser is also an authoring tool.

In what sense?

 So how can we have authors
 output UTF-8, by default, without changing the parsing default?

Changing the default is an XML-like solution: creating breakage for
users (who view legacy pages) in order to change author behavior.

To the extent a browser is a tool Web authors use to test stuff, it's
possible to add various whining to console without breaking legacy
sites for users. See
https://bugzilla.mozilla.org/show_bug.cgi?id=672453
https://bugzilla.mozilla.org/show_bug.cgi?id=708620

 Btw: In Firefox, then in one sense, it is impossible to disable
 automatic character detection: In Firefox, overriding of the encoding
 only lasts until the next reload.

A persistent setting for changing the fallback default is in the
Advanced subdialog of the font prefs in the Content preference
pane. It's rather counterintuitive that the persistent autodetection
setting is in the same menu as the one-off override.

As for heuristic detection based on the bytes of the page, the only
heuristic that can't be disabled is the heuristic for detecting
BOMless UTF-16 that encodes Basic Latin only. (Some Indian bank was
believed to have been giving that sort of files to their customers and
it worked in pre-HTML5 browsers that silently discarded all zero
bytes prior to tokenization.) The Cyrillic and CJK detection
heuristics can be turned on and off by the user.
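
For the curious, the general shape of such a heuristic is something like
the sketch below (illustration only; not Firefox's actual code):

  // Guess BOMless UTF-16 when the first bytes look like Basic Latin,
  // i.e. every other byte is 0x00.
  function sniffBasicLatinUtf16(bytes: Uint8Array): "UTF-16LE" | "UTF-16BE" | null {
    const limit = Math.min(bytes.length & ~1, 1024);
    if (limit < 4) return null;
    let looksLE = true; // ASCII byte followed by 0x00
    let looksBE = true; // 0x00 followed by ASCII byte
    for (let i = 0; i < limit; i += 2) {
      const a = bytes[i];
      const b = bytes[i + 1];
      if (!(b === 0x00 && a !== 0x00 && a < 0x80)) looksLE = false;
      if (!(a === 0x00 && b !== 0x00 && b < 0x80)) looksBE = false;
      if (!looksLE && !looksBE) return null;
    }
    return looksLE ? "UTF-16LE" : "UTF-16BE";
  }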

Within an origin, Firefox considers the parent frame and the previous
document in the navigation history as sources of encoding guesses.
That behavior is not user-configurable to my knowledge.

Firefox also remembers the encoding from previous visits as long as
Firefox otherwise has the page in cache. So for testing, it's
necessary to make Firefox forget about previous visits to the test
case.
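
Collapsed into a sketch (the names and, in particular, the relative
priorities below are illustrative simplifications on my part, not a
description of Gecko internals), the sources of guesses stack up roughly
like this:

  interface EncodingGuessSources {
    declared?: string;                   // from HTTP, the BOM or meta
    sameOriginParentFrame?: string;
    sameOriginPreviousDocument?: string; // from navigation history
    rememberedFromCachedVisit?: string;
    localeFallback: string;
  }

  function chooseEncoding(s: EncodingGuessSources): string {
    return (
      s.declared ??
      s.sameOriginParentFrame ??
      s.sameOriginPreviousDocument ??
      s.rememberedFromCachedVisit ??
      s.localeFallback
    );
  }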

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Default encoding to UTF-8?

2011-12-07 Thread Henri Sivonen
On Tue, Dec 6, 2011 at 2:10 AM, Kornel Lesiński kor...@geekhood.net wrote:
 On Fri, 02 Dec 2011 15:50:31 -, Henri Sivonen hsivo...@iki.fi wrote:

 That compatibility mode already exists: It's the default mode--just
 like the quirks mode is the default for pages that don't have a
 doctype. You opt out of the quirks mode by saying !DOCTYPE html. You
 opt out of the encoding compatibility mode by saying meta
 charset=utf-8.


 Could !DOCTYPE html be an opt-in to default UTF-8 encoding?

 It would be nice to minimize number of declarations a page needs to include.

I think that's a bad idea. We already have *three*
backwards-compatible ways to opt into UTF-8. !DOCTYPE html isn't one
of them. Moreover, I think it's a mistake to bundle a lot of unrelated
things into one mode switch instead of having legacy-compatible
defaults and having granular ways to opt into legacy-incompatible
behaviors. (That is, I think, in retrospect, it's bad that we have a
doctype-triggered standards mode with legacy-incompatible CSS defaults
instead of having legacy-compatible CSS defaults and CSS properties
for opting into different behaviors.)

If you want to minimize the declarations, you can put the UTF-8 BOM
followed by !DOCTYPE html at the start of the file.
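
Concretely, such a file simply starts with the bytes EF BB BF followed by
the doctype; for example (Node used only for illustration):

  import { writeFileSync } from "fs";

  // "\uFEFF" serializes to EF BB BF in UTF-8, so the file both signals its
  // encoding (via the BOM) and opts out of quirks mode (via the doctype).
  writeFileSync("minimal.html", "\uFEFF<!DOCTYPE html>\n<title>Hi</title>\n<p>ø\n", "utf8");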

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Default encoding to UTF-8?

2011-12-06 Thread Henri Sivonen
On Mon, Dec 5, 2011 at 8:55 PM, Leif Halvard Silli
xn--mlform-...@xn--mlform-iua.no wrote:
 When you say 'requires': Of course, HTML5 recommends that you declare
 the encoding (via HTTP/higher protocol, via the BOM 'sideshow' or via
 meta charset=UTF-8). I just now also discovered that Validator.nu
 issues an error message if it does not find any of of those *and* the
 document contains non-ASCII. (I don't know, however, whether this error
 message is just something Henri added at his own discretion - it would
 be nice to have it literally in the spec too.)

I believe I was implementing exactly what the spec said at the time I
implemented that behavior of Validator.nu. I'm particularly convinced
that I was following the spec, because I think it's not the optimal
behavior. I think pages that don't declare their encoding should
always be non-conforming even if they only contain ASCII bytes,
because that way templates created by English-oriented (or lorem ipsum
-oriented) authors would be caught as non-conforming before non-ASCII
text gets filled into them later. Hixie disagreed.

 HTML5 says that validators *may* issue a warning if UTF-8 is *not* the
 encoding. But so far, validator.nu has not picked that up.

Maybe it should. However, non-UTF-8 pages that label their encoding,
that use one of the encodings that we won't be able to get rid of
anyway and that don't contain forms aren't actively harmful. (I'd
argue that they are *less* harmful than unlabeled UTF-8 pages.)
Non-UTF-8 is harmful in form submission. It would be more focused to
make the validator complain about labeled non-UTF-8 if the page
contains a form. Also, it could be useful to make Firefox whine to
console when a form is submitted in non-UTF-8 and when an HTML page
has no encoding label. (I'd much rather implement all these than
implement breaking changes to how Firefox processes legacy content.)

 We should also lobby for authoring tools (as recommended by HTML5) to
 default their output to UTF-8 and make sure the encoding is declared.

 HTML5 already says: Authoring tools should default to using UTF-8 for
 newly-created documents. [RFC3629]
 http://dev.w3.org/html5/spec/semantics.html#charset

I think focusing your efforts on lobbying authoring tool vendors to
withhold the ability to save pages in non-UTF-8 encodings would be a
better way to promote UTF-8 than lobbying browser vendors to change
the defaults in ways that'd break locale-siloed Existing Content.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Default encoding to UTF-8?

2011-12-04 Thread Henri Sivonen
On Fri, Dec 2, 2011 at 6:29 PM, Glenn Maynard gl...@zewt.org wrote:
 On Fri, Dec 2, 2011 at 10:46 AM, Henri Sivonen hsivo...@iki.fi wrote:

 Regarding your (and 16) remark, considering my personal happiness at
 work, I'd prioritize the eradication of UTF-16 as an interchange
 encoding much higher than eradicating ASCII-based non-UTF-8 encodings
 that all major browsers support. I think suggesting a solution to the
 encoding problem while implying that UTF-16 is not a problem isn't
 particularly appropriate. :-)
...
 I don't think I'd call it a bigger problem, though, since it's comparatively
 (even vanishingly) rare, where untagged legacy encodings are a widespread
 problem that gets worse every day we can't think of a way to curtail it.

From an implementation perspective, UTF-16 has its own class of bugs
that are unlike other encoding-related bugs, and fixing those bugs is
particularly annoying because UTF-16 is so rare that you know the fix
has little actual utility.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Default encoding to UTF-8?

2011-12-02 Thread Henri Sivonen
On Thu, Dec 1, 2011 at 1:28 AM, Faruk Ates faruka...@me.com wrote:
 My understanding is that all browsers* default to Western Latin (ISO-8859-1) 
 encoding by default (for Western-world downloads/OSes) due to legacy content 
 on the web.

As has already been pointed out, the default varies by locale.

 But how relevant is that still today?

It's relevant for supporting the long tail of existing content. The
sad part is that the mechanisms that allow existing legacy content to
work within each locale silo also make it possible for ill-informed
or uncaring authors to develop more locale-siloed content (i.e.
content that doesn't declare the encoding and, therefore, only works
when the user's fallback encoding is the same as the author's).

 I'm wondering if it might not be good to start encouraging defaulting to 
 UTF-8, and only fallback to Western Latin if it is detected that the content 
 is very old / served by old infrastructure or servers, etc. And of course if 
 the content is served with an explicit encoding of Western Latin.

I think this would be a very bad idea. It would make debugging hard.
Moreover, it would be the wrong heuristic, because well-maintained
server infrastructure can host a lot of legacy content. Consider any
shared hosting situation where the administrator of the server
software isn't the content creator.

 We like to think that “every web developer is surely building things in UTF-8 
 nowadays” but this is far from true. I still frequently break websites and 
 webapps simply by entering my name (Faruk Ateş).

For things to work, the server-side component needs to deal with what
gets sent to it. ASCII-oriented authors could still mishandle all
non-ASCII even if Web browsers forced them to deal with UTF-8 by
sending them UTF-8.

Furthermore, your proposed solution wouldn't work for legacy software
that correctly declares an encoding but declared a non-UTF-8 encoding.

Sadly, getting sites to deal with your name properly requires the
developer of each site to get a clue. :-( Just sending form
submissions in UTF-8 isn't enough if the recipient can't deal. Compare
with http://krijnhoetmer.nl/irc-logs/whatwg/20110906#l-392

 Yes, I understand that that particular issue is something we ought to fix 
 through evangelism, but I think that WHATWG/browser vendors can help with 
 this while at the same time (rightly, smartly) making the case that the web 
 of tomorrow should be a UTF-8 (and 16) based one, not a smorgasbord of 
 different encodings.

Anne has worked on speccing what exactly the smorgasbord should be.
See http://wiki.whatwg.org/wiki/Web_Encodings . I think it's not
realistic to drop encodings that are on the list of encodings you see
in the encoding menu on http://validator.nu/?charset . However, I think
browsers should drop support for encodings that aren't already
supported by all the major browsers, because such encodings only serve
to enable browser-specific content and encoding proliferation.

Regarding your (and 16) remark, considering my personal happiness at
work, I'd prioritize the eradication of UTF-16 as an interchange
encoding much higher than eradicating ASCII-based non-UTF-8 encodings
that all major browsers support. I think suggesting a solution to the
encoding problem while implying that UTF-16 is not a problem isn't
particularly appropriate. :-)

 So hence my question whether any vendor has done any recent research in this. 
 Mobile browsers seem to have followed desktop browsers in this; perhaps this 
 topic was tested and researched in recent times as part of that, but I 
 couldn't find any such data. The only real relevant thread of discussion 
 around UTF-8 as a default was this one about Web Workers:
 http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-September/023197.html

 …which basically suggested that everyone is hugely in favor of UTF-8 and 
 making it a default wherever possible.

 So how 'bout it?

In order to comply with the Support Existing Content design
principle (even if it unfortunately means that support is siloed by
locale) and in order to make plans that are game-theoretically
reasonable (not taking steps that make users migrate to browsers that
haven't taken the steps), I think we shouldn't change the fallback
encodings from what the HTML5 spec says when it comes to loading
text/html or text/plain content into a browsing context.

 What's going in this area, if anything?

There's the effort to specify a set of encodings and their aliases for
browsers to support. That's moving slowly, since Anne has other more
important specs to work on.

Other than that, there have been efforts to limit new features to
UTF-8 only (consider scripts in Workers and App Cache manifests) and
efforts to make new features not vary by locale-dependent defaults
(consider HTML in XHR). Both these efforts have faced criticism,
unfortunately.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Default encoding to UTF-8?

2011-12-02 Thread Henri Sivonen
On Thu, Dec 1, 2011 at 8:29 PM, Brett Zamir bret...@yahoo.com wrote:
 How about a Compatibility Mode for the older non-UTF-8 character set
 approach, specific to page?

That compatibility mode already exists: It's the default mode--just
like the quirks mode is the default for pages that don't have a
doctype. You opt out of the quirks mode by saying !DOCTYPE html. You
opt out of the encoding compatibility mode by saying meta
charset=utf-8.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [whatwg] createContextualFragment in detached contexts

2011-11-16 Thread Henri Sivonen
On Fri, Sep 30, 2011 at 7:56 PM, Erik Arvidsson a...@chromium.org wrote:
 On Fri, Sep 30, 2011 at 07:35, Henri Sivonen hsivo...@iki.fi wrote:
 On Fri, Sep 30, 2011 at 1:37 AM, Erik Arvidsson a...@chromium.org wrote:
 If the context object is in a detached state, then relax the parsing
 rules so that all elements are allowed at that level. The hand wavy
 explanation is that for every tag at the top level create a new
 element in the same way that ownerDocument.createElement would do it.

 I would prefer not to add a new magic mode to the parsing algorithm
 that'd differ from what innerHTML requires.

 So you want every js library to have to do this kind of work around
 instead?

This topic has migrated to public-webapps. My current thinking is
http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/0818.html

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Allowing custom attributes on html or body in documents for media resources

2011-11-15 Thread Henri Sivonen
On Thu, Nov 10, 2011 at 2:03 AM, Robert O'Callahan rob...@ocallahan.org wrote:
 http://www.whatwg.org/specs/web-apps/current-work/#read-media
 Can we allow the UA to add custom vendor-prefixed attributes to the html
 and/or body elements? Alternatively, a vendor-prefixed class? We want to
 be able to use a style sheet with rules matching custom attributes to
 indicate various situations (e.g., whether the document is a toplevel
 browsing context) to set the viewport background.

Why can't non-prefixed attributes be minted for these use cases?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] document.write(\r): the spec doesn't say how to handle it.

2011-11-04 Thread Henri Sivonen
On Thu, Nov 3, 2011 at 8:13 PM, David Flanagan dflana...@mozilla.com wrote:
 Each tokenizer state would have to add a rule for CR that said  emit LF,
 save the current tokenizer state, and set the tokenizer state to after CR
 state.

The Validator.nu/Gecko tokenizer returns a last input code unit
processed was CR flag to the caller. If the tokenizer sees a CR, the
tokenizer processes it and returns to the caller immediately with the
flag set to true. The caller is responsible for checking if the next
input code unit is an LF, skipping over it and calling the tokenizer
again. This way, the tokenizer itself does not need to have the
capability of skipping over a character and the same capabilities that
are normally used for dealing with arbitrary buffer boundaries and
early returns after script end tags (or timers before the parser moved
off the main thread) work.
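
In TypeScript-flavored pseudo-code (the interface below is invented for
illustration; the real code is the Validator.nu/Gecko tokenizer), the
division of labor between the caller and the tokenizer looks roughly
like this:

  interface Tokenizer {
    // Tokenizes buf from `start`; returns the index of the first unprocessed
    // code unit and whether the last code unit processed was a CR (in which
    // case the tokenizer has already emitted an LF and returned early).
    tokenizeBuffer(buf: string, start: number): { next: number; lastWasCR: boolean };
  }

  // Returns whether the buffer ended right after a CR, so that the state can
  // be carried across arbitrary buffer boundaries.
  function driveBuffer(tok: Tokenizer, buf: string, endedWithCR: boolean): boolean {
    let pos = 0;
    let lastWasCR = endedWithCR;
    while (pos < buf.length) {
      if (lastWasCR && buf.charCodeAt(pos) === 0x0a) {
        pos++; // the CR already produced an LF; skip the LF of the CRLF pair
      }
      lastWasCR = false;
      if (pos >= buf.length) {
        break;
      }
      const r = tok.tokenizeBuffer(buf, pos);
      pos = r.next;
      lastWasCR = r.lastWasCR;
    }
    return lastWasCR;
  }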

 The parser operates on UTF-16 code units, so a lone surrogate is emitted.

 The spec seems pretty unambiguous that it operates on codepoints

The spec is empirically wrong. The wrongness has been reported. The
spec tries to retrofit Unicode theoretical purity onto legacy where no
purity existed.

The tokenizer operates on UTF-16 code units. document.write() feeds
UTF-16 code units to the tokenizer without lone surrogate
preprocessing. The tokenizer or the tree builder don't do anything
about lone surrogates. When consuming a byte stream, the converter
that converts (potentially unaligned and potentially foreign
byte-order) the UTF-16-encoded byte stream into a stream of UTF-16
code units is responsible for treating unpaired surrogates as
conversion errors.
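
A sketch of what that converter-level responsibility amounts to
(little-endian input only, with errors reduced to emitting U+FFFD for
brevity; not the actual converter code):

  function utf16leBytesToCodeUnits(bytes: Uint8Array): number[] {
    const out: number[] = [];
    for (let i = 0; i + 1 < bytes.length; i += 2) {
      const unit = bytes[i] | (bytes[i + 1] << 8);
      if (unit >= 0xd800 && unit <= 0xdbff) {
        // Lead surrogate: only valid if immediately followed by a trail surrogate.
        const next = i + 3 < bytes.length ? bytes[i + 2] | (bytes[i + 3] << 8) : -1;
        if (next >= 0xdc00 && next <= 0xdfff) {
          out.push(unit, next);
          i += 2;
        } else {
          out.push(0xfffd); // unpaired lead surrogate: conversion error
        }
      } else if (unit >= 0xdc00 && unit <= 0xdfff) {
        out.push(0xfffd); // unpaired trail surrogate: conversion error
      } else {
        out.push(unit);
      }
    }
    return out;
  }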

Sorry about not mentioning earlier that the problematic tests are also
problematic in this sense.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] document.write(\r): the spec doesn't say how to handle it.

2011-11-03 Thread Henri Sivonen
On Thu, Nov 3, 2011 at 1:57 AM, David Flanagan dflana...@mozilla.com wrote:
 Firefox, Chrome and Safari all seem to do the right thing: wait for the next
 character before tokenizing the CR.

See http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1247

Firefox tokenizes the CR immediately, emits an LF and then skips over
the next character if it is an LF. When I designed the solution
Firefox uses, I believed it was more correct and more compatible with
legacy than whatever the spec said at the time.

Chrome seems to wait for the next character before tokenizing the CR.

 And I think this means that the description of document.write needs to be 
 changed.

All along, I've thought that having U+0000 and CRLF handling as a
stream preprocessing step was bogus and that both should happen upon
tokenization. So far, I've managed to convince Hixie about U+0000
handling.

 Similarly, what should the tokenizer do if the document.write emits half of
 a UTF-16 surrogate pair as the last character?

The parser operates on UTF-16 code units, so a lone surrogate is emitted.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Signed XHTML

2011-10-27 Thread Henri Sivonen
On Thu, Oct 20, 2011 at 9:57 PM, Martin Boßlet
martin.boss...@googlemail.com wrote:
 Are there plans in this direction? Would functionality like this have a
 chance to be considered for the standard?

The chances are extremely slim.

XML signatures depend on XML canonicalization which is notoriously
difficult to implement correctly and suffers from interop problems
because unmatched sets of bugs in the canonicalization phase make
signature verification fail. I think browser vendors would be
reasonable if they resisted making XML signatures or canonicalization
part of the platform.

Moreover, most of the Web is HTML, so enthusiasm for XHTML-only
features is likely very low these days.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] [fullscreen] Drop requestFullScreenWithKeys()?

2011-10-12 Thread Henri Sivonen
On Wed, Oct 12, 2011 at 11:56 AM, Anne van Kesteren ann...@opera.com wrote:
 Given the way Mac OS handles full screen applications I wonder whether
 requestFullScreenWithKeys() is needed. A toolbar will always appear at the
 top if you locate your cursor there.

Does the user realize that? Can the user do that if a mouse lock API
is used also? Can a user without a pointing device do that?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] createContextualFragment in detached contexts

2011-09-30 Thread Henri Sivonen
On Fri, Sep 30, 2011 at 1:37 AM, Erik Arvidsson a...@chromium.org wrote:
 If the context object is in a detached state, then relax the parsing
 rules so that all elements are allowed at that level. The hand wavy
 explanation is that for every tag at the top level create a new
 element in the same way that ownerDocument.createElement would do it.

I would prefer not to add a new magic mode to the parsing algorithm
that'd differ from what innerHTML requires.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] input type=barcode?

2011-08-04 Thread Henri Sivonen
On Wed, 2011-08-03 at 17:21 +0200, Anne van Kesteren wrote:
 On Wed, 03 Aug 2011 16:52:03 +0200, Mikko Rantalainen  
 mikko.rantalai...@peda.net wrote:
  What do you think?
 
 Implementing this seems rather complicated for such a niche use. It also  
 seems better to let sites handle this by themselves so these physical  
 codes can evolve more easily.

I don't know how niche a thing it is to actually own a dedicated USB
barcode reader, but where I live, using at least one Web app that
supports bar code reading (by having a text input and requiring that
the bar code reader can emulate a keyboard) is as mainstream as Web app
usage gets (banking).

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [whatwg] Why children of datalist elements are barred from constraint validation?

2011-08-02 Thread Henri Sivonen
On Fri, 2011-07-29 at 15:20 -0700, Jonas Sicking wrote:
 On Fri, Jul 29, 2011 at 2:59 PM, Aryeh Gregor simetrical+...@gmail.com 
 wrote:
  On Fri, Jul 29, 2011 at 5:51 PM, Jonas Sicking jo...@sicking.cc wrote:
  On Fri, Jul 29, 2011 at 9:43 AM, Ian Hickson i...@hixie.ch wrote:
  Looking specifically at datagrid's ability to fall back to select, I
  agree that it's not necessarily doing to be widely used, but given that
  it's so simple to support and provides such a clean way to do fallback, I
  really don't see the harm in supporting it.
 
  I haven't looked at datagrid yet, so I can't comment.
 
  I think he meant datalist.  datagrid was axed quite some time ago
  and hasn't made a reappearance that I know of.
 
 Ah, well, then it definitely seems like we should get rid of this
 feature. The harm is definitely there in that it's adding a feature
 without solving any problem.

The current design solves the problem that the datalist feature needs
to Degrade Gracefully (and preferably without having to import a script
library). I think the solution is quite elegant and don't see a need to
drop it.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [whatwg] sic element

2011-08-02 Thread Henri Sivonen
On Fri, 2011-07-29 at 22:39 +, Ian Hickson wrote:
   If it's ok if it's entirely ignored, then it's presentational, and not 
   conveying any useful information.
  
  Presentational markup may convey useful information, for example that a 
  quotation from printed matter contains an underlined word.
 
 HTML is the wrong language for this kind of thing.

I disagree. From time to time, people want to take printed matter and
publish it on the Web. In practice, the formats available are PDF and
HTML. HTML works more nicely in browsers and for practical purposes
works generally better when the person taking printed matter to the Web
decides that the exact line breaks and the exact font aren't of
importance. They may still consider it of importance to preserve bold,
italic and underline and maybe even delegate that preservation to OCR
software that has no clue about semantics. (Yes, bold, italic and
underline are qualitatively different from line breaks and the exact
font even if you could broadly categorize them all as presentational
matters.)

I think it's not useful for the Web for you to decree that HTML is the
wrong language for this kind of thing. There's really no opportunity to
launch a new format precisely for that use case. Furthermore, in
practice, HTML already works fine for this kind of thing. The technical
solution is there already. You just decree it wrong as a matter of
principle. When introducing new Web formats is prohibitively hard and
expensive, I think it doesn't make sense to take the position that
something that already works is the wrong language. 

 I think you are confused as to the goals here. The presentational markup 
 that was u, i, b, font, small, etc, is gone.

I think the reason why Jukka and others seem to be confused about your
goals is that your goals here are literally incredible from the point of
view of other people. Even though you've told me f2f what you believe
and I want to trust that you are sincere in your belief, I still have a
really hard time believing that you believe what you say you believe
about the definitions of b, i and u. When after discussing this
with you f2f, I still find your position incredible, I think it's not at
all strange if other people when reading the spec text interpret your
goals inaccurately because your goals don't seem like plausible goals to
them.

If the word presentational carries too much negative baggage, I
suggest defining b, i and u as typographic elements on visual
media (and distinctive elements on other media) and adjusting the
rhetoric that HTML is a semantic markup language to HTML being a mildly
semantic markup language that also has common phrase-level typographic
features.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [whatwg] Support for RDFa in HTML5

2011-08-02 Thread Henri Sivonen
On Tue, 2011-08-02 at 13:55 +, aykut.sen...@bild.de wrote:
 I would like to know if these attributes will be part of HTML5 or is
 there another valid method to integrate RDFa into HTML5?

Why do you need RDFa?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [whatwg] date meta-tag invalid

2011-07-28 Thread Henri Sivonen
On Tue, 2011-07-26 at 11:27 +, aykut.sen...@bild.de wrote:
 http://www.google.com/support/news_pub/bin/answer.py?answer=93994
 
 See the link above; Google says that they provide DC.date.issued, but this
 is also not part of the whatwg metaextensions list.

It's part of the list now.

I wonder what possessed the Google News team to use dc.date.issued
instead of dc.issued or dcterms.issued.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [whatwg] date meta-tag invalid

2011-07-18 Thread Henri Sivonen
On Mon, 2011-07-18 at 13:59 +, aykut.sen...@bild.de wrote:
 I have asked someone from the SEO team and he says, for example, that the
 freshness factor is important for Google.

Is there evidence of meta name=date content=... being part of
Google's freshness factor? Is there public documentation explaining what
meta name=date content=... means, what date format is expected in the
content attribute and what software does something useful with it?

 is it possible to use the time-tag in the head instead (i mean invisible)?

No, it's not.

 dc:created is also not in the Meta Extensions List, see:
 http://wiki.whatwg.org/wiki/MetaExtensions

It simply hasn't been registered yet. Is there any evidence of consuming
software that does something useful with dc:created?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [whatwg] Microdata feedback

2011-07-12 Thread Henri Sivonen
On Thu, 2011-07-07 at 22:33 +, Ian Hickson wrote:
 The JSON algorithm now ends the crawl when it hits a loop, and replaces 
 the offending duplicate item with the string ERROR.
 
 The RDF algorithm preserves the loops, since doing so is possible with 
 RDF. Turns out the algorithm almost did this already, looks like it was an 
 oversight.

It seems to me that this approach creates an incentive for people who
want to do RDFesque things to publish deliberately non-conforming
microdata content that works the way they want for RDF-based consumers
but breaks for non-RDF consumers. If such content abounds and non-RDF
consumers are forced to support loopiness by extending the JSON
conversion algorithm in ad hoc ways, part of the benefit of microdata
over RDFa (treeness) would be destroyed, and the benefit of being
well-defined would be destroyed, too, for non-RDF consumption cases.
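
For reference, the loop cut being discussed amounts to roughly the
following when crawling an item (a sketch, not the spec's exact
algorithm; in particular, the exact scoping of the memory set is glossed
over here):

  interface Item {
    properties: Map<string, (string | Item)[]>;
  }

  // e.g. itemToJson(topLevelItem, new Set())
  function itemToJson(item: Item, memory: Set<Item>): unknown {
    memory.add(item);
    const result: Record<string, unknown[]> = {};
    for (const [name, values] of item.properties) {
      result[name] = values.map(v =>
        typeof v === "string"
          ? v
          : memory.has(v)
            ? "ERROR" // re-encountered item: cut the loop instead of recursing
            : itemToJson(v, memory)
      );
    }
    return result;
  }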

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



[whatwg] Readiness of script-created documents

2011-06-20 Thread Henri Sivonen
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1039

It says complete in Firefox, loading in Chrome and Opera and
uninitialized in IE. The spec requires complete. readyState is
originally an IE API. Why doesn't the spec require uninitialized?

(The implementation in Gecko is so recent that it's quite possible that
Gecko followed the spec and the spec just made stuff up as opposed to
the spec following Gecko.)

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



[whatwg] Linking to the HTML accessibility API mapping draft

2011-06-16 Thread Henri Sivonen
It's generally safe to assume that the WHATWG spec doesn't suppress
useful information even though W3C publications occasionally might.

However, in the case of
http://dev.w3.org/html5/html-api-map/overview.html 
the WHATWG spec suppresses the document from the Recommended Reading
section. This reduces one's ability to trust that if one reads the
WHATWG spec, information isn't suppressed.

While I realize that the API mapping document isn't even nearly done
yet, is it so incorrect that it's more useful not to let people know
that it exists than to link to it alongside the Polyglot guide (which, I
imagine, isn't recommended reading in the sense of recommending that the
Polyglot guide be followed for Web authoring)?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [whatwg] window.cipher HTML crypto API draft spec

2011-06-14 Thread Henri Sivonen
On Tue, 2011-05-24 at 07:48 -0700, David Dahl wrote:
  Consider for example a DropBox-style service that has a browser-based UI
   but that has a design where content is encrypted on the client-side so
   that the service provider is unable to decrypt the data. In this case,
   it would make sense to be able to implement a file download by having a
   plain a href to an encrypted file and have the browser automatically
   decrypt it. Likewise, a service that allows the transmission of
   encrypted images should be implementable by having img src point
   directly to an encrypted file.
 
 I think someone was asking about that kind of functionality during my 
 presentation at Mozilla. Again, this would be a pretty advanced complement to 
 this API - I would love to see something like that spec'd and implemented as 
 well.

My main worry is that if the two ways of doing crypto don't appear at
the same time for Web authors to use, the Web will shift in an
unfortunately hashbangy direction.

  I suggest adding a Content-Encoding type that tells the HTTP stack that
   the payload of a HTTP response is encrypted and needs to be decrypted
   using a key previously initialized using the JS API.
 
 cool. I'll look into that.

Thanks.

  On the other hand, it seems that letting Web apps generate per-user key
   pairs and letting Web apps discover if the user possesses the private
   key that decrypts a particular message is a privacy problem. Someone who
   wishes to surveil Web users could use private keys as supercookies,
   since the generated private key is most probably going to be unique to
   user.
 
 Currently, my implementation requires the enduser to open a file from the 
 file system in order to view the contents of the private key. It is only 
 accessible to privileged code - content has no access to it whatsoever.

I didn't expect content to have access to the key bits per se. I
expected Web content-provided JS to be able to encrypt and decrypt stuff
with a key it has asked the browser to generate (if the user has
authorized the origin to use the crypto API). The ability to decrypt or
encrypt a message with a particular private key is proof of possession
of that key, so users in possession of a particular key could be
tracked.

This could be mitigated by granting the crypto permissions to a pair of
origins: the origin of the top level frame combined with the origin that
wants to access the API. This way, iframed Web bugs couldn't track the user
across sites after having once obtained a crypto permission for their
origin.

See http://www.w3.org/2010/api-privacy-ws/papers/privacy-ws-24.pdf
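
In other words, something like this (origins and helper names purely
illustrative):

  // The grant is keyed on the pair (top-level origin, origin asking for
  // crypto), so a grant obtained on one site does not follow an iframe
  // elsewhere.
  const grants = new Set<string>();

  function grantCrypto(topLevelOrigin: string, apiOrigin: string): void {
    grants.add(topLevelOrigin + " " + apiOrigin);
  }

  function mayUseCrypto(topLevelOrigin: string, apiOrigin: string): boolean {
    return grants.has(topLevelOrigin + " " + apiOrigin);
  }

  // A tracker iframed on two different sites:
  grantCrypto("https://site-a.example", "https://tracker.example");
  mayUseCrypto("https://site-a.example", "https://tracker.example"); // true
  mayUseCrypto("https://site-b.example", "https://tracker.example"); // false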

  Currently, it is unfortunate that choosing to use a webmail client
   effectively prevents a person from using encrypted email. To allow
   people to use end-to-end encrypted email with webmail apps, it would be
   useful to support OpenPGP as an encryption format. (Obviously, a
   malicious webmail app could capture the decrypted messages on the
   browser and send them back to the server, but when the webmail app
   itself doesn't contain code like that, putting the decryption in the
   browser rather than putting it on the server would still probably be
   more subpoena-resistant and resistant against casual snooping by bored
   administrators.)
 
 I think with an API like this we might see a whole new breed of 
 communications applications that can supplant email and webmail entirely. 

Maybe. But Google Wave flopped and email is still here. I think it would
be good to design for the ability to plug into email's network effects
instead of counting on a new breed of communication making email
irrelevant.

  The public key discovery section shows a /meta end tag. I hope this is
  just a plain error and having content in a meta element isn't part of
  any design.
 
 The tag is unimportant as well - can you explain why you hope this will not
 use a meta tag?

A meta tag can be used if there's no need for the meta element to have
child nodes. You can't make a meta element have child nodes or an end
tag.

  I could just as easily use addressbookentry

You can't introduce an addressbookentry element as a child of the head
element. The result would not degrade gracefully.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



[whatwg] Please link to a specific fragment id on the microformats.org wiki

2011-06-13 Thread Henri Sivonen
It has been brought to my attention that linking to the microformats.org wiki 
from
http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#other-link-types
without a specific fragment id confuses people because the wiki page includes 
stuff that doesn't constitute keyword registrations for HTML(5) purposes.

I believe changing the link to point to 
http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions 
would improve the usability of the registration procedure.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/




[whatwg] DOMContentLoaded, load and current document readiness

2011-05-31 Thread Henri Sivonen
Recently, there was discussion about changing media element state in the
same task that fires the event about the state change so that scripts
that probe the state can make non-racy conclusions about whether a
certain event has fired already.

Currently, there seems to be no correct non-racy way to write code that
probes a document to determine if DOMContentLoaded or load has fired and
runs code immediately if the event of interest has fired or adds a
listener to wait for the event if the event hasn't fired.

Are there compat or other reasons why we couldn't or shouldn't make it
so that the same task that fires DOMContentLoaded changes the readyState
to interactive and the same task that fires load changes readyState to
complete?
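
To illustrate the kind of probing code I mean (the usual
probe-then-listen pattern, sketched in TypeScript):

  function onDomContentLoaded(callback: () => void): void {
    if (document.readyState !== "loading") {
      // Presumably DOMContentLoaded has already fired...
      callback();
    } else {
      // ...but if readyState and the event change in different tasks, code
      // running between the two can end up on the wrong branch.
      document.addEventListener("DOMContentLoaded", () => callback());
    }
  }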

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [whatwg] CORS requests for image and video elements

2011-05-19 Thread Henri Sivonen
On Tue, 2011-05-17 at 14:25 -0700, Kenneth Russell wrote:
 Unfortunately, experimentation indicates that it is
 not possible to simply send CORS' Origin header with every HTTP GET
 request for images; some servers do not behave properly when this is
 done.

How do they behave? Which servers? Why? Has evangelism been attempted?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [whatwg] Full Screen API Feedback

2011-05-15 Thread Henri Sivonen
On May 13, 2011, at 19:17, Eric Carlson wrote:

 I don't know of exploits in the wild, but I've read about
 proof-of-concept exploits that overwhelmed the user's attention visually
 so that the user didn't notice the Press ESC to exit full screen
 message. This allowed subsequent UI spoofing. (I was unable to find the
 citation for this.)
 
  Maybe you were thinking of this: 
 http://www.bunnyhero.org/2008/05/10/scaring-people-with-fullscreen/.

I'm not sure if that's the exact demo I have seen before, but it uses the same 
idea as the demo I've seen before.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



