Re: [whatwg] How to determine content-type of file: protocol

2014-07-28 Thread Gordon P. Hemsley

On 07/28/2014 08:01 AM, duanyao wrote:

On 07/28/2014 06:34, Gordon P. Hemsley wrote:

Sorry for the delay in responding. Your message fell through the
cracks in my e-mail filters.

On 07/17/2014 08:26 AM, duanyao wrote:

Hi,

My first question is about a rule in MIME Sniffing specification
(http://mimesniff.spec.whatwg.org):

5.1 Interpreting the resource metadata
...
If the resource is retrieved directly from the file system, set
supplied-type to the MIME type
provided by the file system.

As far as I know, no main-stream file systems record MIME type for
files. Does the spec actually want to say provided by the operating
system or
provided by the file name extension?


Yeah, you've hit a known (though apparently unrecorded) bug in the
spec, originally pointed out to me by Boris Zbarsky via IRC many
months ago. The intent here is basically just whatever the computer
says it is—whether that be via the file system, the operating system,
or whatever, and whether it uses magic bytes, file extensions, or
whatever.

In other words, feel free to read that as the correct behavior is
undefined/unknown at this point.

Thanks for the explanation.

Recently, the file: protocol has become more and more important due to the
popularity of packaged web applications, including PhoneGap apps, Chrome
apps, Firefox OS apps, Windows 8 HTML apps, etc. (not all of them use the
file: protocol directly, but the underlying mechanisms are similar).
So if we can't specify an interoperable way to determine a local file's
MIME type, porting packaged web applications can be problematic in
some situations (my team has actually already hit this).

I know that there is currently no standard way to determine a local
file's MIME type; this may be one of the reasons that the mimesniff spec
has not defined a behavior here.


Well, the most basic reason is because I never delved into how it 
actually works, because I was primarily concerned with HTTP connections.


It's possible that there is no interoperable way to determine a local 
file's MIME type, but see below.



I'd like to propose a simple way to resolve this problem:
For MIME types that have already been standardized by IANA and are used in
web standards, determine a local file's supplied-type according to its
file extension.
This list could include htm, html, xhtml, xml, svg, css, js, jpeg, jpg,
png, mp4, webm, woff, etc. Otherwise, UAs can determine supplied-type by
any means.

I think this rule should resolve most of the interoperability problems,
and largely maintain compatibility with current UAs' implementations.
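
A minimal sketch of the extension-based rule proposed above, in Python. The table name, the function name, and the exact MIME type chosen for each extension (for example application/javascript for js and application/font-woff for woff) are assumptions made for illustration; the proposal itself only lists the extensions.

```python
import os

# Hypothetical table: extensions named in the proposal, mapped to MIME types
# chosen for illustration. The exact values are assumptions, not spec text.
EXTENSION_TYPES = {
    "htm": "text/html",
    "html": "text/html",
    "xhtml": "application/xhtml+xml",
    "xml": "application/xml",
    "svg": "image/svg+xml",
    "css": "text/css",
    "js": "application/javascript",
    "jpeg": "image/jpeg",
    "jpg": "image/jpeg",
    "png": "image/png",
    "mp4": "video/mp4",
    "webm": "video/webm",
    "woff": "application/font-woff",
}


def supplied_type_for_local_file(path, fallback=None):
    """Return the supplied-type for a local file based on its extension.

    Unknown extensions return `fallback`, standing in for "the UA can
    determine supplied-type by any means".
    """
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    return EXTENSION_TYPES.get(extension, fallback)
```

With a fixed table like this, two UAs would agree on the listed extensions regardless of what the operating system reports, and anything else stays implementation-defined.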


There is already a standard in place to detect file types on the 
operating system level:


http://www.freedesktop.org/wiki/Specifications/shared-mime-info-spec/
http://cgit.freedesktop.org/xdg/shared-mime-info/

I could just refer to that and be done with it. Do you think that would 
work? (That specification has complex rules for detecting files, 
including magic bytes and whatnot, and is already used on a number of 
Linux distros and probably other operating systems.)



My second question is: does the above rule apply equally to both fetching
static resources (top level, iframe, img, etc.) and XMLHttpRequest?

It seems all browsers try to figure out actual type for local static
resources, so that .htm and .xhtml files are rendered as HTML and
XHTML respectively,
so far so good.

But when it comes to XHR, things are different.

Firefox (31) sets the Content-Type header to 'application/xml' for local
files of any type; if xhr.responseType = 'document' is set, the
response is parsed as XML;
and if xhr.responseType = 'blob' is set, blob.type is always
'application/xml'. This differs significantly from the static fetching
behavior.

Chromium (34) sets the Content-Type header to null for local files of any
type; but if xhr.responseType = 'document' is set, the response is
parsed according to its actual type,
i.e. .htm as HTML and .xhtml as XHTML; and if
xhr.responseType = 'blob' is set, blob.type is the file's actual type, i.e.
'text/html' for .htm and 'application/xhtml+xml'
for .xhtml. This is similar to the static fetching behavior; however, the
Content-Type header is missing.

I think rule 5.1 should be applied to both static fetching and XHR
consistently. Browsers should set the Content-Type header to the local
file's actual type for XHR, and interpret
it accordingly. But Firefox developers think this would break some
existing code that already relies on Firefox's behavior
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1037762).

What do you think?

Regards,
 Duan Yao.




Anne's the person to ask about XHR first, I think. I don't want to
make any judgements or claims until I hear his view on the situation.

That being said, I created the Contexts wiki article [1] and began
splitting up the mimesniff spec according to contexts [2] in an effort
to clarify this situation and make sure that all bases were covered.
It's still a work in progress, awaiting feedback from implementers and
other spec writers.

I

Re: [whatwg] How to determine content-type of file: protocol

2014-07-27 Thread Gordon P. Hemsley
Sorry for the delay in responding. Your message fell through the cracks 
in my e-mail filters.


On 07/17/2014 08:26 AM, duanyao wrote:

Hi,

My first question is about a rule in MIME Sniffing specification 
(http://mimesniff.spec.whatwg.org):

5.1 Interpreting the resource metadata
...
If the resource is retrieved directly from the file system, set 
supplied-type to the MIME type
provided by the file system.

As far as I know, no main-stream file systems record MIME type for files. Does the spec 
actually want to say provided by the operating system or
provided by the file name extension?


Yeah, you've hit a known (though apparently unrecorded) bug in the spec, 
originally pointed out to me by Boris Zbarsky via IRC many months ago. 
The intent here is basically just whatever the computer says it 
is—whether that be via the file system, the operating system, or 
whatever, and whether it uses magic bytes, file extensions, or whatever.


In other words, feel free to read that as the correct behavior is 
undefined/unknown at this point.



My second question is: does the above rule apply equally to both fetching static
resources (top level, iframe, img, etc.) and XMLHttpRequest?

It seems all browsers try to figure out actual type for local static resources, 
so that .htm and .xhtml files are rendered as HTML and XHTML respectively,
so far so good.

But when it comes to XHR, things are different.

Firefox (31) sets the Content-Type header to 'application/xml' for local files of
any type; if xhr.responseType = 'document' is set, the response is parsed as XML;
and if xhr.responseType = 'blob' is set, blob.type is always
'application/xml'. This differs significantly from the static fetching behavior.

Chromium (34) sets the Content-Type header to null for local files of any type; but
if xhr.responseType = 'document' is set, the response is parsed according to its
actual type,
i.e. .htm as HTML and .xhtml as XHTML; and if xhr.responseType = 'blob' is
set, blob.type is the file's actual type, i.e. 'text/html' for .htm and
'application/xhtml+xml'
for .xhtml. This is similar to the static fetching behavior; however, the
Content-Type header is missing.

I think rule 5.1 should be applied to both static fetching and XHR
consistently. Browsers should set the Content-Type header to the local file's
actual type for XHR, and interpret
it accordingly. But Firefox developers think this would break some existing
code that already relies on Firefox's behavior
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1037762).

What do you think?

Regards,
 Duan Yao.




Anne's the person to ask about XHR first, I think. I don't want to make 
any judgements or claims until I hear his view on the situation.


That being said, I created the Contexts wiki article [1] and began 
splitting up the mimesniff spec according to contexts [2] in an effort 
to clarify this situation and make sure that all bases were covered. 
It's still a work in progress, awaiting feedback from implementers and 
other spec writers.


I agree that there's a hole in how mimesniff, XHR, and Contexts 
intersect, and I'll be happy to update mimesniff to fill it, if that's 
determined to be the best course of action.


HTH,
Gordon

[1] http://wiki.whatwg.org/wiki/Contexts
[2] http://mimesniff.spec.whatwg.org/#context-specific-sniffing

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/


Re: [whatwg] [mimesniff] The Apache workaround should not sniff random types

2014-01-16 Thread Gordon P. Hemsley

On 08/27/2013 12:26 PM, Boris Zbarsky wrote:

The current mimesniff spec says that when the Apache workaround is
applied sniffing should still be able to detect the content as
PostScript, images, videos, archives, audio formats, etc.

I feel that this poses an unacceptable security risk due to allowing
content through firewalls that is then interpreted differently by a UA.
  In particular, postscript and media formats can be used to attack
viewers and decoders.

Web compat does not require this behavior: Gecko only allows
text/plain and application/octet-stream as output types when the
Apache workaround is being applied, and we have been successfully
shipping this for a while.  I would strongly oppose changing the Gecko
behavior here due to the security implications.

Given the security risks and the lack of web compat issues, I believe
the spec should not require the behavior it currently requires.

-Boris


I have finally made this change. Please confirm that this is what you 
had in mind:


https://github.com/whatwg/mimesniff/commit/d7bafc16ee480a5dea4c27d60dd5272388e022ce

http://mimesniff.spec.whatwg.org/#rules-for-text-or-binary
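
For readers following along, here is a rough Python sketch of the restricted behavior Boris describes, in which the only possible outputs are text/plain and application/octet-stream. The set of "binary data bytes" and the BOM checks are my reading of the mimesniff text and should be treated as assumptions; the function and constant names are made up for this example.

```python
# Assumed set of "binary data bytes": C0 control characters other than
# TAB (0x09), LF (0x0A), FF (0x0C), CR (0x0D), and ESC (0x1B).
BINARY_DATA_BYTES = frozenset(
    list(range(0x00, 0x09)) + [0x0B] + list(range(0x0E, 0x1B)) + list(range(0x1C, 0x20))
)

# Byte order marks that mark the resource as text: UTF-16BE, UTF-16LE, UTF-8.
BYTE_ORDER_MARKS = (b"\xfe\xff", b"\xff\xfe", b"\xef\xbb\xbf")


def sniff_text_or_binary(header: bytes) -> str:
    """Classify a resource header as text/plain or application/octet-stream.

    No other type (PostScript, images, audio, and so on) can be returned,
    which is the point of the change discussed in this thread.
    """
    if header.startswith(BYTE_ORDER_MARKS):
        return "text/plain"
    if not any(byte in BINARY_DATA_BYTES for byte in header):
        return "text/plain"
    return "application/octet-stream"
```

Under this sketch a PNG signature is reported as application/octet-stream rather than image/png, and a PostScript header is reported as text/plain.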

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/


Re: [whatwg] [mimesniff] The Apache workaround should not sniff random types

2013-11-16 Thread Gordon P. Hemsley

On 8/27/13 12:26 PM, Boris Zbarsky wrote:

The current mimesniff spec says that when the Apache workaround is
applied sniffing should still be able to detect the content as
PostScript, images, videos, archives, audio formats, etc.

I feel that this poses an unacceptable security risk due to allowing
content through firewalls that is then interpreted differently by a UA.
  In particular, postscript and media formats can be used to attack
viewers and decoders.

Web compat does not require this behavior: Gecko only allows
text/plain and application/octet-stream as output types when the
Apache workaround is being applied, and we have been successfully
shipping this for a while.  I would strongly oppose changing the Gecko
behavior here due to the security implications.

Given the security risks and the lack of web compat issues, I believe
the spec should not require the behavior it currently requires.

-Boris


I'm inclined to agree.

Having heard no objection (or, indeed, any discussion whatsoever) in the 
last 3 months, I plan to move ahead with this proposed change.


Anyone else have anything to say before I do?

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/


Re: [whatwg] Zip archives as first-class citizens

2013-08-28 Thread Gordon P. Hemsley

On 8/28/13 9:32 AM, Anne van Kesteren wrote:

We have thought of three approaches for zip URL design thus far:

* Using a sub-scheme (zip) with a zip-path (after !):
zip:http://www.example.org/zip!image.gif
* Introducing a zip-path (after %!): http://www.example.org/zip%!image.gif
* Using media fragments: http://www.example.org/zip#path=image.gif

High-level drawbacks:

* Sub-scheme: requires changing the URL syntax with both sub-scheme
and zip-path.
* Zip-path: requires changing the URL syntax.
* Fragments: fail to work well for URLs relative to a zip archive.

Fragments are conceptually the cleanest as the only part of a URL
that's supposed to depend on the Content-Type is the fragment.
However, if you want to link to an ID inside an HTML resource you'd
have to do #path=test.html&id=test which would require adding
knowledge to the HTML resource that it is contained in a zip archive
and have special processing based on that. And not just HTML, same
goes for CSS or JavaScript.

I'm not sure we need to consider sub-scheme if zip-path can work as
it's more complex and not very well thought out. E.g. imagine
view-source:zip:http://www.example.org/zip!test.html. (I hope we never
need to standardize view-source and that it can be restricted to the
address bar in browsers.)

zip-path makes zip archive packaging by far the easiest. If we use %!
as separator that would cause a network error in some existing
browsers (due to an illegal %), which means it's extensible there,
though not backwards compatible.

We'd adjust the URL parser to build a zip-path once %! is encountered.
And relative URLs would first look if there's a zip-path and work
against that, and use path otherwise.

Fetching would always use the path. If there's a zip-path and the
returned resource is not a zip archive it would cause a network error.

As for nested zip archives. Andrea suggested we should support this,
but that would require zip-path to be a sequence of paths. I think we
never want to allow relative URLs to escape the top-most zip archive.
But I suppose we could support it in a way that

   %!test.zip!test.html

goes one level deeper. And ../image.gif in test.html looks in the
enclosing zip. And ../../image.gif in test.html looks in the
enclosing zip as well because it cannot ever be relative to the path,
only the zip-path.
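
A small Python sketch of the %! separator idea, assuming the URL is handled as a plain string before any percent-decoding. The function name is hypothetical, and treating each further "!" as one more level of nesting is only one possible reading of the nested-archive suggestion above.

```python
def split_zip_path(url: str):
    """Split a URL at the first '%!' into (URL to fetch, zip-path segments).

    A URL without '%!' has an empty zip-path. Each '!' after the separator
    is treated here as descending into a nested archive (an assumption).
    """
    fetch_url, separator, zip_path = url.partition("%!")
    if not separator:
        return url, []
    return fetch_url, zip_path.split("!")
```

For example, split_zip_path("http://www.example.org/zip%!test.zip!test.html") gives the fetch URL http://www.example.org/zip and the segments test.zip and test.html.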



As the following URLs suggest, the %! (or %-anything) will likely not 
work for ZIP files generated by a script using the query portion of the 
URL, as the path information will be subsumed into the last value 
without causing a network error:


http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%!example.png
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1%/example.png
http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1?example.png

(And feel free to use that script to try out any other combos.)

However, since fragments (i.e. anything beginning with '#') are already 
not sent to the server, what if you modified the URL parser to use a 
special hash-prefix combo that indicates the path? Then you could avoid 
the problem of having to make documents aware of the fact that they're 
in a ZIP because the hash-prefix combo would come before the plain hash 
which holds the ID.


So, for example:

http://whatwg.gphemsley.org/url_test.php?file=test.zip&spacer=1#/example.html#middle

Then you could also take the opportunity to spec the #! prefix (and 
other hash-combo prefixes) that is used by a lot of sites nowadays.
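
To make the hash-prefix idea concrete, here is a Python sketch that treats a fragment starting with "/" as a path inside the archive, followed by an optional ordinary fragment. The function name and the returned tuple shape are invented for this example, and the URL used below is a placeholder rather than one of the test URLs above.

```python
def split_hash_path(url: str):
    """Return (fetch_url, path_in_zip, element_id) for a '#/path#id' URL.

    path_in_zip is None when the fragment does not start with '/', in which
    case the whole fragment is treated as an ordinary element ID.
    """
    fetch_url, separator, fragment = url.partition("#")
    if not separator or not fragment.startswith("/"):
        return fetch_url, None, (fragment or None)
    path_in_zip, _, element_id = fragment[1:].partition("#")
    return fetch_url, path_in_zip, (element_id or None)
```

Because everything after the first "#" stays on the client, the server sees the same request whether or not a path into the archive is present.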


--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/


Re: [whatwg] [mimesniff] More issues on the MIME Sniffing spec

2013-06-06 Thread Gordon P. Hemsley
On Thu, Jun 6, 2013 at 5:42 AM, Peter Occil pocci...@gmail.com wrote:
 I want to respond to the following issues in the MIME Sniffing spec:

 Resources

 I suggest the following wording for the issue box starting with A resource
 is...

A resource is a data item or message, such as a file or an HTTP response.

 I believe this covers the cases that would normally be associated with a
 MIME type.

I already have an idea about how to define resource.

The reason it's not currently in the spec is that I recall Hixie
expressing some concern about complexity beyond "bag of bits", and I'm
waiting on feedback from him.

 Contexts

 I don't think the word context needs to be specially defined.  The start
 of section 8
 could be rewritten to remove the definition:

 [[
 In certain cases, it is only useful to identify resources that belong to a
 certain subset of MIME types. In these cases, it is appropriate to use a
 context-specific sniffing algorithm in place of the MIME type sniffing
 algorithm in order to determine the sniffed MIME type of a resource.

 This specification defines the following context-specific sniffing
 algorithms.
 ]]

On the contrary, I think it may be important to define context, as
it is the only lens through which to see fetching and sniffing and the
like.

Currently, the HTML spec only defines (nested) browsing context, so
I put together a wiki page that lists all the other ones that exist
implicitly:

http://wiki.whatwg.org/wiki/Contexts

I plan to rewrite the whole second half of the spec to be in terms of
contexts soon.

 Apache Bug

 As for the Apache bug flag, would it be useful to additionally check the
 HTTP
 headers for a Server header and check if it contains Apache/?  I don't
 know which
 version of Apache the bug involved was fixed in, so I can't suggest a more
 accurate
 string check.

That thought had crossed my mind, but the handling of the situation
mostly predates my editing of the spec, so I haven't given much
thought to whether the current method is the ideal one.

 MP3 Sniffing

 Finally, the Firefox team has recently included a patch to support sniffing
 MP3
 files better [1] and would like to document it and add it to the MIME
 Sniffing
 spec. [2]  The disadvantage, though, is that more than 512 bytes
 are required for an accurate detection.

 --Peter

 [1]: https://bugzilla.mozilla.org/show_bug.cgi?id=862088
 [2]: https://bugzilla.mozilla.org/show_bug.cgi?id=879429


I'm aware of this. I was told that a proposal would be made in due
course, so I'm waiting on that.

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Review request: Parsing a MIME type

2013-06-01 Thread Gordon P. Hemsley
On Fri, May 31, 2013 at 11:50 PM, Peter Occil pocci...@gmail.com wrote:

 * Another important point to notice is the fact that this algorithm
 allows parameter names to appear without values. This is useful in
 situations such as the base64 option in data: URLs that use the mere
 presence or absence of a parameter to set its boolean value.


 Since you mention data URLs I should note that data URLs can be percent
 encoded, which HTTP
 and MIME headers can't be. This raises additional considerations when
 parsing a data URL's MIME type correctly;
 see reference [1] for test cases.  In particular:

 [1]: http://greenbytes.de/tech/tc/datauri/

This is a very useful resource; thank you for pointing it out to me.

I realize now that that's the only thing that matters: What do the browsers do?

(And percent encoding doesn't matter, as that gets handled before the
parsing begins.)

 * A data URL that begins with data:, or data:;base64, (with no MIME
 type) is assumed to have the MIME type
  text/plain;charset=us-ascii under RFC2397.
 * A data URL that begins with  data:; (with no type or subtype, but with
 parameters) is assumed to have the MIME type
  text/plain under RFC2397.

An empty or invalid MIME type will get treated as unknown and will
eventually be sniffed (if it isn't already). I'll have to consider
what to do with the base64 and other parameters parts, though.

 * The word base64 can only appear at the end of the MIME type, so that a
 data URL like
   data:application/example;base64;foo=bar,AA== will not be encoded in
 base64, strictly speaking. A parameter name (base64 or otherwise)
   cannot otherwise appear without a parameter value.

As I mentioned, strictly speaking doesn't matter, as all browsers do
the same thing, according to the resource you linked: base64
parameters with values are fine; base64 boolean parameters in other
than last place are warnings. (Not sure what the reasoning behind that
distinction is, but that's what reality is.)

So it seems the only issue I have to worry about is what to do with
MIME types which only have parameters.

Regards,
Gordon

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Review request: Parsing a MIME type

2013-06-01 Thread Gordon P. Hemsley
On Sat, Jun 1, 2013 at 11:41 AM, Gordon P. Hemsley gphems...@gmail.com wrote:
 On Fri, May 31, 2013 at 11:50 PM, Peter Occil pocci...@gmail.com wrote:
 * The word base64 can only appear at the end of the MIME type, so that a
 data URL like
   data:application/example;base64;foo=bar,AA== will not be encoded in
 base64, strictly speaking. A parameter name (base64 or otherwise)
   cannot otherwise appear without a parameter value.

 As I mentioned, strictly speaking doesn't matter, as all browsers do
 the same thing, according to the resource you linked: base64
 parameters with values are fine; base64 boolean parameters in other
 than last place are warnings. (Not sure what the reasoning behind that
 distinction is, but that's what reality is.)

It seems I read the purpose of the test wrong for base64 parameters
with values: They're fine insofar as they're allowed, but they don't
trigger base64 decoding (except in Safari?), unlike if the boolean
base64 parameter is in a non-last position.
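
For comparison with the browser behavior discussed here, this is a Python sketch of the strict RFC 2397 reading that Peter describes, where base64 counts only as the final token before the comma. It is not a description of what any browser does. The function name is hypothetical, and percent-decoding and case-insensitive matching are deliberately left out.

```python
def split_data_url(url: str):
    """Return (media_type, is_base64, data) under a strict RFC 2397 reading.

    Only a trailing ';base64' token switches on base64; a ';base64' that
    appears earlier is left in the media type as an ordinary parameter.
    """
    if not url.startswith("data:"):
        raise ValueError("not a data: URL")
    header, separator, data = url[len("data:"):].partition(",")
    if not separator:
        raise ValueError("data: URL has no comma")
    is_base64 = header.endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]
    if not header:
        # No type and no parameters: RFC 2397 default.
        media_type = "text/plain;charset=US-ASCII"
    elif header.startswith(";"):
        # Parameters without a type/subtype: text/plain plus the parameters.
        media_type = "text/plain" + header
    else:
        media_type = header
    return media_type, is_base64, data
```

With this reading, data:application/example;base64;foo=bar,AA== is returned with is_base64 set to False, which is the case the test page flags.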

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [mimesniff] Review request: Parsing a MIME type

2013-05-31 Thread Gordon P. Hemsley
Hello all,

This is a request seeking feedback and review on the MIME Sniffing
algorithm to parse a MIME type:

http://mimesniff.spec.whatwg.org/#parse-a-mime-type

After numerous iterations, I think it is in a state that accurately
reflects the best current practices for interoperability.

As is common with such things, there are numerous points in this
algorithm where implementations do not agree. In general, Firefox and
Chrome tend to pattern together, as do IE and Opera. Safari often
patterns on its own, in favor of a more literal interpretation of the
various RFCs on the matter.

At times, I have had to make a decision as to which was the best
approach. This usually results in half of the implementations being in
violation of the spec; I hope, in those instances, the implementations
in question can be updated to become interoperable with the rest.

With that being said, there are two specific points I want to raise:

(1) The more recent RFCs on the matter restrict type, subtype, and
parameter names to 127 characters. No implementation actually enforces
this limit, but I have included it in the algorithm (relevant points
appear in red) because I think it would be better and safer for both
the user and the user agent to do so.

(2) Based on my analysis of existing implementations, anything that
occurs between the semicolon (and any first whitespace) and the equals
sign is treated as the parameter name, including any whitespace before
the equals sign. However, in order to test parameters, I have been
using 'charset' (because that's the only one I'm aware of that has a
Web-visible effect), and certain implementations may be sniffing
specifically for the string charset=, which would cloud the results
of my testing. Any enlightenment into this issue would be much
appreciated.

I also have a few general points:

* You may notice in the algorithm that I am using hybrid terminology,
sometimes talking about bytes and sometimes talking about characters.
This is mostly because I haven't decided/determined whether to treat a
MIME type as ASCII or as UTF-8. I think there are arguments on both
sides of the issue, but I'm eager to hear your opinions and advice
(especially about how I might phrase the algorithm if it were written
in terms of characters instead of bytes).

* One of the most controversial parts of this algorithm might be the
issue of what to do when a parameter appears more than once. (The RFCs
suggest that the MIME type should be treated as invalid in such a
case, but no implementation actually treats it that way.) I have opted
to make a later appearance of a parameter override and replace an
earlier appearance of a parameter. Modulo caveat (2) above, this is
only done in half the implementations; in particular, IE and Opera
appear to use the first instance of the parameter as the canonical
value.

* Another important point to notice is the fact that this algorithm
allows parameter names to appear without values. This is useful in
situations such as the base64 option in data: URLs that use the mere
presence or absence of a parameter to set its boolean value. Note,
however, that a parameter that has been given an explicit value (even
if that value is the empty string) does not get overridden by the
later appearance of a boolean parameter of the same name.
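
The parameter rules described in the points above can be sketched roughly as follows, in Python. This is a simplified illustration and not the spec algorithm: it does not handle quoted values, it does not enforce the 127-character limit, and the function name is made up. Valueless ("boolean") parameters are represented as None.

```python
def parse_parameters(raw: str) -> dict:
    """Parse the ';name=value' part of a MIME type string, leniently.

    - Everything between ';' (plus leading whitespace) and '=' is the name,
      including any whitespace just before the '='.
    - A later explicit value replaces an earlier one.
    - A name without '=' maps to None, but never replaces an explicit value.
    """
    parameters = {}
    for chunk in raw.split(";")[1:]:
        name, separator, value = chunk.partition("=")
        name = name.lstrip(" \t").lower()
        if not name:
            continue
        if separator:
            parameters[name] = value
        else:
            parameters.setdefault(name, None)
    return parameters
```

For instance, "text/html; charset=a; charset=b" yields charset=b under this sketch, whereas a first-wins implementation (as IE and Opera appear to be) would yield charset=a.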

I think those are the important points of background information you
need to know in order to evaluate this algorithm.

I look forward to your response.

Regards,
Gordon

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] An alternative approach to section 9 of Mime Sniffing

2013-05-25 Thread Gordon P. Hemsley
Section 5 is highlighted with all that red warning stuff precisely
because it is known to be incomplete and insufficient. I haven't yet
decided how I'm going to go about writing that up (and it isn't
inherently obvious that what is there now is bad). So that's not the
best example; and it certainly doesn't have anything to do with
section 9 (at least, not with regard to formatting).

I still don't understand what problem you're trying to solve (and if I
don't understand the problem, I can't come up with a solution). Are
you just having trouble reading and understanding what's there?

MIME Sniffing and WebVTT have very different use cases and, in some
ways, very different audiences. I don't think you can directly compare
the two.

Gordon

On Sat, May 25, 2013 at 1:58 AM, Peter Occil pocci...@gmail.com wrote:
 What I think is that even if an ABNF won't be the normative definition of a
 syntax format, it can help put the format's syntax into a higher-level
 perspective and aid understanding of its syntax: once we understand, for
 example, what the Content-Type header field value ought to contain, in the
 form of an ABNF or in some other way, it will be easier to write processing
 rules for that field value in the spec.  (Right now I'm in the process of
 rewriting section 5 of the MIME sniffing spec.)

 Take the WebVTT spec for example.  For each part of the WebVTT format
 there's a definition of what that part contains in terms of characters, and
 the actual processing rules for parsing that part.  For example, the
 definition for WebVTT cue timings and the algorithm to collect WebVTT cue
 timings and settings. The definition aids understanding of the syntax for
 WebVTT cue timings and informs how the rules for collecting WebVTT cue
 timings are written in the WebVTT spec.


 --Peter

 -Original Message- From: Anne van Kesteren
 Sent: Friday, May 24, 2013 1:28 AM

 To: Peter Occil
 Cc: WHATWG
 Subject: Re: [whatwg] An alternative approach to section 9 of Mime Sniffing

 On Thu, May 23, 2013 at 2:49 PM, Peter Occil pocci...@gmail.com wrote:

 Explain further why you don't recommend ABNF for this case.


 We don't recommend ABNF in general because often ABNF results in a
 mismatch between prescribed and actual processing. E.g. Content-Type
 is defined as an ABNF and technically text/html; does not match that
 ABNF, but everyone (logically) processes that as text/html without
 parameters.

 It's much better to define the actual processing so implementers are
 less inclined to take shortcuts when implementing (test suites also
 help, but they're typically written way-after-the-fact).


 You should also explain whether another change to make section 9 more
 readable is
 appropriate (though it currently is relatively readable as is).


 I'll leave that to Gordon.


 --
 http://annevankesteren.nl/



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Complete MIME type parsing algorithm for section 5

2013-05-25 Thread Gordon P. Hemsley
Peter,

The burden is on you to describe your proposals and what their purpose
and benefit would be.

How does this proposed algorithm differ from what is already in the
spec? How is it better?

Regards,
Gordon

On Sat, May 25, 2013 at 3:58 AM, Peter Occil pocci...@gmail.com wrote:
 I present this draft of the complete algorithm for parsing a MIME type.  I 
 would appreciate comments.

 --Peter

 

 An ASCII alphanumeric is a byte or character in the ranges 0x41-0x5A, 
 0x61-0x7A, and 0x30-0x39.
 A MIME type byte is an ASCII alphanumeric or one of the following bytes: ! # 
 $ & ^ _ . + -
 A parameter value byte is a MIME type byte or one of the following bytes: % ' 
 * ` | ~

 To parse a MIME type, run the following steps:

 1. Let length be the length of the byte sequence of the MIME type.
 2. If length is less than 1, return undefined.
 3. Let pointer be 0.  Pointer is a zero-based index to the current byte in 
 the byte sequence.
 4. Advance pointer to the next byte other than 0x20 (SPACE) or 0x09 (TAB).
 5. Let type be the byte string from the current byte up to but not including 
 the next / byte. Advance pointer to the next / byte.
 6. If the current byte isn't /, return undefined.
 7. Increment pointer by 1.
 8. Let subtype be the byte string from the current byte up to but not 
 including the next 0x20 (SPACE), 0x09 (TAB), or ; byte.  Advance pointer to 
 the next 0x20 (SPACE), 0x09 (TAB), or ; byte.
 9. If type is empty, contains a byte that isn't a MIME type byte, or doesn't
 begin with an ASCII alphanumeric, or is longer than 127 bytes, return 
 undefined.
 10. If subtype is empty, contains a byte that isn't a MIME type byte, or 
 doesn't begin with an ASCII alphanumeric, or is longer than 127 bytes, return 
 undefined.
 11. Convert type and subtype to ASCII lowercase.
 12. Let parameters be an empty dictionary.
 13. Run the following substeps in a loop.
  1. Advance pointer to the next byte other than 0x20 (SPACE) or 0x09 
 (TAB).
  2. If pointer is equal to length, return type, subtype, and parameters.
  3. If the current byte isn't ;, return undefined.
  4. Increment pointer by 1.
  5. If pointer is equal to length, return type, subtype, and parameters.
  6. Let parameter be the byte string from the current byte up to but not 
 including the next = byte. Advance pointer to the next = byte.
  7. If parameter is empty, contains a byte that isn't a MIME type byte, 
 or doesn't begin with an ASCII alphanumeric, or is longer than 127 bytes, 
 return undefined.
  8. If parameters contains a mapping for parameter, return undefined.
  9. Convert parameter to ASCII lowercase.
  10. If the current byte isn't =, return undefined.
  11. Increment pointer by 1.
  12. If the current byte equals 0x22 (quotation mark), run the following 
 substeps:
   1. Let value be an empty byte string.
   2. Increment pointer by 1.
   3. Run these substeps in a loop.
     1. If pointer is equal to length, return type, subtype, 
 and parameters.
     2. If the current byte equals 0x7F or is less than 
 0x20, and the current byte isn't TAB (0x09), return type, subtype, and 
 parameters.
     3. If the current byte equals 0x22 (quotation mark), 
 increment pointer by 1 and terminate this loop.
     4. Otherwise, if the current byte is \, increment 
 pointer by 1. Then, if there is a current byte, append that byte to value.
     5. Otherwise, append the current byte to value.
     6. Increment pointer by 1.
   4. Add the mapping of parameter to value to the parameters 
 dictionary.
  13. Otherwise, run these substeps:
   1. Let value be the byte string from the current byte up to but 
 not including the next 0x20 (SPACE), 0x09 (TAB), or ; byte.  Advance 
 pointer to the next 0x20 (SPACE), 0x09 (TAB), or ; byte.
   2. If value is empty or contains a byte that isn't a parameter 
 value byte, return undefined.
   3. Add the mapping of parameter to value to the parameters 
 dictionary.
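
 The steps above can be transcribed into Python roughly as follows. This is a sketch under stated assumptions, not a reference implementation: "undefined" is modeled as None, the set of MIME type bytes is taken to be ! # $ & ^ _ . + - (the restricted-name-chars of RFC 6838), and, unlike the listed order, the parameter name is lowercased before the duplicate check. All helper names are invented here.

```python
ALNUM = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789")
# Assumed set: "! # $ & ^ _ . + -", as in RFC 6838 restricted-name-chars.
MIME_TYPE_BYTES = ALNUM | frozenset(b"!#$&^_.+-")
PARAM_VALUE_BYTES = MIME_TYPE_BYTES | frozenset(b"%'*`|~")
WHITESPACE = b" \t"


def _valid_name(token: bytes) -> bool:
    """Steps 9, 10 and 13.7: non-empty, at most 127 bytes, starts alphanumeric."""
    return (0 < len(token) <= 127 and token[0] in ALNUM
            and all(b in MIME_TYPE_BYTES for b in token))


def parse_mime_type(data: bytes):
    """Return (type, subtype, parameters), or None where the steps say undefined."""
    n = len(data)
    if n < 1:
        return None

    def skip_ws(i):
        while i < n and data[i] in WHITESPACE:
            i += 1
        return i

    def take_until(i, stops):
        j = i
        while j < n and data[j] not in stops:
            j += 1
        return data[i:j], j

    i = skip_ws(0)                                   # step 4
    type_, i = take_until(i, b"/")                   # step 5
    if i >= n:                                       # step 6: no "/" found
        return None
    subtype, i = take_until(i + 1, b" \t;")          # steps 7-8
    if not (_valid_name(type_) and _valid_name(subtype)):  # steps 9-10
        return None
    type_, subtype = type_.lower(), subtype.lower()  # step 11
    params = {}                                      # step 12
    while True:                                      # step 13
        i = skip_ws(i)
        if i == n:
            return type_, subtype, params
        if data[i] != 0x3B:                          # not ";"
            return None
        i += 1
        if i == n:
            return type_, subtype, params
        name, i = take_until(i, b"=")
        if not _valid_name(name):
            return None
        name = name.lower()  # deviation: lowercased before the duplicate check
        if name in params:
            return None
        if i >= n:                                   # no "=" found
            return None
        i += 1
        if i < n and data[i] == 0x22:                # quoted value
            value = bytearray()
            i += 1
            while True:
                if i >= n:
                    return type_, subtype, params
                b = data[i]
                if b == 0x7F or (b < 0x20 and b != 0x09):
                    return type_, subtype, params
                if b == 0x22:
                    i += 1
                    break
                if b == 0x5C:                        # backslash escapes next byte
                    i += 1
                    if i < n:
                        value.append(data[i])
                else:
                    value.append(b)
                i += 1
            params[name] = bytes(value)
        else:                                        # unquoted value
            value, i = take_until(i, b" \t;")
            if not value or any(b not in PARAM_VALUE_BYTES for b in value):
                return None
            params[name] = value
```

 One thing this transcription makes visible: the steps do not skip whitespace between ";" and the parameter name, so an input such as text/html; charset=utf-8 (with a space after the semicolon) is rejected by this sketch, because the name then starts with a space.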

 ---





-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Complete MIME type parsing algorithm for section 5

2013-05-25 Thread Gordon P. Hemsley
On Sat, May 25, 2013 at 12:46 PM, Peter Occil pocci...@gmail.com wrote:
 My algorithm skips only SPACE and TAB instead of all whitespace characters
 because it assumes that the field value was already extracted from
 Content-Type according to the HTTP/HTTPbis spec (0x0C, form feed, is never
 considered whitespace in HTTP headers). In particular, it assumes that
 folding whitespace (obs-fold) was replaced with spaces (or the message with
 obs-fold rejected) before the Content-Type value was interpreted.

Thanks for your detailed explanation.

It'll take me a little while to evaluate what you've proposed here,
but in the meantime: Keep in mind that the Content-Type header is not
the only source for a MIME type. This algorithm needs to consider MIME
types from all possible sources.

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] An alternative approach to section 9 of Mime Sniffing

2013-05-23 Thread Gordon P. Hemsley
The pattern matching algorithm is used because certain patterns
require other-than-exact matching. That is why the pattern mask
exists. This is particularly important for the rules for identifying
an unknown MIME type (defined in 10.1), which matches ASCII
characters case-insensitively; it is also important for a number of
patterns that contain unimportant bytes that should be ignored (like
WebP, in your example).

The algorithm lays out the information in tabular form because that
makes clearer the separation between the important bytes and the
unimportant (or case-insensitive) bytes. Keep in mind that
implementations may read one byte at a time; using ABNF would give
them no benefit, and would likely make things more confusing.
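
To make the role of the pattern mask concrete, here is a small Python sketch of masked matching. It is an illustration under stated assumptions, not the spec text: the function name matches_pattern is made up, and leading-whitespace skipping is left out. Each input byte is ANDed with the corresponding mask byte and compared with the pattern byte, so a mask byte of 0x00 ignores a position and 0xDF folds ASCII letter case.

```python
def matches_pattern(data: bytes, pattern: bytes, mask: bytes) -> bool:
    """Return True if the start of data matches pattern under mask."""
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    if len(data) < len(pattern):
        return False
    # Compare byte by byte; bytes beyond the pattern length are not examined.
    return all((d & m) == p for d, p, m in zip(data, pattern, mask))


# WebP: "RIFF", four ignored bytes (the chunk size), then "WEBPVP".
WEBP_PATTERN = b"RIFF\x00\x00\x00\x00WEBPVP"
WEBP_MASK = b"\xff\xff\xff\xff\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff"

# "<HTML" matched case-insensitively: 0xDF clears the ASCII case bit.
HTML_PATTERN = b"<HTML"
HTML_MASK = b"\xff\xdf\xdf\xdf\xdf"
```

Because the comparison runs one byte at a time, an implementation that reads its input incrementally can stop at the first mismatch.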

I wonder: What problem are you trying to solve with this proposal?

(In the future, please add [mimesniff] to the beginning of your
subject line for MIME Sniffing discussions; this will ensure that I
see them and pay attention to them more quickly.)

Regards,
Gordon

On Thu, May 23, 2013 at 2:10 AM, Peter Occil pocci...@gmail.com wrote:
 I propose rewriting section 9 and parts of section 10 in a different way, to 
 use the ABNF format in RFC 5234. (Note that ABNFs are already  used in the 
 current Fetch specification.) With this approach, the definitions for byte 
 pattern,  pattern mask, and the pattern matching algorithm can be 
 eliminated (all of which are found before section 9.1).

 An example for the image pattern matching algorithm is given below.

 ---

 9.1  Matching an image type pattern

 The image pattern matching algorithm takes a byte sequence as input.  The 
 algorithm goes through the following image types in the order given.  For 
 each image MIME type given below, if the start of the byte sequence matches 
 its ABNF, return the concatenation of image/ and the name of the ABNF (in 
 lowercase), and terminate the image pattern matching algorithm.

 vnd.microsoft.icon = %x00.00.01.00
; A Windows Icon signature.
 bmp = %x42.4D
; The string BM, a BMP signature.
 gif = %x47.49.46.38 (%x37 / %x39) %x61
; The string GIF87a or GIF89a, a GIF signature.
 webp = %x52.49.46.46 4OCTET %x57.45.42.50.56.50
; The string RIFF followed by four bytes followed by the string WEBPVP.
 png = %x89.50.4E.47.0D.0A.1A.0A
; The byte 0x89 followed by the string PNG
; followed by CR LF SUB LF, the PNG signature.
 jpeg = %xFF.D8.FF
; The JPEG Start of Image marker followed by the indicator
; byte of another marker.

 If the start of the byte sequence doesn't match any ABNF given above, return 
 undefined.
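
 For comparison, the ordered, grammar-style matching proposed above can be approximated with anchored byte regular expressions. This Python sketch is only an illustration: the table name IMAGE_SIGNATURES and the function sniff_image are hypothetical, and regular expressions stand in for the ABNF rules.

```python
import re
from typing import Optional

# Signatures are tried in the order listed, mirroring the ABNF rules.
IMAGE_SIGNATURES = [
    ("image/vnd.microsoft.icon", re.compile(rb"\x00\x00\x01\x00")),
    ("image/bmp", re.compile(rb"BM")),
    ("image/gif", re.compile(rb"GIF8[79]a")),
    # DOTALL lets "." match any byte, including 0x0A, for the 4 ignored bytes.
    ("image/webp", re.compile(rb"RIFF.{4}WEBPVP", re.DOTALL)),
    ("image/png", re.compile(rb"\x89PNG\r\n\x1a\n")),
    ("image/jpeg", re.compile(rb"\xff\xd8\xff")),
]


def sniff_image(data: bytes) -> Optional[str]:
    """Return the first image MIME type whose signature starts data."""
    for mime_type, signature in IMAGE_SIGNATURES:
        if signature.match(data):  # match() anchors at the start of data
            return mime_type
    return None  # corresponds to "return undefined"
```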

 ---

 I would appreciate comments.

 --Peter



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] Priority between a download and content-disposition

2013-05-08 Thread Gordon P. Hemsley
On Tue, May 7, 2013 at 10:18 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 5/7/13 5:54 PM, Gordon P. Hemsley wrote:

 A @download attribute with a value would override both factors, like so:
 (1) Download it.
 (2) A.txt

 Why?

 You say this as if it were obvious, but it's not obvious to me at all...
 What's the reasoning that makes this the desirable behavior?

It's not clear to me which of the two factors you take issue with.

Here's what the spec says:

The download attribute, if present, indicates that the author intends
the hyperlink to be used for downloading a resource. The attribute may
have a value; the value, if any, specifies the default file name that
the author recommends for use in labeling the resource in a local file
system.

I interpret that first sentence to mean that the file should be
downloaded (disposition type = attachment) rather than displayed
(disposition type = inline). The second sentence very clearly suggests
that A.txt would be the filename presented to the user by default in
the save dialog.

 I don't see what the security concerns might be: There is no
 difference here than what is already available

 There is if you allow cross-origin @download.

 There is if you allow untrusted markup on your server and don't sanitize
 away @download (should it be sanitized away?  Unclear).

I'm still not seeing what the problem is. All this does is make the
browser treat the link as if the user followed it and then went File >
Save Page As.

What are the security concerns, cross-origin or otherwise?

 AFAICT, there are no content
 sniffing or cross-domain issues at play.

 But there are; see above.

Well, what I should have said is, there is no content sniffing beyond
what is already done for regular page saves. (The UI can show the MIME
type or format of the file in the download box, as it would for any
file it doesn't handle natively.)

 results when saving a file; they don't do any file extension vs. file
 format checking.

 Uh... that depends on exactly how you save and your OS.  Browsers commonly
 do file extension vs MIME type checking on Windows.  Behavior on other OSes
 varies, and varies across browsers.

 -Boris

Ah, I admit, I'm a bit biased towards Mac in that regard. It's been a
while since I used Windows. But I'd be surprised to find out that the
browser (Firefox, in the case I have in mind) changes the extension in
the suggested filename (e.g. example.php for an HTML file) on
Windows but not on Mac, and I would argue that that perhaps should not
be the case.

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] Priority between a download and content-disposition

2013-05-08 Thread Gordon P. Hemsley
On Wed, May 8, 2013 at 9:43 AM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 5/8/13 6:53 AM, Gordon P. Hemsley wrote:

 It's not clear to me which of the two factors you take issue with.


 The question of which filename takes priority.


 The second sentence very clearly suggests
 that A.txt would be the filename presented to the user by default in
 the save dialog.


 No, it suggests that A.txt is what the page author recommends.

 If, at the same time, B.txt is what the server author recommends, what
 should happen?

I still think @download takes priority.

The Content-Disposition header says, "Never mind what filename the URL
shows; this is really file B.txt."

The @download attribute says, "Never mind what filename this link would
normally be; let's just consider it A.txt."

 There is if you allow cross-origin @download.

 There is if you allow untrusted markup on your server and don't sanitize
 away @download (should it be sanitized away?  Unclear).


 I'm still not seeing what the problem is. All this does is make the
 browser treat the link as if the user followed it and then went File >
 Save Page As.


 No, because in that case the browser will definitely use the
 Content-Disposition filename, not the one from @download.

OK, technically, the way I phrased it, yes. But what I meant was that
it rolls a bunch of steps into one, telling the browser that the link
should be downloaded and named per suggestion.

 What are the security concerns, cross-origin or otherwise?


 One concern is being able to do this:

   <a download="known-location.pdf"
  href="http://some-bank/statement.pdf">

 cross-site and combining it with something that lets you read
 known-location.pdf (e.g. a file://-specific privacy hole that only applies
 to some filenames, or an input type=file that the user has already filled
 in).

That seems like quite a sophisticated attack that relies on a lot of
things falling into place all at once. I'm not sure that should block
the use of the attribute in and of itself.

 Another concern is if you upload a file to an image-sharing site, but it
 happens to be a Windows executable.  Then you link to it with:

   <a download="something.exe" href="http://image-sharing-site/whatever">

 and wait for the user to download and double-click.  This relies on the user
 thinking the file came from image-sharing-site so must be an image.  UAs may
 do mitigations here by changing the suggested filename, of course.

Then I think it is the responsibility of the UA to sniff the file and
protect the user from such attempts to mislead.

At the very least, the download UI could specify the actual type of
the file that is being downloaded. (More on how to protect users who
don't read that below.)

 Generally, allowing this sort of thing opens up several new phishing and
 social engineering attack vectors, and it's not clear that we want that.

There is a price to freedom, as they say. We shouldn't let a few
rotten apples spoil the whole bunch.

 Well, what I should have said is, there is no content sniffing beyond
 what is already done for regular page saves. (The UI can show the MIME
 type or format of the file in the download box, as it would for any
 file it doesn't handle natively.)


 It can, and users routinely ignore that.


 Ah, I admit, I'm a bit biased towards Mac in that regard. It's been a
 while since I used Windows. But I'd be surprised to find out that the
 browser (Firefox, in the case I have in mind) changes the extension in
 the suggested filename (e.g. example.php for an HTML file) on
 Windows but not on Mac


 It sure used to in some cases, partially in concert with the Windows
 filepicker.  See the (scant) documentation for lpstrDefExt at
 http://msdn.microsoft.com/en-us/library/windows/desktop/ms646839%28v=vs.85%29.aspx
 and I suggest actually doing some experimentation across the different save
 variants (save image, save link as, save page as, click on something with
 content-disposition:attachment) on several OSes to see the behavior.  There
 is certainly a good bit of code in the various file-saving codepaths in
 Firefox that attempts to ensure extensions match MIME types, to forbid
 saving things with certain extensions, etc.

 Also note that Chrome will change extensions on at least @download filenames
 to match the MIME type; I haven't experimented in detail with its behavior
 for other cases.  And I haven't experimented much with other browsers in
 this area, though I expect all have some interesting behavior.

 -Boris

I'm not sure I have the resources to do extensive real-world testing
of this (and that documentation suggests it has been superseded in
more modern OSes), but I don't think it would be unreasonable for the
UA to override or augment the filename suggested by the @download
attribute if it determines that it would not be in the best interest
of the user to use the suggested filename unchanged. Note that the
spec also says: There are no restrictions on allowed values, but
authors

Re: [whatwg] Priority between a download and content-disposition

2013-05-08 Thread Gordon P. Hemsley
On Wed, May 8, 2013 at 12:01 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 5/8/13 10:45 AM, Gordon P. Hemsley wrote:

 I still think @download takes priority.

 The Content-Disposition header says, "Never mind what filename the URL
 shows; this is really file B.txt."

 The @download attribute says, "Never mind what filename this link would
 normally be; let's just consider it A.txt."


 OK, that's at least a reasonable argument for the behavior.  ;)


 That seems like quite a sophisticated attack that relies on a lot of
 things falling into place all at once.


 Uh... yes.  Like most browser exploits.

Perhaps. But maybe I'm not clear on what exactly the alternate
proposal is. Are you suggesting not supporting the @download
attribute? Or just ignoring it when Content-Disposition specifies a
filename? (I would suggest that neither is the appropriate response.)

 Then I think it is the responsibility of the UA to sniff the file and
 protect the user from such attempts to mislead.


 This is not trivial, since sniffing can easily fail on files that are both
 HTML and png or both HTML and exe at the same time.  There's a good bit of
 research on things like this.

Yes, and that research has already gone into creating the mimesniff
standard, has it not? I'm suggesting using the existing algorithm(s) in
an additional arena, not creating a new, separate algorithm.

If a file from an image sharing site is served as (or determined to
be, via the sniffing algorithms) image/png, for example, then the UA
should suggest a filename with a .png extension, ignoring any
suggestion by the author for a .exe extension. (Whether you want to
change it to A.png or A.exe.png is debatable, I suppose.)
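
As a rough sketch of that idea in Python: given the MIME type the UA has settled on (served or sniffed), adjust the author-suggested filename so that its extension agrees. Everything named here is an assumption for illustration, including the suggest_filename function, the small EXTENSION_FOR_TYPE table, and the keep_original flag that switches between the A.png and A.exe.png outcomes.

```python
import os.path

# Hypothetical table; a real UA would consult a much larger mapping.
EXTENSION_FOR_TYPE = {
    "image/png": ".png",
    "image/gif": ".gif",
    "image/jpeg": ".jpg",
    "text/html": ".html",
}


def suggest_filename(author_name: str, mime_type: str,
                     keep_original: bool = False) -> str:
    """Return a filename whose extension agrees with mime_type."""
    expected = EXTENSION_FOR_TYPE.get(mime_type)
    if expected is None:
        return author_name  # unknown type: leave the suggestion alone
    stem, ext = os.path.splitext(author_name)
    if ext.lower() == expected:
        return author_name  # extension already matches the type
    if keep_original:
        return author_name + expected  # e.g. A.exe -> A.exe.png
    return stem + expected  # e.g. A.exe -> A.png
```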

 I'm not sure I have the resources to do extensive real-world testing
 of this (and that documentation suggests it has been superseded in
 more modern OSes), but I don't think it would be unreasonable for the
 UA to override or augment the filename suggested by the @download
 attribute if it determines that it would not be in the best interest
 of the user to use the suggested filename unchanged.


 Phrased that way, using the Content-Disposition filename is a perfectly
 valid "override if not in the best interest of the user" behavior, fwiw.

 -Boris


True. But doesn't that imply a rejection of my aforementioned
reasonable argument?

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] Priority between a download and content-disposition

2013-05-08 Thread Gordon P. Hemsley
On Wed, May 8, 2013 at 12:21 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 5/8/13 12:15 PM, Gordon P. Hemsley wrote:

 Perhaps. But maybe I'm not clear on what exactly the alternate
 proposal is. Are you suggesting not supporting the @download
 attribute? Or just ignoring it when Content-Disposition specifies a
 filename? (I would suggest that neither is the appropriate response.)


 What Gecko implements right now is:

 1)  @download is ignored for non-same-origin links.
 2)  If Content-Disposition specifies a filename, that filename is used
 no matter what @download says.

I understand now the motivation for this, but I would think that it
would remove a lot of the usefulness of the @download attribute: If
you have the same origin, you probably already have access to (a) name
the file appropriately in the first place, or (b) set the
Content-Disposition header to send the appropriate filename. No?

 This is not trivial, since sniffing can easily fail on files that are
 both
 HTML and png or both HTML and exe at the same time.  There's a good bit
 of
 research on things like this.


 Yes, and that research has already gone into creating the mimesniff
 standard, has it not? I'm suggesting using the existing algorithm(s) in
 an additional arena, not creating a new, separate algorithm.


 The mimesniff standard doesn't try to sniff for types UAs don't render
 natively, which is what would be needed here.

I'm not so sure about that, but I'll leave it to someone else to
argue. (If you determine a file to be a PNG, then you suggest a .png
extension, regardless of whether there might be an embedded
executable; if you don't support the file format, then how do you know
that it isn't supposed to be an executable in the first place? —and
what is it doing on the Web?)

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] HTML differences from HTML4 document updated

2013-05-07 Thread Gordon P. Hemsley
Simon,

I think it would be good to consider the target audiences, of which
there are probably many:

You have the audience who is worried that HTML5 is some grand
departure from the HTML 4.01 they (think they) know and love. For
them, you'll want to describe what exactly has been removed and why,
instilling the idea of a separation between semantic and
presentational markup.

Then you have the audience that is excited to see what they can do now
with HTML5 that they couldn't do with HTML 4.01. For them, you'd list
the new elements and attributes and such.

Then you probably have some other incidentals such as things that were
removed or changed just because they were never implemented or people
never used them. These probably don't fall into either of the two
categories above.

But you also have another issue to consider: For this document, the
difference between the W3C's concept of specification snapshots and
WHATWG's concept of a living standard is not trivial. For the former,
you can have snapshot documents detailing the differences between each
snapshot specification; for the latter, you need a living document
that is anchored by a fixed point at one end (HTML 4.01).

This raises the question of the purpose of this document: Is it to
simplify the transition from HTML 4.01 to HTML5+? Or is it to act as
an HTML changelog from here on out? Because I think attempting to do
both within a single document will become unwieldy as time goes on.

Regards,
Gordon


On Tue, May 7, 2013 at 5:00 AM, Simon Pieters sim...@opera.com wrote:
 On Mon, 06 May 2013 16:50:03 +0200, Jukka K. Korpela jkorp...@cs.tut.fi
 wrote:

 I don't think this is of particular importance.


 If it isn't, why not use the correct spelling?


 Mostly to be consistent with HTML5.


 When referring to specifications, it is usually a good idea to use their
 own spelling, even when it is odd and confusing.

 HTML 4.01 is intended. The differences between revisions of HTML4 are out
 of scope.


 Then the heading should say HTML 4.01.


 It's longer, and it's not clear to me that people are actually confused
 about what HTML4 refers to.


 Modern HTML differences from HTML4? I'm not convinced that's a win.
 Near-future seems wrong since it's more like current.


 The difficulty here directly reflects the vague nature of HTML5: it partly
 tries to describe HTML as actually implemented and partly specifies features
 that should (or shall) be implemented. Hence it is both modern and
 (intended to be) near-future.

 But the fundamental difficulty is that you are trying to describe a
 specific version, or set of versions, of HTML without giving it a proper
 name or version number.

 Since WHATWG does not use a proper name for its version (the title is just
 HTML), I think the only way to refer to it properly is to prefix it with
 WHATWG. This would lead to the title

 Differences of HTML5 and WHATWG HTML from HTML 4.01


 Here HTML5 is supposed to refer to W3C HTML5 and W3C HTML5.1?

 How about I go back to the original title Differences from HTML4?
 http://wiki.whatwg.org/wiki/Differences_from_HTML4



 Such a document would be useful, but it's not this document. The primary
 focus for this document is what is different from HTML4.


 But why? What is the purpose of this document? This is relevant to naming
 it, and to the content too, of course. Now it is neither a reliable
 comparison with links to the relevant clauses nor an overview - it has too
 many details, to begin with.


 It's more intended to be an overview. Can you give an example of something
 that is too detailed and suggest the level of detail that would be more
 appropriate?


 Is this for authors who consider moving from HTML 4.01 to HTML 5?


 Yes.


 Then I think it should primarily specify what HTML 4.01 features are
 forbidden in HTML 5, then the extensions.


 Thanks, that's useful feedback.


 --
 Simon Pieters
 Opera Software



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] Priority between a download and content-disposition

2013-05-07 Thread Gordon P. Hemsley
I realize this is an old thread, so apologies if this has already been
resolved. The discussion that originally followed seemed to have
gotten off track, so I wanted to try to clarify things.

First off, there are two factors to consider:
(1) Whether to download the file or display it.
(2) What filename to suggest for the file when it is downloaded.

In the general case, with a normal <a href> and no Content-Disposition
header (or the plain 'Content-Disposition: inline' header, listed as
(1) originally), the answers are:
(1) Display it.
(2) Whatever the filename on the server is (e.g. page.txt or
example.php), modulo OS restrictions.

In the case of a normal <a href> and a 'Content-Disposition: inline;
filename=B.txt' header (listed as (2) originally), the answers are:
(1) Display it.
(2) B.txt

Changing the disposition type doesn't change much, with a normal <a
href> and a 'Content-Disposition: attachment; filename=B.txt' header
(listed as (3) originally):
(1) Download it.
(2) B.txt

So now, the question is, what effect does a @download attribute have?
Nothing too surprising.

An empty @download attribute would override the first factor above so
that it is always "Download it."

A @download attribute with a value would override both factors, like so:
(1) Download it.
(2) A.txt

Thus, the @download attribute acts to override the Content-Disposition
header, giving the following hierarchy:


@download > Content-Disposition > URL


Or, in pseudocode (with the assumption that if X has Y, then X is also present):


disposition_type =
    ( @download is present ) ? attachment
    : ( Content-Disposition header is present ) ? Content-Disposition disposition type
    : inline;
suggested_filename =
    ( @download has a value ) ? value of @download
    : ( Content-Disposition has filename parameter ) ? Content-Disposition filename value
    : filename from URL;
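
The same precedence can be written as a small runnable Python sketch. The Link and ContentDisposition classes and the resolve function are hypothetical names introduced only for this illustration; download=None stands for an absent attribute and download="" for an attribute present without a value. It models the precedence proposed in this message, not what any browser actually implements.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Link:
    url_filename: str               # filename derived from the URL
    download: Optional[str] = None  # None: absent; "": present, no value


@dataclass
class ContentDisposition:
    disposition_type: str           # "inline" or "attachment"
    filename: Optional[str] = None  # filename parameter, if any


def resolve(link: Link,
            header: Optional[ContentDisposition]) -> Tuple[str, str]:
    """Return (disposition_type, suggested_filename)."""
    # Factor 1: any @download attribute forces a download.
    if link.download is not None:
        disposition = "attachment"
    elif header is not None:
        disposition = header.disposition_type
    else:
        disposition = "inline"

    # Factor 2: @download value > Content-Disposition filename > URL.
    if link.download:
        filename = link.download
    elif header is not None and header.filename:
        filename = header.filename
    else:
        filename = link.url_filename
    return disposition, filename
```

With a link to page.txt carrying download=A.txt and a server sending 'Content-Disposition: attachment; filename=B.txt', resolve returns ("attachment", "A.txt").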


I don't see what the security concerns might be: There is no
difference here than what is already available, except that there's
now an additional way to specify it. AFAICT, there are no content
sniffing or cross-domain issues at play. Browsers already give strange
results when saving a file; they don't do any file extension vs. file
format checking. (For example, the output of a .php or .cgi or .py
file on a server is usually HTML, yet browsers don't generally make
any attempt to change the file extension to .html when saving the
file, IME.)

Does this make sense? Am I missing anything?

Regards,
Gordon


On Sat, Mar 16, 2013 at 9:49 PM, Jonas Sicking jo...@sicking.cc wrote:
 It's currently unclear what to do if a page contains markup like <a
 href="page.txt" download="A.txt"> if the resource at page.txt
 responds with either

 1) Content-Disposition: inline
 2) Content-Disposition: inline; filename=B.txt
 3) Content-Disposition: attachment; filename=B.txt

 People generally seem to have a harder time with getting header data
 right, than getting markup right, and so I think that in all cases we
 should display the save as dialog (or display equivalent download
 UI) and suggest the filename A.txt.

 The spec is currently defining something else at least for 3.

 Potentially there are reasons to do something different in the case
 when the linked resource lives off of a different origin since in that
 case there might be security reasons to use the filename or
 disposition of the server that is actually serving up the content.
 However I don't think we can expect people to indicate
 Content-Disposition: inline in order to protect resources. Nor do I
 think that simply using a different filename is going to meaningfully
 protect downloaded content. So I think a stronger UI warning is needed
 in this scenario.

 Firefox currently doesn't support cross-origin @download references,
 so I don't have any meaningful implementation experience to share
 regarding that scenario.

 / Jonas



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] HTML differences from HTML4 document updated

2013-05-03 Thread Gordon P. Hemsley
The way I interpreted it, Jukka meant that the title could be
something more flowing, like Differences between HTML4 and HTML(5).

Gordon

On Fri, May 3, 2013 at 2:10 PM, Xaxio Brandish xaxiobrand...@gmail.com wrote:
 Good day,

 Let us start with a definition:

 es·o·ter·ic
 /ˌesəˈterik/
 Adjective
 Intended for or likely to be understood by only a small number of people
 with a specialized knowledge or interest.

 The document Simon delivered and formatted is useful to a wide range of
 audiences interested in HTML and how it differs from a previous named
 release of the HTML roadmap, so I'm not sure calling the title of the
 document esoteric is accurate.

 Regardless of that, if the title is obscure, could you please offer up
 title suggestions so that your posting becomes more constructive?  Keep in
 mind that an existing document [1] on the whatwg.org site references HTML
 version 4 as HTML4 already, so there is a precedent set for this.  I do
 not think this will confuse anybody, and it would have to be changed
 throughout documents on the entire site to be consistent.  I'd like to
 propose that both nomenclatures are valid when referring to the entire HTML
 4 specification.

 The important thing (IMHO) to remember here regarding the title is that
 HTML released two subversions of HTML 4, HTML 4.0 [2] and HTML 4.01 [3].
 The document must be intended as a differentiation between the entire
 version of HTML4, since it does not specify a specific subversion to diff?
 However, it links to the HTML 4.01 specification in the References
 section.  If this is *only* a diff between HTML 4.01 and the living
 standard, perhaps the title should then be HTML differences from HTML
 4.01 so that the document has additional meaning.  If there are
 differences between HTML 4.0, HTML 4.01, *and* HTML5 in the same section of
 the document, those should probably be appropriately marked.

 --Xaxio

 References:
 [1]
 http://www.whatwg.org/specs/web-apps/current-work/multipage/introduction.html#history-1
 [2] http://www.w3.org/TR/1998/REC-html40-19980424/
 [3] http://www.w3.org/TR/REC-html40/


 On Fri, May 3, 2013 at 9:20 AM, Jukka K. Korpela jkorp...@cs.tut.fi wrote:

 2013-05-03 18:37, Simon Pieters wrote:

  The past few days I've been working on updating the HTML differences
 from HTML4 document, which is a deliverable of the W3C HTML WG but is
 now also available as a version with the WHATWG style sheet:

 http://html-differences.whatwg.org/


 I think you should start from making the title sensible. HTML differences
 from HTML4 is too esoteric even in this context.

 Think about a heading FOO differences from FOO9. Wouldn't you say that
 some FOOist is writing very obscurely?

 Besides, the spelling is HTML 4. Especially if you think HTML 4 is
 ancient history, retain the historical spelling.

 Yucca






-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] HTML differences from HTML4 document updated

2013-05-03 Thread Gordon P. Hemsley
It is my understanding that the W3C version lists HTML5 and the
WHATWG version uses HTML. That was what I intended by HTML(5). I
didn't mean the parentheses were included literally.

Gordon

On Fri, May 3, 2013 at 2:19 PM, Xaxio Brandish xaxiobrand...@gmail.com wrote:
 Ah.  The document scope [1] explains why it uses HTML in the title as
 opposed to HTML5 or HTML(5).

 --Xaxio

 References:
 [1] http://html-differences.whatwg.org/#scope



 On Fri, May 3, 2013 at 11:16 AM, Gordon P. Hemsley gphems...@gmail.com
 wrote:

 The way I interpreted it, Jukka meant that the title could be
 something more flowing, like Differences between HTML4 and HTML(5).

 Gordon

 On Fri, May 3, 2013 at 2:10 PM, Xaxio Brandish xaxiobrand...@gmail.com
 wrote:
  Good day,
 
  Let us start with a definition:
 
  es·o·ter·ic
  /ˌesəˈterik/
  Adjective
  Intended for or likely to be understood by only a small number of people
  with a specialized knowledge or interest.
 
  The document Simon delivered and formatted is useful to a wide range of
  audiences interested in HTML and how it differs from a previous named
  release of the HTML roadmap, so I'm not sure calling the title of the
  document esoteric is accurate.
 
  Regardless of that, if the title is obscure, could you please offer up
  title suggestions so that your posting becomes more constructive?  Keep
  in
  mind that an existing document [1] on the whatwg.org site references
  HTML
  version 4 as HTML4 already, so there is a precedent set for this.  I
  do
  not think this will confuse anybody, and it would have to be changed
  throughout documents on the entire site to be consistent.  I'd like to
  propose that both nomenclatures are valid when referring to the entire
  HTML
  4 specification.
 
  The important thing (IMHO) to remember here regarding the title is that
  HTML released two subversions of HTML 4, HTML 4.0 [2] and HTML 4.01 [3].
  The document must be intended as a differentiation between the entire
  version of HTML4, since it does not specify a specific subversion to
  diff?
  However, it links to the HTML 4.01 specification in the References
  section.  If this is *only* a diff between HTML 4.01 and the living
  standard, perhaps the title should then be HTML differences from HTML
  4.01 so that the document has additional meaning.  If there are
  differences between HTML 4.0, HTML 4.01, *and* HTML5 in the same section
  of
  the document, those should probably be appropriately marked.
 
  --Xaxio
 
  References:
  [1]
 
  http://www.whatwg.org/specs/web-apps/current-work/multipage/introduction.html#history-1
  [2] http://www.w3.org/TR/1998/REC-html40-19980424/
  [3] http://www.w3.org/TR/REC-html40/
 
 
  On Fri, May 3, 2013 at 9:20 AM, Jukka K. Korpela jkorp...@cs.tut.fi
  wrote:
 
  2013-05-03 18:37, Simon Pieters wrote:
 
   The past few days I've been working on updating the HTML differences
  from HTML4 document, which is a deliverable of the W3C HTML WG but is
  now also available as a version with the WHATWG style sheet:
 
 
  http://html-differences.whatwg.org/
 
 
  I think you should start from making the title sensible. HTML
  differences
  from HTML4 is too esoteric even in this context.
 
  Think about a heading FOO differences from FOO9. Wouldn't you say
  that
  some FOOist is writing very obscurely?
 
  Besides, the spelling is HTML 4. Especially if you think HTML 4 is
  ancient history, retain the historical spelling.
 
  Yucca
 
 
 



 --
 Gordon P. Hemsley
 m...@gphemsley.org
 http://gphemsley.org/ • http://gphemsley.org/blog/





-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] use of article to markup comments

2013-01-28 Thread Gordon P. Hemsley
List elements and sectioning elements both represent hierarchical
relationships. They differ in how they emphasize that relationship:
lists emphasize the hierarchy outside the content, while sectioning
emphasizes the hierarchy within the content.

If the question is specifically about how to mark up comments on a
blog post or something, there's no reason you can't combine the two
methods: Each comment is a self-contained <article>, with
relationships between comments represented by <ol>.

One example:
http://jsbin.com/edewoy/1

That example presumes you consider blog post comments (or replies to
comments) as a section within the content that is being commented on
(or replied to). You could also modify the markup to have two
<article>s (one for the blog post and one for the comments) packaged
within a single parent <article>, but the principle is the same.

Note that the key here is that there is no restriction on combining
lists and sectioning elements, and thereby no need to modify the
semantics of <ol> or <ul> (as proposed in [2] in the root message).

Gordon

On Mon, Jan 28, 2013 at 12:13 PM, Steve Faulkner
faulkner.st...@gmail.com wrote:
 Brucel wrote:

 On Sat, 26 Jan 2013 10:56:10 -, Steve Faulkner
 faulkner.st...@gmail.com wrote:


  Lists are appropriate for indicating nested tree structures. The use
  of lists to mark up comments is a common markup pattern used in
  blogging software such as WordPress. The code verbosity is not
  dissimilar to the use of <article>, even less so if the optional
  </li> end tags are omitted. Besides, comments are generated code, not
  hand-authored, so I don't see a problem with code verbosity.

 [...]

 
  (It makes some sense, I suppose, to think of comments as a list, but
  *unordered*? If you're going to group them at all, wouldn't the order
  be important? Bruce Lawson (
  http://lists.w3.org/Archives/Public/public-html/2013Jan/0111.html)'s
  observation that comments are heavily dependent on context would seem
  to support the idea that it *is* important, especially since some
  comments are responses to others.)
 
  agreed it would be better to use ordered lists.
 

   Wordpress blogs, for example, have comments like

 Bob Smith said at <a href=#permalink>9.55 on 31 Febtember</a>: LOL

 Thus, every comment has a link that a UA can use to jump from comment to
 comment. The order is implied via the timestamp. So what's wrong with

 <article>
 <h1>Witty blogpost</h1>
 <p>lorem ipsum

 <section>
 <h2>35 erudite and well-reasoned comments</h2>
 <div>Bob Smith said at <a href=#permalink1>9.55 on 31 Febtember</a>: Can
 I use DRM in Polyglot documents?</div>
 <div>Hixie said at <a href=#permalink2>9.57 on 1 June</a>: What's your
 use case?</div>
 ...
 </section>

 </article>

 In short, why should the spec suggest any specific method of marking up
 comments?

 Good question. In the case of <article> being recommended to mark up
 comments, it seems like it's an element in search of a use case.

 For users who consume <article> semantics, it appears to cause issues when
 used for any piece of content ranging from a one-sentence comment to
 an article containing thousands of words or an interactive widget.


 regards
 SteveF



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Sniffing archives

2012-12-05 Thread Gordon P. Hemsley
(It seems I somehow managed to not send this to the list the first
time around. Addendum included.)

On Tue, Dec 4, 2012 at 2:40 AM, Adam Barth w...@adambarth.com wrote:
 On Mon, Dec 3, 2012 at 12:39 PM, Julian Reschke julian.resc...@gmx.de wrote:
 On 2012-11-29 20:25, Adam Barth wrote:
 These are supported in Chrome.  That's what causes the download.  From

 Can you elaborate about what you mean by supported? Chrome sniffs for the
 type, and then offers to download as a result of that sniffing? How is that
 different from not sniffing in the first place?

 They might otherwise be treated as a type that can be displayed
 (rather than downloaded).

But isn't the whole point of the spec to eliminate such accidental
sniffing? Anything not explicitly sniffed based on the first bytes of
the file will be assumed to be either 'application/octet-stream' or
'text/plain', depending on whether there are binary bytes present.
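
To make that fallback concrete, here is a minimal Python sketch. It is an illustration under assumptions, not text from the spec: the function name `fallback_type` is hypothetical, and the set of "binary" byte values is my reading of the ranges the spec lists.

```python
# Byte values treated as "binary data" in this sketch. The ranges are an
# assumption based on my reading of the MIME Sniffing Standard.
BINARY_BYTES = frozenset(
    list(range(0x00, 0x09))    # 0x00-0x08
    + [0x0B]
    + list(range(0x0E, 0x1B))  # 0x0E-0x1A
    + list(range(0x1C, 0x20))  # 0x1C-0x1F
)


def fallback_type(resource_header: bytes) -> str:
    """Pick the default type for data that matched no known signature."""
    if any(byte in BINARY_BYTES for byte in resource_header):
        return "application/octet-stream"
    return "text/plain"
```

Under this sketch, neither outcome is a scriptable type, which is the point being made above.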

The old IE behavior that you were investigating in your 2009 paper,
where you sniff beyond the first few bytes to find embedded HTML, is
eliminated with this sniffing algorithm. There is no case where you
would accidentally sniff something as scriptable, if you were
following the algorithm correctly.

Or am I missing something?

P.S.

Note also that I have previously defined what it means to be
supported by the user agent:

A valid media type is supported by the user agent if the user agent
has the capability to interpret a resource of that media type and
present it to the user.

http://mimesniff.spec.whatwg.org/#supported-by-the-user-agent

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Sniffing archives

2012-12-04 Thread Gordon P. Hemsley
On Tue, Dec 4, 2012 at 11:07 AM, Adam Barth w...@adambarth.com wrote:
 On Mon, Dec 3, 2012 at 11:59 PM, Julian Reschke julian.resc...@gmx.de wrote:
 On 2012-12-04 08:40, Adam Barth wrote:
 They might otherwise be treated as a type that can be displayed
 (rather than downloaded).  Also, some user agents treat downloads of

 Do you have an example for that case?

 ZIP archives differently than other sorts of download (e.g., they
 might offer to unzip them).

 Out of curiosity: which?

 Safari.

 Adam

To be more specific:

(1) Safari doesn't appear to prompt the user for any downloads. It
just automatically downloads any file it can't handle.
(2) If you allow Safari to open safe files that it downloads, ZIP
appears to be one of them. Gzip and RAR, however, do not.

So this isn't the most convincing argument.

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Treating application/octet-stream as unknown for sniffing

2012-11-29 Thread Gordon P. Hemsley
On Thu, Nov 29, 2012 at 2:32 AM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 11/29/12 2:07 AM, Gordon P. Hemsley wrote:

 So perhaps a more useful question would be what to do in situations
 like that—should mimesniff treat application/octet-stream as a type
 supported by the browser for the purposes of sniffing images, audio
 or video, fonts, or other media types?


 The way it works right now is that
 http://www.whatwg.org/specs/web-apps/current-work/#mime-types says:

   The MIME type application/octet-stream with no parameters is never
   a type that the user agent knows it cannot render. User agents must
   treat that type as equivalent to the lack of any explicit
   Content-Type metadata when it is used to label a potential media
   resource.

 So for the purpose of sniffing media loads specifically, that type is
 treated just like no type at all.

 But first you have to know it's a media load.

Oh, this is probably the location where the HTML spec doesn't
currently, but eventually should, reference the rules for sniffing
audio and video specifically in mimesniff. (Is this where Opera
implements such rules?)

Is it just me (and my late-night reading), or is that section
contradictory on how to treat application/octet-stream?

At one point it says, "The MIME type application/octet-stream with
no parameters is never a type that the user agent knows it cannot
render. User agents must treat that type as equivalent to the lack of
any explicit Content-Type metadata when it is used to label a
potential media resource."

But later it says, "The canPlayType(type) method must return the empty
string if type is a type that the user agent knows it cannot render or
is the type application/octet-stream;"

This seems to me to be unclear as to when sniffing of the audio/video
resource occurs, and what it is used for.

 I imagine this ties in, too, to the issues with sniffing CSS files
 that have been raised elsewhere:

 https://bugzilla.mozilla.org/show_bug.cgi?id=560388
 https://bugzilla.mozilla.org/show_bug.cgi?id=562377

 Neither one of those has anything to do with application/octet-stream as far
 as I can tell.  Those cover cases in which data is sent with either no
 Content-Type header or with such a header which can't even be parsed as
 major/minor.  Neither of which is true if the data says
 application/octet-stream.

I was grouping them together because they both rely on context clues
for modifying the sniffing (fallback) behavior, but we can discuss
them separately if that's easier.

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Treating application/octet-stream as unknown for sniffing

2012-11-29 Thread Gordon P. Hemsley
On Thu, Nov 29, 2012 at 3:02 AM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 11/29/12 2:53 AM, Gordon P. Hemsley wrote:

 At one point it says, "The MIME type application/octet-stream with
 no parameters is never a type that the user agent knows it cannot
 render. User agents must treat that type as equivalent to the lack of
 any explicit Content-Type metadata when it is used to label a
 potential media resource."

 But later it says, "The canPlayType(type) method must return the empty
 string if type is a type that the user agent knows it cannot render or
 is the type application/octet-stream;"


 What's the contradiction?  We have set S = { types the user agent knows it
 cannot render }.  We have set T = S union { application/octet-stream }

 What the above statements tell us so far is:

 1)  T != S
 2)  canPlayType(type) must return empty string for all types in T.

 But later on in the resource selection algorithm there are certain actions
 taken for elements of S only.


 This seems to me to be unclear as to when sniffing of the audio/video
 resource occurs, and what it is used for.


 It's used for actually showing the video even if it's sent as
 application/octet-stream.

The apparent contradiction occurs when, e.g., an Opus file is tagged
as application/octet-stream.

If I understand correctly, a UA would return "" when canPlayType() is
called against such a file—but then the file would actually play
because it is later sniffed as application/ogg.

Am I missing something?

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Treating application/octet-stream as unknown for sniffing

2012-11-29 Thread Gordon P. Hemsley
On Thu, Nov 29, 2012 at 12:57 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 canPlayType is not called against a file.  It's called with a single
 argument which is a string MIME type.  If you pass
 application/octet-stream, it will return "".  Its behavior does not depend
 on any state of the element it's called on (like what it's actually pointing
 to, etc); only on the string passed in.
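
A rough Python sketch of the point being made here: the result is a pure function of the string argument. `can_play_type` and `known_unrenderable` are hypothetical names for illustration, and the real method can also return "probably"; this sketch collapses every other case to "maybe".

```python
def can_play_type(mime_type: str, known_unrenderable: set) -> str:
    """Hypothetical stand-in for canPlayType(): depends only on the string.

    known_unrenderable plays the role of set S (types the user agent knows
    it cannot render); application/octet-stream is the extra member of T.
    """
    normalized = mime_type.strip().lower()
    if normalized == "application/octet-stream" or normalized in known_unrenderable:
        return ""
    return "maybe"
```

Nothing about the resource an element points to enters into the answer, so a file labeled application/octet-stream can still be sniffed and played later without contradicting the empty-string result.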

Oh, I see. My mistake. (One should never attempt to understand
something after 2 AM.)

So... are there any additional places where application/octet-stream
should be treated as if the media type was undefined? Or is this
conversation moot now?

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Sniffing archives

2012-11-29 Thread Gordon P. Hemsley
To be clear, I'm asking this because I would like to remove the
sniffing of archive types from the mimesniff spec if there aren't any
valid use cases.

On Wed, Nov 28, 2012 at 12:18 PM, Gordon P. Hemsley gphems...@gmail.com wrote:
 The mimesniff spec currently includes signatures for ZIP, gzip, and
 RAR archive formats. However, no major browser seems to support them
 natively (they all prompt for download), and it's not clear whether
 the type detection is a product of the browser code or the OS, or
 whether it is used beyond choosing an appropriate file extension for
 the download.

 Are there any valid use cases for explicitly sniffing archive formats
 instead of letting them default to application/octet-stream like other
 binary files would? Note that Henri Sivonen has previously raised the
 issue that ZIP-based formats (like office suite documents), for
 example, would be misleadingly sniffed as ZIP files, and there is no
 easy way around that.
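
The ZIP-based problem can be sketched in a few lines of Python. This is an illustration under assumptions: the signature bytes and media type labels are my understanding of the three archive formats, and `sniff_archive` and `make_odf_like_package` are hypothetical helpers.

```python
import io
import zipfile

# (signature bytes, media type) pairs; values are assumptions of this sketch.
ARCHIVE_SIGNATURES = [
    (b"\x1f\x8b\x08", "application/x-gzip"),
    (b"PK\x03\x04", "application/zip"),
    (b"Rar \x1a\x07\x00", "application/x-rar-compressed"),
]


def sniff_archive(header: bytes):
    """Return the archive type whose signature prefixes header, else None."""
    for magic, media_type in ARCHIVE_SIGNATURES:
        if header.startswith(magic):
            return media_type
    return None


def make_odf_like_package() -> bytes:
    """Build a tiny ZIP container that mimics an office-suite document."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    return buffer.getvalue()
```

Calling `sniff_archive(make_odf_like_package())` reports a plain ZIP file, because the leading bytes are identical and a signature check alone cannot see what the container holds.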




-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Treating application/octet-stream as unknown for sniffing

2012-11-29 Thread Gordon P. Hemsley
On Thu, Nov 29, 2012 at 2:30 PM, Adam Barth w...@adambarth.com wrote:
 On Wed, Nov 28, 2012 at 10:30 PM, Gordon P. Hemsley gphems...@gmail.com 
 wrote:
 Based on my reading of the source code, it seems that Gecko treats a
 resource served as 'application/octet-stream' as an unknown type which
 is sniffed as if no Content-Type was specified.

 Are there security implications with doing this?

 Yes, there are very large security consequences.  I'm sorry that I
 don't have time to respond to all of these threads in detail, but I'm
 worried that you don't understand the consequences of the changes
 you're proposing to this specification.

 I'm not sure how to help you succeed here, but tweaking things in the
 spec without a compelling reason for doing so is not likely to lead to
 a useful specification.  I spent a great deal of time and effort
 studying the behaviors of many user agents and of a massive amount of
 content on the web.  I'm certainly willing to believe that the spec
 can be improved, but if you don't understand these sorts of basic
 things about content sniffing, I worry that changes that you make to
 the spec won't be improvements.

 Adam

I and others have already made clear that I was misreading the Mozilla
source code.

I'm aware of the security implications of interpreting a resource as
something other than what the Content-Type header says. The whole
reason I sent the original e-mail was because I thought Mozilla was
sniffing application/octet-stream in a way that it shouldn't, and I
wanted to clarify whether there was something I was missing.

I think you need to tone down your worry about my changes to the spec.
If I didn't have concern for the security implications for a change, I
wouldn't be sending an e-mail to the list about them, would I?

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [mimesniff] Treating application/octet-stream as unknown for sniffing

2012-11-28 Thread Gordon P. Hemsley
Based on my reading of the source code, it seems that Gecko treats a
resource served as 'application/octet-stream' as an unknown type which
is sniffed as if no Content-Type was specified.

Are there security implications with doing this? Or should I add
'application/octet-stream' to the list of unknown types that currently
includes 'unknown/unknown', 'application/unknown', and '*/*' (step 2
of the media type sniffing algorithm)? Or, given that that step
calls the rules for identifying an unknown media type with the
sniff-scriptable flag set, should it get its own call, with the
sniff-scriptable flag unset? Are there other options here?

I haven't checked what UAs actually do in practice, but I don't
believe the spec currently allows anything but leaving resources
tagged as 'application/octet-stream' as they are.
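
For reference, the current behavior as I describe it above can be sketched in Python. This is a simplified illustration, not the spec's algorithm: `should_sniff_as_unknown` is a hypothetical name, and the real step also deals with parsing and the sniff-scriptable flag.

```python
# The unknown types named in step 2, per the message above.
UNKNOWN_TYPES = {"unknown/unknown", "application/unknown", "*/*"}


def should_sniff_as_unknown(supplied_type) -> bool:
    """True if the supplied type is missing or one of the unknown types."""
    if supplied_type is None:
        return True
    return supplied_type.strip().lower() in UNKNOWN_TYPES
```

Here `should_sniff_as_unknown("application/octet-stream")` is False, which is the behavior the question above is about changing.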

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Treating application/octet-stream as unknown for sniffing

2012-11-28 Thread Gordon P. Hemsley
On Thu, Nov 29, 2012 at 1:30 AM, Gordon P. Hemsley gphems...@gmail.com wrote:
 Based on my reading of the source code, it seems that Gecko treats a
 resource served as 'application/octet-stream' as an unknown type which
 is sniffed as if no Content-Type was specified.

Oh, wait, I forgot what I was reading—Gecko does this specifically in
the context of sniffing for an audio or video resource. So, if a
resource tagged as 'application/octet-stream' is included in <audio>
or <video>, for example, it will be treated as unknown for the
purposes of identifying its true nature. This never follows a path of
scriptable privilege escalation, AFAICT.

So perhaps a more useful question would be what to do in situations
like that—should mimesniff treat application/octet-stream as a type
supported by the browser for the purposes of sniffing images, audio
or video, fonts, or other media types?

I imagine this ties in, too, to the issues with sniffing CSS files
that have been raised elsewhere:

https://bugzilla.mozilla.org/show_bug.cgi?id=560388
https://bugzilla.mozilla.org/show_bug.cgi?id=562377
https://bugzilla.mozilla.org/show_bug.cgi?id=808593

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Handling container formats like Ogg

2012-11-27 Thread Gordon P. Hemsley
On Tue, Nov 27, 2012 at 4:39 AM, Henri Sivonen hsivo...@iki.fi wrote:
 On Tue, Nov 27, 2012 at 12:59 AM, Gordon P. Hemsley gphems...@gmail.com 
 wrote:
 Would this be something UAs would prefer to handle in their Ogg
 library, or should I spec it as part of sniffing?

 What would be the use case for handling it as part of sniffing layer?

I don't know; that's why I'm asking! :)

Is it sufficient to sniff just for application/ogg and then let the
UA's Ogg library determine whether or not the contents of the file can
be handled? (I'm sensing the consensus is yes.)

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Audio and video sniffing

2012-11-27 Thread Gordon P. Hemsley
Done: 
https://github.com/whatwg/mimesniff/commit/77ee676c8852f4e76facd7d6c1174ac0ec41696e

Note that this also affects the media type sniffing algorithm and
the rules for identifying an unknown media type.

On Tue, Nov 27, 2012 at 12:51 AM, Simon Pieters sim...@opera.com wrote:
 On Mon, 26 Nov 2012 23:38:02 +0100, Gordon P. Hemsley gphems...@gmail.com
 wrote:

 Upon looking through the code for Gecko's media sniffing, I noticed
 that they seem to combine sniffing for audio and video elements. Given
 that Opera has said that it uses the specific sniffing algorithms, and
 that some media containers (like Ogg) can be used for either audio or
 video, I wonder if it would make sense to combine audio and video
 sniffing under a single audiovisual category? This would affect the
 matching audio/video type pattern sections and the sniffing
 audio/video specifically sections.

 Any objections? Other thoughts?


 Yes, I think it makes sense to have the same sniffing for both. <audio> is
 like <video> without the rendering area.

 --
 Simon Pieters
 Opera Software



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [mimesniff] Audio and video sniffing

2012-11-26 Thread Gordon P. Hemsley
Upon looking through the code for Gecko's media sniffing, I noticed
that they seem to combine sniffing for audio and video elements. Given
that Opera has said that it uses the specific sniffing algorithms, and
that some media containers (like Ogg) can be used for either audio or
video, I wonder if it would make sense to combine audio and video
sniffing under a single audiovisual category? This would affect the
matching audio/video type pattern sections and the sniffing
audio/video specifically sections.

Any objections? Other thoughts?

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [mimesniff] Handling container formats like Ogg

2012-11-26 Thread Gordon P. Hemsley
Container formats like Ogg can be used to store many different audio
and video formats, all of which can be identified generically as
application/ogg. Determining which individual format to use (which
can be identified interchangeably as the slightly-less-generic
audio/ogg or video/ogg, or using a 'codecs' parameter, or using a
dedicated media type) is much more complex, because they all use the
same OggS signature. It would require actually attempting to parse
the Ogg container to determine which audio or video format it is using
(perhaps not dissimilar to what is done for MP4 video and what might
have to be done with MP3 files without ID3 tags).
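
As a minimal Python sketch of the generic case (an illustration only; `sniff_ogg` is a hypothetical name, and it checks just the four-character capture pattern):

```python
def sniff_ogg(header: bytes):
    """Return the generic Ogg type if header starts with the OggS signature."""
    return "application/ogg" if header.startswith(b"OggS") else None
```

An audio-only stream and a video stream both begin with the same four bytes, so this check cannot tell audio/ogg from video/ogg; that would take a look inside the container.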

Would this be something UAs would prefer to handle in their Ogg
library, or should I spec it as part of sniffing?

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [mimesniff] The X-Content-Type-Options header

2012-11-16 Thread Gordon P. Hemsley
https://www.w3.org/Bugs/Public/show_bug.cgi?id=19865

Microsoft introduced the X-Content-Type-Options header in IE8 back in 2008:

http://blogs.msdn.com/b/ie/archive/2008/09/02/ie8-security-part-vi-beta-2-update.aspx

I would like to integrate the header into mimesniff and describe its
proper usage.

Right now, it allows one parameter: 'nosniff'. I would like to allow
the presence of this parameter to set the 'no-sniff flag' that I just
introduced into mimesniff (in addition to that flag's existing
duties):

http://mimesniff.spec.whatwg.org/#no-sniff-flag

But I would also like to fully spec the header, while leaving open the
possibility that other values may be added in the future.

In addition, I would like to, if I could, also allow the header to be
specified without the 'X-' prefix (so as 'Content-Type-Options'), for
that reason (and because of best current practice).
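
A rough Python sketch of what I have in mind, under stated assumptions: `no_sniff_flag` is a hypothetical helper, headers arrive as a simple name-to-value mapping, and accepting the unprefixed 'Content-Type-Options' name is only the proposal above, not existing behavior.

```python
# Header names that would set the flag. The unprefixed form is the proposal
# in this message, not something any user agent is known to accept.
ACCEPTED_NAMES = ("x-content-type-options", "content-type-options")


def no_sniff_flag(headers: dict) -> bool:
    """Return True if a recognized header carries the 'nosniff' value."""
    for name, value in headers.items():
        if name.strip().lower() in ACCEPTED_NAMES:
            if value.strip().lower() == "nosniff":
                return True
    return False
```

Unrecognized values leave the flag unset here, which keeps room for other values to be defined later.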

Does anyone have any questions, comments, or objections about this issue?

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] Proposal for a debugging information API

2012-11-14 Thread Gordon P. Hemsley
Recent blog posts that coincidentally may be useful in this discussion:

http://vocamus.net/dave/?p=1532
http://www.twobraids.com/2012/11/socorro-as-service.html

On Thu, Nov 15, 2012 at 12:07 AM, David Barrett-Kahn d...@google.com wrote:

 Hi whatwg.  I have a proposal for a new web standard, and would value your
 feedback.  This is based on my experiences working on Google Docs, which
 has a well developed ability to send crash reports back to the server for
 analysis.  We often find these crash reports to be lacking in crucial
 information though, because that information is not available on the JS
 APIs.

 My proposal is to have a class of information which can be made available
 to an app only after the display of a generic 'this application has
 crashed' dialog, which could be drilled into to show what is being
 disclosed, and which of course can be denied.

 Good examples of the information in question are the system's precise
 hardware and network configuration, what Chrome extensions it has
 installed, and perhaps a screenshot of the failed application.

 I've fleshed this out in the following document, and would value opinions
 on the value of a feature of this kind, and the merits of this particular
 approach.


 https://docs.google.com/document/pub?id=1pw2Bzvy6OEn8YY3fAcZiReJPmgB79swkx-NJAdcemPk

 Thanks!

 -Dave




-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] [mimesniff] Review requested on MIME Sniffing Standard

2012-11-12 Thread Gordon P. Hemsley
On Mon, Nov 12, 2012 at 10:06 AM, Henri Sivonen hsivo...@iki.fi wrote:
 Resending feedback previously written at
 https://bugzilla.mozilla.org/show_bug.cgi?id=808593#c10 :

 I think the bits ‘type is equal to font or’ and ‘type is equal to
 archive or’ are highly questionable. The most popular font types are
 in the process of getting application/ types and the most popular
 archives already have application/ types.

Buzzkill. ;(

 I suspect the ‘a reasonable amount of time has elapsed, as determined
 by the user agent.’ is unnecessary. The HTML spec has the same
 provision for the <meta> prescan. Firefox didn’t implement it, a
 couple of people complained, then fixed their code, and the sky didn’t
 fall.

This line was present in a previous draft of the spec, as was the
seeming allowance to begin matching the resource header before it had
finished loading. For simplicity in the algorithm, I removed the
latter, so I left the former in as an escape hatch for those who
wanted to emulate that behavior.

But if everyone vows to just wait for 512 bytes (or EOF), then that's
fine with me.
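
In code, "wait for 512 bytes (or EOF)" could look like the following Python sketch. It assumes a blocking, file-like stream with a `read()` method; `read_resource_header` is a hypothetical name.

```python
import io


def read_resource_header(stream, limit: int = 512) -> bytes:
    """Read until limit bytes are collected or the stream reaches EOF."""
    chunks = []
    remaining = limit
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # an empty read signals EOF
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Usage: a short resource yields everything; a long one is capped at 512 bytes.
short_header = read_resource_header(io.BytesIO(b"abc"))
long_header = read_resource_header(io.BytesIO(b"a" * 1000))
```

With no timeout in the loop, the result depends only on the data, not on how fast it arrives.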

 What are the use cases for ‘Sniffing archives specifically’?

No idea. I only included it for completeness.

The 'rules for sniffing * specifically' are intended as hooks for
other specs to tie into. If no spec requires you to implement it, then
you have no need to implement it. HTML uses 'rules for sniffing images
specifically' (and 'rules for distinguishing if a resource is text or
binary'), and I imagine it could also find uses for 'rules for
sniffing audio specifically' and 'rules for sniffing video
specifically' (and maybe even 'rules for sniffing fonts
specifically').

 It
 appears that it sniffs ODF-style files
 (http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part3.html#__RefHeading__752809_826425813
 ; EPUB, ODF, InDesign, etc.) and Open Packaging Conventions-based
 files (https://en.wikipedia.org/wiki/Open_Packaging_Conventions ;
 OOXML, XPS, etc.) files as zip archives. Is that intended and a
 desirable outcome in the light of use cases? (In general, it would be
 easier to review if the spec makes sense if the use cases and callers
 of various sniffing functions were known.)

I don't think that's intended, but I don't know. The selection of
which bytes to sniff predates me, and I don't know what the use cases
are.

 Otherwise, looks good to me.

Thanks for the review!

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [mimesniff] Review requested on MIME Sniffing Standard

2012-11-05 Thread Gordon P. Hemsley
Hey all,

As you might have heard, I have taken over editorship of the MIME Sniffing
Standard from Adam Barth.

As a first step in my editorship, I have taken the opportunity to rewrite
the document in a more procedural and modular way (IMO). The content and
meaning itself is not supposed to have changed, and I need your help to
verify that that is the case:

http://mimesniff.spec.whatwg.org/

In addition, this now means that I am open to hearing your suggestions
about how to improve the document beyond its current (i.e. former)
semantics.

You can file bugs here:

https://www.w3.org/Bugs/Public/enter_bug.cgi?product=WHATWG&component=MIME

As this document was originally an IETF document, there are also old issues
here:

http://trac.tools.ietf.org/wg/websec/trac/query?component=mime-sniff

It's not clear to me which of those remain outstanding on the current
version of the document, and it would be helpful to me if individuals with
a vested interest in them could migrate them to Bugzilla (with updated
descriptions that reflect the current state of the document). This will
ensure that I address them in a timely manner.

Also, it would be helpful if you could mark them as blocking the general
bug here:

https://www.w3.org/Bugs/Public/show_bug.cgi?id=19746

And if you want to follow the commits as they happen, you can follow
@mimesniff on Twitter:

https://twitter.com/mimesniff

Thanks!

Gordon

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


[whatwg] [wiki] The WHATWG Wiki has been upgraded

2012-10-28 Thread Gordon P. Hemsley
For those who missed the announcement on IRC and Twitter last week:

The WHATWG Wiki has been upgraded to MediaWiki 1.19.2:

http://wiki.whatwg.org/

This update brings with it a lot of the changes you're probably
already used to from Wikipedia, including the new Vector theme.

Over the many years since the WHATWG Wiki was first set up, a lot of
cruft has accumulated in its configuration files. I have attempted to
remove a lot of that, in order to allow the modern default values to
come through. I don't know if this will have much effect on the everyday
use of the wiki, but I thought I'd let you know.

In addition to the primary software update, I have also installed a
number of extensions, and these will have an effect on your use of the
wiki.

There are three extensions that I want to bring your attention to specifically.

The first one is ParserFunctions, which allows you to use some logical
functions in pages to (for example) create conditional output. This is
most useful, IMO, in templates, so you can condition the display of
the template based on the presence, absence, or value of template
parameters. See [[Template:Obsolete]] for an example:

http://wiki.whatwg.org/wiki/Template:Obsolete

The second extension I want to bring your attention to is
SyntaxHighlight. This allows you to use the <syntaxhighlight> element
in a wiki page to automatically highlight whatever source code you
include. Given who we are, I've set it up to assume the language you
are highlighting is 'html5', but you can also specify another language
using the 'lang' attribute. (Note: This is not the same 'lang'
attribute that you would normally find in HTML. It's looking for a
programming language, not a BCP47 language tag.)

And the third, and potentially the most useful, extension is Gadgets.
This extension allows any administrator to install JavaScript and CSS
gadgets directly onto the wiki, for use by all. I've installed a
subset of the gadgets installed on Wikipedia which I think are the
most useful. I've also turned many of them on by default; you can see
the full list of available gadgets (and edit your personal gadget
availability) by going to My preferences > Gadgets.

To see the full list of installed extensions, go to [[Special:Version]]:

http://wiki.whatwg.org/wiki/Special:Version

If you know of any useful extensions or gadgets that you think are
missing from the WHATWG Wiki, let me know and I'll be happy to install
them. And, as I am now the caretaker of the wiki (taking over, I
believe, from AryehGregor), let me know about any other wiki issues
you might have.

By the way, I think the wiki is a particularly useful place to store
information that might otherwise get lost in the shuffle of IRC logs
and e-mail archives, so if you have any such tidbits, head over to the
wiki and write them down!

If you don't yet have a wiki account, you'll have to ask someone for
help, as we've had some issues with spam accounts. But don't worry,
it's very simple to get help, as I've set it up so that any
autoconfirmed user can register an account. All they need is your
e-mail address and your desired username. If you just let out a cry on
IRC, someone should be able to help you, or you can contact one of the
permanent autoconfirmed users listed here:

http://wiki.whatwg.org/index.php?title=Special:ListUsers&group=autoconfirmed

Happy wikiing!

Gordon

P.S. If you think you should be a permanent autoconfirmed member (and
you're not), ping me on IRC or drop me a line off-list and I'll see
what I can do. ;)

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/


Re: [whatwg] base64 entities

2010-08-27 Thread Gordon P. Hemsley
On Fri, Aug 27, 2010 at 2:44 PM, Aryeh Gregor simetrical+...@gmail.com wrote:
  PHP offers no JS-string-literal-escape function. `addslashes` is very close,
  but won't handle some cases with non-ASCII characters correctly. Better to
  use `json_encode` to transfer the string, then write as text:
 
     elmt.textContent = <?php echo json_encode('Hi there, ' . $name,
  JSON_HEX_TAG); ?>
 
  (assuming innerText or Text Node backup for IE/older browsers.)

 Interesting, that's useful.  Too bad it only works in PHP 5.2 or higher.

PHP 5.2.0 came out in 2006. I don't see anything too bad about using
PHP 5.2 or higher with new technology.[1]

Regards,
Gordon

[1] See also: http://gophp5.org/

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


[whatwg] Proposal: @srctype or @type on iframe

2010-07-13 Thread Gordon P. Hemsley
Hello all.

There are a number of attributes that are designed to give the user agent a
preview of what MIME type to expect for a referenced resource. (And there are
also attributes like @hreflang that preview other things.) And yet,
<iframe>, which has to load a full document, has no ability to allow the
user agent to determine compatibility.

Thus, I propose doing one of the following:
(1) add @srctype to <iframe>
(2) extend the meaning of @type that applies to <a>, <area>, and <link> to
apply to <iframe>, as well

I'm more inclined to believe that option (2) is the better option.

But now for the reasoning.

It should not be assumed that whatever resource included via iframe is
going to be of type 'text/html' or another easily parsable type. Thus, it
could be helpful for the author to give the user agent a hint as to what
type of document it is requesting be displayed inline, and allow the user
agent to choose not to display the contents of the iframe if it feels it
cannot support it.
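
To make the idea concrete, here is a rough sketch of how a user agent
might consult such a hint. Everything in it is an assumption made for
illustration: the function names, the list of supported types, and the
choice to fall back to today's behavior when no hint is present.

```javascript
// Hypothetical model of a user agent consulting an author-supplied type hint.
// Neither the functions nor the supported-type list come from any browser.
function essence(mimeType) {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  return String(mimeType).split(";")[0].trim().toLowerCase();
}

function canDisplayInline(typeHint, supportedTypes) {
  if (!typeHint) {
    return true; // No hint given: behave as today and just try to load it.
  }
  return supportedTypes.has(essence(typeHint));
}

const supported = new Set(["text/html", "image/png", "image/svg+xml"]);
console.log(canDisplayInline("Text/HTML; charset=utf-8", supported));
console.log(canDisplayInline("application/pdf", supported));
```

The hint is advisory only in this sketch; the real type would still be
whatever the server eventually reports.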

The particular use case that prompted me to think about this is including a
PDF via iframe. In Firefox (last I checked), one is required to install a
separate add-on in order to support in-browser display of PDF files on Mac
OS X, since there is no native or integrated Adobe Reader support available.
Without the add-on, the user will be prompted to download the PDF file,
which can be very disconcerting if the user wasn't even expecting a PDF
file. And I'm sure there are plenty of other instances where this same
situation occurs. (TIFF files, perhaps? Like on the U.S. Patent Office's
website?)

Now, I'm not a spec implementor by any means, but I am a web author and a
web user, so I've been on both sides of this issue. And it doesn't appear
that it would be too complicated to extend the existing support of @type.

Thoughts?

Gordon

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] Proposal: @srctype or @type on iframe

2010-07-13 Thread Gordon P. Hemsley
Nils,

I don't hate the HTTP Content-Type header. In fact, I like it very much.

But this proposal was intended to guide the user agent before they
ever receive the HTTP Content-Type header. ;)

Cheers,
Gordon

On Tue, Jul 13, 2010 at 2:48 AM, Nils Dagsson Moskopp
nils-dagsson-mosk...@dieweltistgarnichtso.net wrote:
 Gordon P. Hemsley gphems...@gmail.com schrieb am Tue, 13 Jul 2010
 02:31:19 -0400:

 It should not be assumed that whatever resource included via iframe
 is going to be of type 'text/html' or another easily parsable type.
 Thus, it could be helpful for the author to give the user agent a
 hint as to what type of document it is requesting be displayed
 inline, and allow the user agent to choose not to display the
 contents of the iframe if it feels it cannot support it.

 Have you thought of using HTTP Content-Type headers and classic MIME
 type handling to determine compatibility ?

 […]

 Now, I'm not a spec implementor by any means, but I am a web author
 and a web user, so I've been on both sides of this issue. And it
 doesn't appear that it would be too complicated to extend the
 existing support of @type.

 AFAIK, implementors could use HTTP Content-Type headers for the given
 purpose.

 Thoughts?

 Why do you hate HTTP Content-Type headers ? ;)


 Cheers,
 Nils




-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] Proposal: @srctype or @type on iframe

2010-07-13 Thread Gordon P. Hemsley
On Tue, Jul 13, 2010 at 3:26 AM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 7/12/10 11:31 PM, Gordon P. Hemsley wrote:

 The particular use case that prompted me to think about this is
 including a PDF via iframe. In Firefox (last I checked), one is
 required to install a separate add-on in order to support in-browser
 display of PDF files on Mac OS X, since there is no native or integrated
 Adobe Reader support available.

 I'm pretty sure you can install the Adobe Reader plug-in on Mac if you want
 to.

Perhaps now, but that wasn't always the case—at least not for Firefox.
I admit that my experience is somewhat outdated. Installing the
third-party PDF viewer add-on is one of the first things I did, in a
set it and forget it kind of way. (Plus, I'm still on Tiger.)

But, again, the PDF example was just one possible use case. I'm sure
there are plenty of other file types that cause similar situations,
including the TIFF issue that I mentioned.

 Without the add-on, the user will be prompted to download the PDF file

 Which is exactly what would happen for a type=application/pdf iframe, no?
  Silently not showing the content doesn't seem acceptable.

 -Boris

Well, the idea is to have the browser operate more intelligently than
that. The page in the iframe is (by definition) not the primary
document that the user is trying to load, so it shouldn't have the
power to steal the user's attention immediately upon page load. It would
be very disorienting, and would likely cause the user to lose their
train of thought.

I was thinking more along the lines of what Flashblock does or what happens
when the window in an iframe can't load: The content would be
replaced somehow by a message and a button/link to allow the user to
manually download the contents of the iframe, if they so choose. It
shouldn't make that decision for the user, as it's not the user's
fault that their browser does not support the format of some ancillary
document.
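
As a sketch of what that replacement content could look like, the
following builds a small piece of fallback markup with a message and a
manual download link. The function names, the class name, and the wording
of the message are all made up for illustration; no browser is known to
do exactly this.

```javascript
// Hypothetical sketch: build fallback markup for an iframe whose hinted type
// the browser cannot show inline. All names here are illustrative.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function placeholderMarkup(src, typeHint) {
  const safeSrc = escapeHtml(src);
  const safeType = escapeHtml(typeHint);
  return (
    '<p class="blocked-frame">This frame contains ' + safeType +
    " content that cannot be shown inline. " +
    '<a href="' + safeSrc + '" download>Download it</a> if you wish.</p>'
  );
}

console.log(placeholderMarkup("/docs/report.pdf", "application/pdf"));
```

The point of the design is that the download only starts when the user
follows the link, never on page load.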

At least, that's how I see it.

Gordon

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] select element should have a required attribute

2010-06-18 Thread Gordon P. Hemsley
I'm not sure how you interpreted it, but I wanted to clarify, in case it wasn't
clear.

I'm pretty sure this person is asking why @required isn't allowed on
select elements.

As in:
http://dev.w3.org/html5/markup/forms-attributes.html#shared-form.attrs.required

I don't know what the exact reasoning is for it not being on there, nor do I
know exactly how @required is supposed to be enforced, but I do think that
the method suggested in the bug is a bad one. Sometimes, authors will
include an empty option on purpose in order to allow for an empty option
to be selected.

Thus, as you've said, Ash, there will always be some sort of value sent from
a select element. And, including the option of an empty string, I can't
think of any way that there wouldn't be a value sent.
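
To illustrate the trade-off, here is a small sketch comparing two possible
rules. Both are assumptions for the sake of discussion: rule A is my
reading of the kind of check objected to above (any empty value counts as
missing), and rule B is one conceivable alternative in which only a
leading empty option acts as a placeholder. Neither is taken from the bug
or from the spec.

```javascript
// options: array of option values in document order.
// selectedIndex: index of the chosen option, or -1 if none is selected.

// Rule A (assumed reading of the objected-to method): any empty value
// counts as missing.
function missingIfEmpty(options, selectedIndex) {
  return selectedIndex < 0 || options[selectedIndex] === "";
}

// Rule B (an assumed alternative): only a first, empty-valued option is a
// placeholder; an empty option elsewhere is a deliberate choice.
function missingIfPlaceholder(options, selectedIndex) {
  return selectedIndex < 0 || (selectedIndex === 0 && options[0] === "");
}

const values = ["", "red", "", "blue"];
console.log(missingIfEmpty(values, 2), missingIfPlaceholder(values, 2));
console.log(missingIfEmpty(values, 0), missingIfPlaceholder(values, 0));
```

Under rule A an author can never offer a legitimately empty choice, which
is the problem described above; rule B keeps that possibility open.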

Gordon

On Fri, Jun 18, 2010 at 7:04 AM, Ashley Sheridan
a...@ashleysheridan.co.uk wrote:

  On Fri, 2010-06-18 at 11:35 +0200, Mounir Lamouri wrote:

 Hi,

 I'm wondering why select element do not have a required attribute. It
 seems to be perfectly suitable. With the required attribute, select
 element would be able to suffer from being missing and the :required
 pseudo-class could apply.

 Is there a reason why the select element has no required attribute or
 it's only an omission?

 Related bug: http://www.w3.org/Bugs/Public/show_bug.cgi?id=9625

 Thanks,
 --
 Mounir


 Required as in it should always have a value sent? If so, then it always
 does. The default value for a select element is not an empty string as an
 option is always there (unless someone has been stupid enough to create an
 empty select list.)

 As such, some sort of value will always be sent.

   Thanks,
 Ash
 http://www.ashleysheridan.co.uk





-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] Is there a way to stop scrolling when pressing directional arrows?

2010-06-14 Thread Gordon P. Hemsley
For what it's worth, I am actually of the opposite opinion, Ash.

I like it when Flash steals the focus of the keyboard, and here's why:
Besides the arrow keys, which are available to everyone, I also use the
Find As You Type feature in Firefox. However, that usually means that I
can't play any HTML5 games that use letters as play keys. Because the HTML5
game usually doesn't steal the focus of the keyboard, typing a letter key
activates the FAYT feature and distracts me from the game.

With that being said, Bespin (from Mozilla Labs) uses canvas, and it has
no problem stealing the keyboard focus (with JavaScript) for most
keypresses.
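
A minimal sketch of that kind of key capture follows. It is not Bespin's
code; the handler name and the set of keys are assumptions, and it only
claims the arrow keys so that other keyboard navigation keeps working.

```javascript
// Keys whose default action (scrolling) the game wants to suppress.
const GAME_KEYS = new Set(["ArrowUp", "ArrowDown", "ArrowLeft", "ArrowRight"]);

// Returns true when the key was claimed by the game. `event` only needs a
// `key` property and a `preventDefault` method, as DOM KeyboardEvents have.
function handleGameKey(event) {
  if (!GAME_KEYS.has(event.key)) {
    return false; // Let the browser handle everything else (e.g. Tab).
  }
  event.preventDefault(); // Stops the page from scrolling.
  return true;
}

// In a page, this would be attached to a focusable element, for example:
//   canvas.tabIndex = 0;
//   canvas.addEventListener("keydown", handleGameKey);
```

Attaching the listener to the game element rather than to the whole
document means the keys are only captured while the game has focus.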

Gordon

2010/6/14 Ashley Sheridan a...@ashleysheridan.co.uk

  On Mon, 2010-06-14 at 13:38 -0600, Carlos Andrés Solís wrote:

 Hello! I've been noticing a problem in many HTML5 test apps, very
 especially games. When the directional arrow buttons are pressed, the screen
 scrolls. This is a problem that, as far as I know, Flash had solved by
 changing the focus of the application to the app. Is this doable in HTML5?
 - Carlos Solís


 I don't think it's something that was 'solved'  by Flash. To be honest, I'm
 often annoyed at the way Flash steals the focus of all my key presses making
 it almost impossible to navigate using only the keyboard.

 You could use Javascript to put the focus onto an object, capture all the
 key presses on that and return false for them all maybe.

   Thanks,
 Ash
 http://www.ashleysheridan.co.uk





-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] <% text %> and <? text ?> in corporate intranet html content

2010-02-15 Thread Gordon P. Hemsley
On Tue, Feb 9, 2010 at 10:05 PM, Biju bijumaill...@gmail.com wrote:

 What should a user agent display when html content is...

 <html><body>
 <%@ page language="java" %>
 </body></html>

 At present IE and Safari display blank

 Firefox display <%@ page language="java" %>

 And for document.body.innerHTML browsers give
 Firefox -- &lt;%@ page language="java" %&gt;
 IE -- <%@ page language="java" %>
 and Safari gives blank

 Also for
 <html><body>
 <? some text ?>
 </body></html>

 Firefox gives blank

 But for
 <html><body>
 abc ? echo   ? xyz
 </body></html>

 Firefox display...
 abc  ? xyz

 ie, all the contents after first >
 with .innerHTML -- abc   ?gt; xyz

 IE in this case again hide all content till ?>
 as well as preserve content including the white space in innerHTML

 Due to these problems browsing corporate intranet with Firefox is
 little irritating.
 Calling help desk and asking to provide fix will get a reply that
 company has standardized on IE6, so please use IE.


 So per HTML standard in both case what should user agent display and
 as well as content of .innerHTML

 Thanks
 Biju


For what it's worth, I filed a Mozilla bug on a similar issue, and it was
marked INVALID.

https://bugzilla.mozilla.org/show_bug.cgi?id=477455
Parser does not wait for ?> to close blocks that begin with <?

(Incidentally, Hixie didn't care all that much. :) )
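
For anyone trying to picture the Firefox behavior, here is a deliberately
simplified model, assuming (as I understand the HTML parsing algorithm)
that "<?" opens a bogus comment which ends at the first ">" rather than
at "?>". The function name and the sample input are my own illustrations,
not text recovered from the original report, and the model ignores every
other parsing rule, including the escaping of ">" during serialization.

```javascript
// Simplified model: "<?" starts a bogus comment that runs to the first ">".
// Sequences such as "<%" are left alone, because "%" cannot start a tag
// name and the "<" is then just text.
function bogusCommentModel(source) {
  const pattern = /<\?([^>]*)>/g;
  return {
    // What remains visible once the bogus comment is dropped from view.
    visible: source.replace(pattern, ""),
    // How the comment would look if written back out as a real comment.
    serialized: source.replace(pattern, (match, data) => "<!--?" + data + "-->"),
  };
}

const result = bogusCommentModel('abc <? echo ">" ?> xyz');
console.log(result.visible);
console.log(result.serialized);
```

This shows why a ">" inside the block cuts it short: everything after
that first ">" is treated as ordinary page text.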

Gordon

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] the cite element

2009-10-06 Thread Gordon P. Hemsley
(I'm ignoring all of the unproductive back-and-forth that has occurred
thus far. This is meant to start the discussion off fresh.)

I was discussing the <cite> element with TabAtkins on IRC and I
proposed analyzing the actual word 'cite'. Using it as a verb, the
definition of 'cite' applies to quotes/quotations, titles, and people,
depending on the context. TabAtkins noted that the first use case is
so far off of legacy implementations, that it wouldn't even be worth
considering for <cite> (especially because we have other elements that
function as such).

That leaves usages of 'cite' for both titles of works and authors of
works. Putting aside the issue of styling for a moment, these two
pieces of data both fall under the semantic meaning of 'cite'. Thus,
they should fall under the semantic meaning of <cite>. If an author
should have the need to differentiate between the two, I propose that
they use <cite class="title"> and <cite class="author">.

Thus, I propose the following (which TabAtkins generally agrees with):

Leave the default styling of <cite> to be italicized for legacy
implementations and allow any reference to any work or author, with
the granularity decided by the individual web developer.

I also propose allowing parenthetical citations and footnote markers
(as is used in the various W3C/WHATWG specifications) to also be
marked up with <cite>, though I'm not sure if TabAtkins agrees with me
on that point.

I hope this message can help bring the discussion back to a neutral
zone that will lead to an amicable resolution of this long debate.

Regards,
Gordon

--
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] the cite element

2009-10-06 Thread Gordon P. Hemsley
On Tue, Oct 6, 2009 at 4:15 PM, Erik Vorhes e...@textivism.com wrote:
 On Tue, Oct 6, 2009 at 2:52 PM, Gordon P. Hemsley gphems...@gmail.com wrote:
 I also propose allowing parenthetical citations and footnote markers
 (as is used in the various W3C/WHATWG specifications) to also be
 marked up with <cite>, though I'm not sure if TabAtkins agrees with me
 on that point.

 I suppose <a> allows for more functionality in current UAs, but this
 is an interesting proposition, especially if there were a way to
 crosslink <cite> used in this way to the original source (or whatever
 it would point to). Would it be something along the lines of <cite
 for="aside-id">, or did you have something else in mind?

 Erik

Hmm... I hadn't given much thought to the implementation of that, as I
was more worried about the other part of the debate, but I think
treating <cite> as analogous to <label> in that situation is indeed a
good idea.

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] [html5] r4029 - [e] (0) Example of section use without article.

2009-09-29 Thread Gordon P. Hemsley
s/Html/html/

On Tue, Sep 29, 2009 at 4:30 AM, Simon Pieters sim...@opera.com wrote:

 On Tue, 29 Sep 2009 07:57:21 +0200, wha...@whatwg.org wrote:

  Author: ianh
 Date: 2009-09-28 22:57:20 -0700 (Mon, 28 Sep 2009)
 New Revision: 4029

 Modified:
   index
   source
 Log:
 [e] (0) Example of section use without article.

 Modified: index
 ===
 --- index   2009-09-29 02:41:23 UTC (rev 4028)
 +++ index   2009-09-29 05:57:20 UTC (rev 4029)
 @@ -13031,7 +13031,60 @@
  </div>
 +  <div class=example>
 +   <p>Here is a graduation programme with two sections, one for the
 +   list of people graduating, and one for the description of the
 +   ceremony.</p>
 +
 +   <pre>&lt;!DOCTPE Html&gt;


 s/DOCTPE/DOCTYPE/

 --
 Simon Pieters
 Opera Software




-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] [html5] r4029 - [e] (0) Example of section use without article.

2009-09-29 Thread Gordon P. Hemsley
Ah. I was afraid you might say that.

On Tue, Sep 29, 2009 at 6:54 PM, Ian Hickson i...@hixie.ch wrote:

 On Tue, 29 Sep 2009, Gordon P. Hemsley wrote:
 
  s/Html/html/

 Actually that was intentional in that example. I like to show a variety of
 syntaxes so that people can see that they can do whichever one they
 prefer.

 --
 Ian Hickson   U+1047E)\._.,--,'``.fL
 http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
 Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'




-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/


Re: [whatwg] article/section/details naming/definition problems

2009-09-16 Thread Gordon P. Hemsley
I'd sent this earlier, but it got caught in the message queue that
apparently nobody checks. Let's see if it works this time.

-- Forwarded message --
From: Gordon P. Hemsley gphems...@gmail.com
Date: Tue, Sep 15, 2009 at 11:31 PM
Subject: Re: [whatwg] article/section/details naming/definition problems
To: whatwg List wha...@whatwg.org


On Tue, Sep 15, 2009 at 9:08 PM, Ian Hickson i...@hixie.ch wrote:

 On Tue, 15 Sep 2009, Jeremy Keith wrote:
  In that blog post, I point out that <section> and <article> were once more
  divergent but have converged over time (since the @cite and @pubdate
  attributes were dropped from <article>).
 
  I've also seen a lot of confusion from authors wondering when to use
  <section> and when to use <article>. Bruce wrote an article on HTML5 doctor
  recently to address this:
  http://html5doctor.com/the-section-element/
 
  Probably the best tutorial I've seen on this issue is from Ted:
  http://edward.oconnor.cx/2009/09/using-the-html5-sectioning-elements
 
  ...but even so, the confusion remains. The very fact that tutorials are
  required for what should be intuitive structural elements is worrying — I
  don't see the same issues around <nav>, <header> or <footer> (now that the
  content model has been changed) ...although there is continuing confusion
  around <aside>.

 I'd like to rename <article>, if someone can come up with a better word
 that means blog post, blog comment, forum post, or widget. I do think
 there is an important difference between a subpart of a page that is
 a potential candidate for syndication, and a subsection of a page that
 only makes sense with the rest of the page.


What about <item>? (Directly, it's a coincidence that RSS happens to have
the same-named element, as I just used a thesaurus. But perhaps [indirectly]
there's a reason RSS uses <item> to begin with. And, after all, it's
supposed to be used as a hint that it could be syndicated content, right?)

-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/



-- 
Gordon P. Hemsley
m...@gphemsley.org
http://gphemsley.org/ • http://gphemsley.org/blog/
http://sasha.sourceforge.net/ • http://www.yoursasha.com/