On 2010-09-13 15:03, Mikko Rantalainen wrote:
2010-09-11 01:51 EEST: Roger Hågensen:
  On 2010-09-09 09:24, Philip Jägenstedt wrote:
For at least WAVE, Ogg and WebM it's not possible as they begin with
different magic bytes.
Then why not define a new "magic" that is universal, so that if a proper
content type is not stated then a sniffing for a standardized universal
magic is done?

Yep, I'm referring to my BINID proposal.
If a content type is missing, sniff the first 265 bytes and see if it is
a BINID. If it is a BINID, check if it's a supported/expected one, and if
it is, then play away, all is good.
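
As a rough sketch of that check in Python: the actual BINID layout is not spelled out in this message, so the 4-byte `BNID` marker, the one-byte length prefix and the names `sniff_binid` and `can_play` are all placeholders invented for illustration. Only the 265-byte window comes from the text above.

```python
from typing import Optional, Set

SNIFF_WINDOW = 265   # bytes to inspect, as stated in the message
MAGIC = b"BNID"      # HYPOTHETICAL marker, not taken from the BINID proposal


def sniff_binid(head: bytes) -> Optional[str]:
    """Return the id carried by a (hypothetical) BINID at the start of `head`.

    Assumed layout: MAGIC, one length byte, then that many ASCII bytes.
    Returns None if no well-formed BINID fits inside the sniff window.
    """
    window = head[:SNIFF_WINDOW]
    if not window.startswith(MAGIC):
        return None
    if len(window) < len(MAGIC) + 1:
        return None                      # no room for the length byte
    id_len = window[len(MAGIC)]
    start = len(MAGIC) + 1
    raw = window[start:start + id_len]
    if len(raw) != id_len:
        return None                      # id is truncated
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        return None


def can_play(head: bytes, supported: Set[str]) -> bool:
    """True only if a BINID is present and names a supported type."""
    binid = sniff_binid(head)
    return binid is not None and binid in supported
```

The point of the sketch is only that the check is bounded: nothing past the first 265 bytes is ever looked at.
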
 From the "what could possibly go wrong" department of thought:

- a web server blindly prefixes files with BINID if it "knows" the file
suffix and as a result, a file ends up with a double BINID (server
assumes that no files contain BINID by default)
- a file has double BINID with contradicting content ids
- some internal API assumes that caller wants BINID in the stream, the
caller assumes that the stream has no BINID - as a result, the caller
will pass content with BINIDs embedded in the middle of stream.

Basically, this sounds like all the issues of BOM for all binary files.

And why do we need this? Because web servers are not behaving correctly
and are sending incorrect Content-Type headers? What makes you believe
that BINID will not be incorrectly used?

Because if they add a binary id then they are obviously aware of the standard. Old servers/software would just pass the file through, as they are unaware of it, so content type issues would still exist there. Eventually old servers/software are rotated out until most are binary id aware.
This is how rolling out new standards works.
A server would only add a binary id if none exists and it's certain (from previous sniffing) that its guess is correct, though I guess the standard could state that if no binary id exists on a file then the server should not add one at all (legacy behavior?). And by the server adding it I meant services like YouTube (if YouTube transcodes a video to MP4 then the server knows it's delivering just that), and likewise streaming radio, video or similar. A regular web server would have no more right (or reason) to modify a file it serves than it does a .zip or .mp3 that a user downloads. We are mainly talking about streaming here, right? (That is where a short maximum sniffing length would be a huge benefit.)
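
To illustrate the "only add one if none exists" rule, here is a minimal Python sketch. The `BNID` marker, the length-prefixed layout and the function names are made up for the example and are not part of the BINID proposal.

```python
MAGIC = b"BNID"   # HYPOTHETICAL marker, not taken from the BINID proposal


def make_binid(type_id: str) -> bytes:
    """Build a (hypothetical) BINID: MAGIC, one length byte, ASCII id."""
    raw = type_id.encode("ascii")
    if len(raw) > 255:
        raise ValueError("id does not fit the one-byte length prefix")
    return MAGIC + bytes([len(raw)]) + raw


def add_binid_once(payload: bytes, type_id: str) -> bytes:
    """Prefix `payload` with a BINID unless it already starts with one.

    An existing BINID is left untouched, even if it names a different
    type, so the payload never ends up with two ids.
    """
    if payload.startswith(MAGIC):
        return payload
    return make_binid(type_id) + payload
```

Calling `add_binid_once` twice on the same data gives the same bytes as calling it once, which is exactly what a blindly prefixing server would get wrong.
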

(If you really believe that you can force content authors to provide
correct BINIDs, why can you not force content authors to provide correct
Content-Types? Hopefully the goal is not to sniff whether BINIDs seem okay
and ignore "clearly incorrect" ones in the future...)

I do not see why web authors (or users, for that matter) would need to mess with the binary id at all;
it's authoring or transcoding software that would add it.

My BINID proposal is just that: a proposal for a binary id. It does not define how servers and browsers should handle it, as that is a different scope altogether. Something like a binary id would need a proper RFC writeup or similar.

I'd like to specify that the only cases in which a UA is allowed to sniff the
content type are:

- Content-Type header is missing (because the server clearly does not
know the type), or
- Content-Type is literal "text/plain", "text/plain;
charset=iso-8859-1", "text/plain; charset=ISO-8859-1" or "text/plain;
charset=UTF-8" (to deal with historical mess caused by IIS and Apache), or
- Content-Type is literal "application/octet-stream"

(In all these cases, the server clearly has no real knowledge. If a file
is meant for downloading, the server should use Content-Disposition:
attachment header instead of hacks such as using
"application/x-download" for Content-Type.)
Yes! But if the UA in those cases also checked for a binary ID (and found one), there would hardly be any ambiguity.
For any other value of Content-Type, honor the type specified at the HTTP
level. And provide no overrides of any kind on any level above HTTP.
Levels above HTTP may provide HINTS about the content that can be used
to aid or override *sniffing* but nothing should override any
*explicitly specified Content-Type*. [This is a simplified version of the
logic that Mozilla/Firefox already applies:
http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#684]
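
The rule proposed above (sniff only when the header is missing or is one of the listed literal values, otherwise honor it) could be sketched in Python roughly as follows. The names `may_sniff` and `effective_type` are made up for this example, the comparison is deliberately an exact string match because the proposal says "literal", and this is not a transcription of the Mozilla code linked above.

```python
from typing import Optional

# Literal Content-Type values for which sniffing is allowed, copied from
# the list in the proposal. Matching is exact and case-sensitive.
SNIFFABLE_LITERALS = frozenset({
    "text/plain",
    "text/plain; charset=iso-8859-1",
    "text/plain; charset=ISO-8859-1",
    "text/plain; charset=UTF-8",
    "application/octet-stream",
})


def may_sniff(content_type: Optional[str]) -> bool:
    """True if the UA is allowed to sniff; None means the header is missing."""
    if content_type is None:
        return True
    return content_type in SNIFFABLE_LITERALS


def effective_type(content_type: Optional[str],
                   sniffed: Optional[str]) -> Optional[str]:
    """Pick the type to use; `sniffed` is whatever UA-specific sniffing found.

    An explicitly specified type outside the sniffable set always wins.
    """
    if may_sniff(content_type) and sniffed is not None:
        return sniffed
    return content_type
```

For example, `effective_type("video/webm", "audio/ogg")` keeps `"video/webm"`, while `effective_type("application/octet-stream", "video/webm")` uses the sniffed `"video/webm"`.
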

And for heaven's sake, do not specify any sniffing as "official".
Instead, explicitly specify all sniffing as UA specific and possibly
suggest that UAs should inform the user that content is broken and the
current rendering is best effort if any sniffing is required.

Any sniffing would be a fallback only, used if the UA suspects the content type is wrong (e.g. a <video> with a text type) or similar. It would not hurt to have some standardized behavior for those cases that sniffs for something simple like a short binary id rather than parsing potentially several kilobytes of the stream (which is where this discussion originally took off).

--
Roger "Rescator" Hågensen.
Freelancer - http://EmSai.net/
