On 2010-09-04 01:55, James Salsman wrote:
Most of the MIME types that support multiple channels and sample rates
have registered parameters for selecting those.  Using a PCM format
such as audio/L16 (CD/Red Book audio) as a default would waste a huge
amount of network bandwidth, which translates directly into money for
some users.

On Fri, Sep 3, 2010 at 2:19 PM, David Singer <sin...@apple.com> wrote:
I agree that if the server says it accepts something, then it should cover at
least the obvious bases, and transcoding at the server side is not very hard.
However, I do think that there needs to be some way to protect the server (and
user, in fact) from mistakes etc. If the server was hoping for up to 10
seconds of 8 kHz mono voice to use as a security voice-print, and the UA doesn't
cut off at 10 seconds, records at 48 kHz stereo, and the user forgets to hit
'stop', quite a few systems might be surprised by (and maybe charge for) the size
of the resulting file.

It's also a pain at the server to have to sample-rate convert, downsample to 
mono, and so on, if the terminal could do it.

Here's an idea. Almost all codecs currently use a quality setting, where quality is indicated by a range from 0.0 to 1.0 (a few might go from -1.0 to 1.0; a tuned Ogg Vorbis has a small negative range). If 1.0 indicated max quality (lossless or lossy) and 0.5 indicated 50% quality, that would be similar to what most encoders already support (usually with a -q argument).

So if the server asks for, let's say, FLAC at quality 0.0, that would mean compress the hell out of it, whereas 1.0 would mean a fast encoding. For a lossy codec like, say, Ogg Vorbis, a quality of 1.0 would mean retain as much of the original audio as possible, while 0.0 would mean toss away as much as possible.

Combine this with min and max bitrate values etc., and a browser could be told that the server wants: "Give me audio in format zxy with medium quality (and medium CPU use as well, I guess) between 100 kbit/s and 200 kbit/s, in stereo, at 48 kHz, between 10 seconds and 2 minutes long."
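
To make that concrete, here is a rough sketch (TypeScript-ish, purely illustrative; every name in it is made up by me, nothing here is an existing API) of what such a request could look like once the browser has parsed it:

// Hypothetical shape for a server's capture request; all names are
// invented for illustration only.
interface CaptureRequest {
  format: string;        // e.g. "audio/ogg" or "audio/flac"
  quality: number;       // 0.0 (smallest/worst) .. 1.0 (best/lossless)
  minBitrate?: number;   // bits per second; omitted = no minimum
  maxBitrate?: number;   // bits per second; omitted = no maximum
  channels?: number;     // 1 = mono, 2 = stereo; omitted = no preference
  sampleRate?: number;   // Hz; omitted = no preference
  minDuration?: number;  // seconds; omitted = no minimum
  maxDuration?: number;  // seconds; omitted = no maximum
}

// The request quoted above would then be roughly:
const example: CaptureRequest = {
  format: "audio/zxy",   // "zxy" is the placeholder format from the text
  quality: 0.5,
  minBitrate: 100_000,
  maxBitrate: 200_000,
  channels: 2,
  sampleRate: 48_000,
  minDuration: 10,
  maxDuration: 120,
};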

Obviously, with lossless formats the bitrate and quality mean nothing, but a low quality value could indicate using the highest compression available.

I guess a browser could additionally present a UI if no max duration was indicated and ask the user to choose a sensible one. (Maybe the standard could define a max length if none was negotiated, as an extra safety net?)
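
(A sketch of that safety net; the 600-second default below is an arbitrary number I picked for illustration, not anything proposed:)

// Fallback cap applied only when the server negotiated no max duration.
const FALLBACK_MAX_DURATION = 600; // seconds; arbitrary example value

function effectiveMaxDuration(negotiatedMax?: number): number {
  // Use the negotiated limit when present, otherwise the fallback cap.
  return negotiatedMax ?? FALLBACK_MAX_DURATION;
}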

Oh, and for a lossless codec like FLAC there is usually a compression level: the higher it is, the more CPU/resources are used to compress more tightly. So a quality indicator only really makes sense for lossy codecs, while both lossy and lossless should be mappable to a compression-level indicator. But I think having both quality and compression indicators might be best, as many lossy codecs allow setting quality and compression level (plus a bitrate range).
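
As a rough illustration of that mapping (a sketch only; the encoder scales below, a Vorbis quality of roughly -0.1 to 1.0 and FLAC compression levels 0 to 8, are my assumptions about typical encoders, not something defined anywhere):

// Sketch: translating the proposed 0.0..1.0 quality and 0..100% compression
// indicators onto typical encoder knobs.
function vorbisQuality(quality: number): number {
  // Map 0.0..1.0 onto roughly -0.1..1.0 (libvorbis-style quality scale).
  return -0.1 + quality * 1.1;
}

function flacCompressionLevel(compression: number): number {
  // Map 0..100% onto FLAC's levels 0 (fastest) .. 8 (tightest).
  return Math.round((compression / 100) * 8);
}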

Hmm, has anything similar been discussed for video and image capture as well?
If not, then I think it's best to make sure that audio/image/video capture uses the exact same indicators to avoid confusion:

Bits/s: Min/max bitrate, applicable to (mostly lossy, rarely lossless) audio, video, video (w/audio), and images.
%: Compression level, applicable to (lossy and lossless) audio, video, video (w/audio), and images.
Seconds: Min/max duration, applicable to (lossy and lossless) audio, video, and video (w/audio).
Hz: Frequency, applicable to (lossy and lossless) audio and video (w/audio).
Bits: Bit depth (audio samples or color), applicable to (lossy and lossless) audio, video, video (w/audio), and images.
Chn: Channels, applicable to (lossy and lossless) audio and video (w/audio).
WxH: Width/height, applicable to (lossy and lossless) video, video (w/audio), and images.
FPS: Framerate, applicable to (lossy and lossless) video and video (w/audio).

Bits/s = 0-??????? where 0 indicates no minimum for the Min value and no maximum for the Max value; otherwise the value indicates the desired bitrate in bits per second.
% = 0-100 where 100% means maximum compression if lossless, or lowest quality if lossy, and 0% means no compression if lossless, or maximum quality if lossy.
Seconds = 0-??????? where 0 indicates no minimum duration for the Min value and no maximum for the Max value; otherwise the numbers give the Min and Max range the server allows/expects.
Hz = 0-??????? where 0 indicates that anything is acceptable; otherwise the expected frequency.
Bits = 0-??????? where 0 indicates no preference; otherwise the desired bit depth for the image/video, and for audio.
Chn = 0-??????? where 0 indicates no preference; otherwise the desired number of channels.
WxH = 0-??????? where 0 indicates no preference; otherwise the desired resolution.
FPS = 0-??????? where 0 indicates no preference; otherwise the desired framerate.

I believe that covers most of them?
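
Here is a quick sketch of how a browser might parse value strings like "2 1-2" or "500000-1000000" under the syntax above (my own reading of it: space-separated entries, each either a single number or a "min-max" range, with earlier entries preferred over later ones; WxH entries would need a small extension that first splits on "x"):

// Sketch of a parser for values like "48000 44100", "2 1-2" or "500000-1000000".
interface Entry { min: number; max: number; }

function parseValues(value: string): Entry[] {
  return value
    .trim()
    .split(/\s+/)
    .map(entry => {
      const [lo, hi] = entry.split("-");
      const min = Number(lo);
      const max = hi === undefined ? min : Number(hi);
      return { min, max };
    });
}

// parseValues("2 1-2")          -> [{min: 2, max: 2}, {min: 1, max: 2}]
// parseValues("500000-1000000") -> [{min: 500000, max: 1000000}]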

Here's an example (of values):
Video (w/audio, and both lossy)
rate="500000-1000000"
compression="25-75"
duration="0-180"
hz="48000 44100"
chn="2 1-2"
bits="16 24 32"
wxh="1920x1080 1280x720 854x480  320x240-1024x768"
depth="24"
fps="24 50 60 10-60"

This means that the stream must be between 500 kbit/s and 1 Mbit/s, video and audio combined,
compression must be between 25% and 75% (thus averaging maybe 50% quality?),
there is no minimum length, but it must not be longer than 3 minutes,
the frequency must be either 44.1 kHz or 48 kHz,
only mono or stereo is allowed, with stereo desired if possible,
16-, 24-, or 32-bit audio (lossy codecs like MP3 work in floating point, so in that case the bit depth does not matter much),
any resolution from 320x240 up to 1024x768 is accepted, but if possible 480, 720, or 1080 is desired (widescreen implied by the explicit resolutions),
24-bit color desired if possible,
and any framerate from 10 to 60 fps is accepted, but if possible 24, 50, or 60 fps is desired.

This should give the browser enough info to pass on to the video and audio encoders, or enough info to calculate the details the encoders need.
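
For instance, picking the sample rate from an hz list like the one above could be as simple as walking the entries in preference order and taking the first one the local encoder can actually deliver (sketch only; the list of supported rates is made up):

// Sketch: choose the first listed value/range the local encoder can satisfy.
function pickSampleRate(
  entries: { min: number; max: number }[], // output of the parseValues sketch above
  supported: number[],                     // rates the local encoder offers
): number | null {
  for (const { min, max } of entries) {
    const match = supported.find(rate => rate >= min && rate <= max);
    if (match !== undefined) return match;
  }
  return null; // nothing acceptable; the browser would have to refuse or transcode
}

// pickSampleRate(parseValues("48000 44100"), [44100, 48000]) -> 48000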

A few more examples:
Audio (lossless)
compression="100"
duration="0-180"
hz="22050 8192 11025 44100 48000"
chn="1 1-2"
bits="16"

Image (lossy)
compression="85"
wxh="16x16-8192x8192"
depth="24 0-32"

Audio (lossy)
rate="128000-192000"
compression="0"
duration="0-180"
hz="48000 44100"
chn="1 2"

Hopefully it's all self-explanatory,
but let me point out that in the last audio example the compression value indicates max quality, yet the stream will still be constrained to 128-192 kbit/s. Also, if you look at the two audio examples, "1 2" and "1 1-2" essentially mean the same thing: 1 to 2 channels accepted, but 1 channel preferred. I think something similar to the existing HTTP Accept header could be used as the basis for this. The hardest part, I guess, is coming up with short but sensible value names, and deciding what kinds of values are acceptable, the order of preference, default behavior if missing, etc.
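
Just to illustrate the Accept analogy (purely hypothetical syntax; none of these parameters exist in HTTP today), the lossy audio example above might be written something like:

Accept: audio/ogg; rate=128000-192000; compression=0; duration=0-180; hz="48000 44100"; chn="1 2"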

One thing is for sure: video, video (w/audio), audio, and images all need a unified way of doing this to keep people from going insane. Heck, I just spent about an hour and a half thinking up the stuff above and writing this post, so you can see my thoughts changing on some things along the way. (The beginning of this post is actually an hour and a half old by now, so this is me descending into madness in the middle of the night.) *rubs temples*

Heh! Hopefully this all made sense to you all though... and that you all understand that if this is to be done, it really needs to be done properly "now" rather than as a really nasty patchwork later on.

Who knows, maybe some day the HTTP Accept header ends up adopting some of the ideas above for improved content negotiation with the server when fetching media (an iPod and a gaming PC have different media abilities due to screen size, bandwidth, and CPU power as well).


--

Roger "Rescator" Hågensen.
Freelancer - http://EmSai.net/
