On 2010-09-04 01:55, James Salsman wrote:
Most of the MIME types that support multiple channels and sample rates
have registered parameters for selecting those.  Using a PCM format
such as audio/L16 (CD/Red Book audio) as a default would waste a huge
amount of network bandwidth, which translates directly into money for
some users.

On Fri, Sep 3, 2010 at 2:19 PM, David Singer <sin...@apple.com> wrote:
I agree that if the server says it accepts something, then it should cover at
least the obvious bases, and transcoding at the server side is not very hard.
However, I do think that there needs to be some way to protect the server (and
user, in fact) from mistakes etc. If the server was hoping for up to 10
seconds of 8 kHz mono voice to use as a security voice-print, and the UA doesn't
cut off at 10 seconds, records at 48 kHz stereo, and the user forgets to hit
'stop', quite a few systems might be surprised by (and maybe charge for) the size
of the resulting file.

It's also a pain at the server to have to sample-rate convert, downsample to 
mono, and so on, if the terminal could do it.

Here's an idea. Almost all codecs currently use a quality setting, where quality is indicated by a range from 0.0 to 1.0 (a few might go from -1.0 to 1.0; a tuned Ogg Vorbis has a small negative range). If 1.0 indicated max quality (lossless or lossy) and 0.5 indicated 50% quality, that would be similar to what most encoders already support (usually with a -q argument).

So if the server asks for, let's say, FLAC at quality 0.0, that would mean compress the hell out of it, whereas 1.0 would mean a fast encoding. For a lossy codec like, say, Ogg Vorbis, a quality of 1.0 would mean retain as much of the original audio as possible, while 0.0 would mean toss away as much as possible.

Combine this with min and max bitrate values etc., and a browser could be told that the server wants: "Give me audio in format zxy with medium quality (and medium CPU use as well, I guess) between 100 kbit/s and 200 kbit/s, in stereo, at 48 kHz, between 10 seconds and 2 minutes long."
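
To make that concrete, here is a rough sketch (TypeScript-ish, purely illustrative; every name in it is made up by me, nothing here is an existing API) of what such a request could look like once the browser has parsed it:

// Hypothetical shape for a server's capture request; all names are
// invented for illustration only.
interface CaptureRequest {
  format: string;        // e.g. "audio/ogg" or "audio/flac"
  quality: number;       // 0.0 (smallest/worst) .. 1.0 (best/lossless)
  minBitrate?: number;   // bits per second; omitted = no minimum
  maxBitrate?: number;   // bits per second; omitted = no maximum
  channels?: number;     // 1 = mono, 2 = stereo; omitted = no preference
  sampleRate?: number;   // Hz; omitted = no preference
  minDuration?: number;  // seconds; omitted = no minimum
  maxDuration?: number;  // seconds; omitted = no maximum
}

// The request quoted above would then be roughly:
const example: CaptureRequest = {
  format: "audio/zxy",   // "zxy" is the placeholder format from the text
  quality: 0.5,
  minBitrate: 100_000,
  maxBitrate: 200_000,
  channels: 2,
  sampleRate: 48_000,
  minDuration: 10,
  maxDuration: 120,
};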

Obviously, with lossless formats the bitrate and quality mean nothing, but a low quality value could indicate using the highest compression available.

I guess a browser could additionally present a UI if no max duration was indicated and ask the user to choose a sensible one. (Maybe the standard could define a max length if none was negotiated, as an extra safety net?)
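
(A sketch of that safety net; the 600-second default below is an arbitrary number I picked for illustration, not anything proposed:)

// Fallback cap applied only when the server negotiated no max duration.
const FALLBACK_MAX_DURATION = 600; // seconds; arbitrary example value

function effectiveMaxDuration(negotiatedMax?: number): number {
  // Use the negotiated limit when present, otherwise the fallback cap.
  return negotiatedMax ?? FALLBACK_MAX_DURATION;
}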

Oh, and for a lossless codec like FLAC there is usually a compression level: the higher it is, the more CPU/resources are used to compress more tightly. So a quality indicator only really makes sense for lossy codecs, while both lossy and lossless should be mappable to a compression-level indicator. But I think having both quality and compression indicators might be best, as many lossy codecs allow setting quality and compression level (plus a bitrate range).
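
As a rough illustration of that mapping (a sketch only; the encoder scales below, a Vorbis quality of roughly -0.1 to 1.0 and FLAC compression levels 0 to 8, are my assumptions about typical encoders, not something defined anywhere):

// Sketch: translating the proposed 0.0..1.0 quality and 0..100% compression
// indicators onto typical encoder knobs.
function vorbisQuality(quality: number): number {
  // Map 0.0..1.0 onto roughly -0.1..1.0 (libvorbis-style quality scale).
  return -0.1 + quality * 1.1;
}

function flacCompressionLevel(compression: number): number {
  // Map 0..100% onto FLAC's levels 0 (fastest) .. 8 (tightest).
  return Math.round((compression / 100) * 8);
}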

Hmm, has anything similar been discussed for video and image capture as well?
If not, then I think it's best to make sure that audio/image/video capture uses the exact same indicators to avoid confusion:

Bits/s: Min/max bitrate, applicable to (mostly lossy, rarely lossless) audio, video, video (w/audio), and images.
%: Compression level, applicable to (lossy and lossless) audio, video, video (w/audio), and images.
Seconds: Min/max duration, applicable to (lossy and lossless) audio, video, and video (w/audio).
Hz: Frequency, applicable to (lossy and lossless) audio and video (w/audio).
Bits: Bit depth (audio samples or color), applicable to (lossy and lossless) audio, video, video (w/audio), and images.
Chn: Channels, applicable to (lossy and lossless) audio and video (w/audio).
WxH: Width/height, applicable to (lossy and lossless) video, video (w/audio), and images.
FPS: Framerate, applicable to (lossy and lossless) video and video (w/audio).

Bits/s = 0-??????? where 0 indicates no minimum for the Min value and no maximum for the Max value; otherwise the value indicates the desired bitrate in bits per second.
% = 0-100 where 100% means maximum compression if lossless, or lowest quality if lossy, and 0% means no compression if lossless, or maximum quality if lossy.
Seconds = 0-??????? where 0 indicates no minimum duration for the Min value and no maximum for the Max value; otherwise the numbers give the Min and Max range the server allows/expects.
Hz = 0-??????? where 0 indicates that anything is acceptable; otherwise the expected frequency.
Bits = 0-??????? where 0 indicates no preference; otherwise the desired bit depth for the image/video, and for audio.
Chn = 0-??????? where 0 indicates no preference; otherwise the desired number of channels.
WxH = 0-??????? where 0 indicates no preference; otherwise the desired resolution.
FPS = 0-??????? where 0 indicates no preference; otherwise the desired framerate.

I believe that covers most of them?
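
Here is a quick sketch of how a browser might parse value strings like "2 1-2" or "500000-1000000" under the syntax above (my own reading of it: space-separated entries, each either a single number or a "min-max" range, with earlier entries preferred over later ones; WxH entries would need a small extension that first splits on "x"):

// Sketch of a parser for values like "48000 44100", "2 1-2" or "500000-1000000".
interface Entry { min: number; max: number; }

function parseValues(value: string): Entry[] {
  return value
    .trim()
    .split(/\s+/)
    .map(entry => {
      const [lo, hi] = entry.split("-");
      const min = Number(lo);
      const max = hi === undefined ? min : Number(hi);
      return { min, max };
    });
}

// parseValues("2 1-2")          -> [{min: 2, max: 2}, {min: 1, max: 2}]
// parseValues("500000-1000000") -> [{min: 500000, max: 1000000}]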

Here's an example (of values):
Video (w/audio, and both lossy)
rate="500000-1000000"
compression="25-75"
duration="0-180"
hz="48000 44100"
chn="2 1-2"
bits="16 24 32"
wxh="1920x1080 1280x720 854x480  320x240-1024x768"
depth="24"
fps="24 50 60 10-60"

This means that the stream must be between 500 kbit/s and 1 Mbit/s, video and audio combined,
compression must be between 25% and 75% (thus averaging maybe 50% quality?),
there is no minimum length, but it must not be longer than 3 minutes,
the frequency must be either 44.1 kHz or 48 kHz,
only mono or stereo is allowed, with stereo desired if possible,
16-, 24-, or 32-bit audio (lossy codecs like MP3 work in floating point, so in that case the bit depth does not matter much),
any resolution from 320x240 up to 1024x768 is accepted, but if possible 480, 720, or 1080 is desired (widescreen implied by the explicit resolutions),
24-bit color desired if possible,
and any framerate from 10 to 60 fps is accepted, but if possible 24, 50, or 60 fps is desired.

This should give the browser enough info to pass on to the video and audio encoders, or enough info to calculate the details the encoders need.
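
For instance, picking the sample rate from an hz list like the one above could be as simple as walking the entries in preference order and taking the first one the local encoder can actually deliver (sketch only; the list of supported rates is made up):

// Sketch: choose the first listed value/range the local encoder can satisfy.
function pickSampleRate(
  entries: { min: number; max: number }[], // output of the parseValues sketch above
  supported: number[],                     // rates the local encoder offers
): number | null {
  for (const { min, max } of entries) {
    const match = supported.find(rate => rate >= min && rate <= max);
    if (match !== undefined) return match;
  }
  return null; // nothing acceptable; the browser would have to refuse or transcode
}

// pickSampleRate(parseValues("48000 44100"), [44100, 48000]) -> 48000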

A few more examples:
Audio (lossless)
compression="100"
duration="0-180"
hz="22050 8192 11025 44100 48000"
chn="1 1-2"
bits="16"

Image (lossy)
compression="85"
wxh="16x16-8192x8192"
depth="24 0-32"

Audio (lossy)
rate="128000-192000"
compression="0"
duration="0-180"
hz="48000 44100"
chn="1 2"

Hopefully it's all self-explanatory,
but let me point out that in the last audio example the compression value indicates max quality, yet the stream will still be constrained to 128-192 kbit/s. Also, if you look at the two audio examples, "1 2" and "1 1-2" essentially mean the same thing: 1 to 2 channels accepted, but 1 channel preferred. I think something similar to the existing HTTP Accept header could be used as the basis for this. The hardest part, I guess, is coming up with short but sensible value names, and deciding what kinds of values are acceptable, the order of preference, default behavior if missing, etc.
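
Just to illustrate the Accept analogy (purely hypothetical syntax; none of these parameters exist in HTTP today), the lossy audio example above might be written something like:

Accept: audio/ogg; rate=128000-192000; compression=0; duration=0-180; hz="48000 44100"; chn="1 2"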

One thing is for sure: video, video (w/audio), audio, and images all need a unified way of doing this to keep people from going insane. Heck, I just spent about an hour and a half thinking up the stuff above and writing this post, so you can see my thoughts changing on some things along the way. (The beginning of this post is actually an hour and a half old by now, so this is me descending into madness in the middle of the night.) *rubs temples*

Heh! Hopefully this all made sense to you all though... and that you all understand that if this is to be done, it really needs to be done properly "now" rather than as a really nasty patchwork later on.

Who knows, maybe some day the HTTP Accept header ends up adopting some of the ideas above for improved content negotiation with the server when fetching media (an iPod and a gaming PC have different media abilities due to screen size, bandwidth, and CPU power as well).


--

Roger "Rescator" Hågensen.
Freelancer - http://EmSai.net/
