If memory serves, the conference mixer doesn't have to mix all incoming
audio; it only has to mix the relevant audio (i.e. figure out who's
talking, take that single audio stream, and send it out to all the
participating channels). One challenge, I would expect, is figuring out
the noise threshold (i.e. what is talking and what is just background
noise) and quickly enabling a channel when somebody starts speaking. A
good mixer should be able to handle more than one person speaking, but
since listeners can mostly only follow one talker at a time, it
shouldn't have to work very hard at that.
I suspect the math involved is pretty complex, though.
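To make the "figure out who's talking" step concrete, here is a minimal sketch in Python. It assumes each channel's audio has already been decoded to linear PCM integers, and it uses a plain RMS energy threshold. The function names, the threshold of 500, and the limit of two speakers are illustrative assumptions, not anything a real mixer is known to use; real voice-activity detection is usually more involved.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of linear PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def pick_active_speakers(frames, threshold=500.0, max_speakers=2):
    """Return up to max_speakers channel ids whose level is at or above
    the threshold, loudest first.

    frames: dict mapping a channel id to a list of linear PCM samples.
    threshold and max_speakers are hypothetical tuning knobs.
    """
    levels = {ch: rms(frame) for ch, frame in frames.items()}
    loud = [ch for ch, level in levels.items() if level >= threshold]
    loud.sort(key=lambda ch: levels[ch], reverse=True)
    return loud[:max_speakers]


if __name__ == "__main__":
    # 20 ms at 8 kHz is 160 samples per frame.
    frames = {
        "alice": [1000, -1000] * 80,   # talking
        "bob": [10, -10] * 80,         # background noise only
        "carol": [3000, -3000] * 80,   # talking loudly
    }
    print(pick_active_speakers(frames))
```

Only the channels this returns would need to be mixed and sent out, which is where the savings over mixing every incoming stream would come from.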
This also gets me wondering whether multiple, discrete conferences eat up
more horsepower than a single conference would, even one with a large
number of participants.
I suspect there's a lot more to it than that, though.
Jim
D. Hugh Redelmeier wrote:
| From: Rachel Quin <[email protected]>
|
| I think I'm not making myself clear, sorry. Our t3's and Megalink circuit
| from Bell come into AS5400's. Our VoIP infrastructure is entirely SIP. A
| conferencing server would only handle RTP streams, mixing channels for many
| large-ish volume conferences. The box I'm talking about would have 2 10gig
| nics, one or two DSP cards, and whatever software is needed to handle
| managing conferencing and directing RTP/G.711 content channels to and from
| the DSP card(s). I am not looking to build a stand alone phone system.
Naively, I would think mixing RTP streams of G.711 should not be too
hard for a regular CPU.
G.711 is PCM, so decoding and encoding are a snap. Mixing is just a kind
of averaging, I imagine.
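As a rough illustration of how cheap the decode side is, here is a sketch of expanding one G.711 byte to a linear sample in Python. It assumes the mu-law variant (the thread does not say whether mu-law or A-law is in use), and the function name is made up for the example. Samples need to be expanded like this before they can be summed or averaged, because the G.711 byte values are companded rather than linear.

```python
def ulaw_to_linear(byte):
    """Expand one G.711 mu-law byte (0..255) to a signed linear sample.

    Assumes mu-law; A-law uses a different expansion.
    """
    BIAS = 0x84                        # bias added during mu-law encoding
    byte = ~byte & 0xFF                # mu-law bytes are stored inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


# In practice this would be precomputed once as a 256-entry lookup table,
# so decoding is a single indexed read per sample.
ULAW_TABLE = [ulaw_to_linear(b) for b in range(256)]
```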
But: I did say "naively". I've never done any of this. I don't know
whether automatic gain control can be done simply and cheaply. I
don't know how you can sum a hundred channels and not get overloaded
with noise. I'll wave my hands and say that different channels don't
need to be transformed to use the same timebase, but maybe I'm wrong.
I know nothing about echo-cancellation issues.
So, naively, the tasks of the processor would be:
- take samples from N RTP streams
- average them
- send the result out on N RTP streams.
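The three steps above can be sketched in Python as follows. This is not the plain average described in the list but a common variation, swapped in deliberately: samples are summed rather than averaged, and each participant's own audio is subtracted from what is sent back to them (often called "mix-minus") so they don't hear themselves. It assumes the samples are already linear PCM, that all frames are the same length, and that output is clamped to 16 bits; the function name is hypothetical.

```python
def mix_minus(frames):
    """Mix N channels; each channel gets the sum of all the *other* channels.

    frames: dict mapping a channel id to a list of linear PCM samples,
            all lists the same length.
    Returns a dict with the same keys holding the mixed output frames.
    """
    if not frames:
        return {}
    frame_len = len(next(iter(frames.values())))

    # One pass over every input sample to build the full sum.
    totals = [0] * frame_len
    for frame in frames.values():
        for i, sample in enumerate(frame):
            totals[i] += sample

    # One pass per output: subtract the listener's own audio, then clamp
    # to the signed 16-bit range so loud sums saturate instead of wrapping.
    mixed = {}
    for ch, frame in frames.items():
        mixed[ch] = [
            max(-32768, min(32767, total - own))
            for total, own in zip(totals, frame)
        ]
    return mixed
```

Building the total once and subtracting per listener keeps the work proportional to N rather than N squared.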
The actual amount of computation, for the naive process, ought to be
within the realm of any modern processor for values of N up to perhaps
1000.
8K samples / channel / second == 8KB / sec of bandwidth per channel
(G.711 uses one byte per sample)
modern processors can do (guess) 40M words of main memory access / second
(the bottleneck, I think)
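Worked through for N = 1000, the estimate looks like this in Python. The sketch assumes one byte per G.711 sample at 8000 samples per second, exactly one read per input sample and one write per output sample, and takes the 40M accesses per second figure above as the guess it is stated to be. The names are illustrative only.

```python
SAMPLE_RATE = 8000                # G.711 samples per second per channel
BYTES_PER_SAMPLE = 1              # G.711 is 8 bits per sample
GUESSED_ACCESSES_PER_SEC = 40_000_000   # the guess from the text above


def naive_load(n_channels):
    """Return (input bytes/sec, memory accesses/sec) for the naive mixer.

    Assumes one read per input sample and one write per output sample,
    with N input streams and N output streams.
    """
    input_bytes = n_channels * SAMPLE_RATE * BYTES_PER_SAMPLE
    accesses = 2 * n_channels * SAMPLE_RATE
    return input_bytes, accesses


if __name__ == "__main__":
    input_bytes, accesses = naive_load(1000)
    print(input_bytes)                          # bytes/sec coming in
    print(accesses)                             # reads plus writes per second
    print(accesses / GUESSED_ACCESSES_PER_SEC)  # share of the guessed budget
```

Under those assumptions, 1000 channels come to 8 MB/sec of incoming audio and 16M accesses per second, which is 40% of the guessed memory budget.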
Which of the things that I've skipped are necessary and expensive?
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
--
Jim Van Meggelen
[email protected]
http://www.oreillynet.com/pub/au/2177
"A child is the ultimate startup, and I have three.
This makes me rich."
Guy Kawasaki