Re: [jdev] voicechat again

Ido Rosen Wed, 03 Mar 2004 02:44:44 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm new to this list, and have no knowledge of the previous argument.  Let me try to 
post some advantages and disadvantages of peer-to-peer versus client-server approaches 
to voice communication...

Although I refer to "voice chat" below, I wish to note that I do this out of 
convenience to you.  We should really be calling it "audible communication", as some 
would stream music rather than voice, or use Jabber as a radio station, if the 
implementation is scalable.  Furthermore, I may in the future call video chat "visual 
communication".

Goals:
 (*) Scalability
 (*) Functionality
 (*) Extensibility
 (*) Compatibility

First, let us define what the options are.  Two types of channels need to reach all 
parties involved in voice chat:  control and data.  Data in this case consists of the 
actual encoded voice stream (let's assume this is codec-independent for now, though 
my arguments should hold regardless of codec dependency decisions).  Control is, 
initially, the parameters required to initiate the connection (quality [kHz?], maximum 
throughput desired [kbits/sec?], codec used).  Then, during the stream, control should 
probably be used for performance monitoring (packets lost, kbits/sec, etc.).
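
To make the two kinds of control data concrete, here is a minimal Python sketch.  The 
class and field names are my own invention for illustration (they come from no JEP), 
and "speex" is just an example codec identifier.

```python
from dataclasses import dataclass


@dataclass
class SessionOffer:
    """Control data sent once, before any audio flows (hypothetical fields)."""
    codec: str              # codec identifier, e.g. "speex"
    sample_rate_hz: int     # the "quality" parameter
    max_kbits_per_sec: int  # maximum throughput desired


@dataclass
class StreamReport:
    """Control data sent periodically during the stream (hypothetical fields)."""
    packets_sent: int
    packets_lost: int
    kbits_per_sec: float

    def loss_ratio(self) -> float:
        # Fraction of packets lost; 0.0 when nothing has been sent yet.
        if self.packets_sent == 0:
            return 0.0
        return self.packets_lost / self.packets_sent


offer = SessionOffer(codec="speex", sample_rate_hz=8000, max_kbits_per_sec=16)
report = StreamReport(packets_sent=200, packets_lost=5, kbits_per_sec=15.2)
```

Both records are tiny compared to the audio itself, which is why the control channel 
can afford to travel a slower path than the data channel.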

Let's acknowledge two things: (1) Control throughput will generally be much lower than 
data throughput.  (2) Control traffic must initially (at least at some level) go 
through the server, such as when the server connects the two peers by giving each the 
other's IP address so they can link directly.  I know that some protocols 
will already handle this for us, but let's assume that at some point the server has to 
route this data between the clients.

So, given those assumptions, there are several possibilities:

(1) Server forwards all control + data messages between clients, as it does with IMs.
(2) Server forwards all control messages (throughput, voice establishment, codec 
selection, etc.), but data is handled in a peer to peer fashion. (Directly from one 
user to another, optionally to many others in emulation of a multicast voice 
reflector, if we're talking about voice conferencing.)
(3) All control + data messages are handled in a peer to peer fashion.  All the server 
does is give each peer the other's IP address, and no more than that.

(1) presents none (or few?) of the classic peer-to-peer problems, such as NAT 
traversal, service discovery, etc., since communication is already established at 
the layer that the data will be transferred on well before the voice chat session 
starts.  Both clients initiate a connection to the Jabber server at the TCP layer, so 
no port forwarding issues should arise.

(1) is not scalable -- a server without multicast could easily saturate its entire pipe 
with three or four high-quality voice connections.  There is significant overhead to 
using the standard Jabber XML-based transport mechanisms for each data packet, if data 
packets must be sent within milliseconds.  Assuming we want lagless and lossless 
audio, there must be slack between the actual throughput and the maximum desired 
throughput, so minimizing overhead is a priority.  Internal networks, or networks with 
high-speed VPNs, may find this approach advantageous though, as the maximum desired 
throughput is well beyond any overhead, and the server is very likely to support 
multicast in such tightly knit networks.  
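
To put rough numbers on the scalability concern for (1), here is a back-of-the-envelope 
sketch in Python.  The function name, the 128 kbit/s stream rate and the 50% wrapping 
overhead are all assumptions picked for illustration, not measurements; it counts only 
what the server must send out for two-party calls.

```python
def relay_upstream_kbits(calls: int, stream_kbits: float, overhead: float) -> float:
    """Outbound bandwidth a relaying server needs for two-party calls.

    Each call carries one stream per direction, and the server has to
    forward both.  `overhead` is the assumed fractional cost of wrapping
    audio in the transport (0.5 means 50% extra).
    """
    streams_out = calls * 2
    return streams_out * stream_kbits * (1.0 + overhead)


# Assumed figures: 4 calls, 128 kbit/s per direction, 50% overhead.
needed = relay_upstream_kbits(calls=4, stream_kbits=128, overhead=0.5)
```

With those assumed figures, four calls already need about 1.5 Mbit/s of upstream from 
the server, before it has delivered a single IM.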

(2) still presents NAT traversal and port forwarding problems, though since a control 
link is already established, alternatives can be sought: a port sweep can be attempted 
to find an open/forwarded port, or several permutations of client1-->client2 and 
client2-->client1 connections can be tried until one works.  This is just for the data 
channel.  The control channel can coordinate such sweeps/tests.  Some protocols may 
already do this.  
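
The "try permutations until one works" idea could be sketched like this in Python.  
Everything here is hypothetical: the function names, the direction labels and the 
`try_connect` callback, which stands in for whatever real connection test the clients 
would run once the control channel tells them to.

```python
import itertools
from typing import Callable, Iterable, List, Optional, Tuple

Attempt = Tuple[str, int]  # (direction, port)


def plan_attempts(ports: Iterable[int]) -> List[Attempt]:
    """Every combination of connection direction and candidate port."""
    directions = ["client1->client2", "client2->client1"]
    return list(itertools.product(directions, ports))


def first_working(attempts: Iterable[Attempt],
                  try_connect: Callable[[str, int], bool]) -> Optional[Attempt]:
    """Return the first attempt for which try_connect succeeds, else None."""
    for direction, port in attempts:
        if try_connect(direction, port):
            return (direction, port)
    return None


# Stand-in test: pretend only client2 can reach client1, and only on port 5001.
def fake_connect(direction: str, port: int) -> bool:
    return direction == "client2->client1" and port == 5001


result = first_working(plan_attempts([5000, 5001]), fake_connect)
```

If `first_working` returns None, every direct attempt has failed and the clients need 
some other arrangement for the data channel.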

(2) is scalable, and is a method already used by services such as MSN, iChat, etc.  
(Also Yahoo Super Webcam mode, I think.)  (2) can fall back to (1) for internal 
addresses, or whenever some algorithm determines that the server's link can handle the 
throughput.  (2) allows the server to collect data on connection statistics, and 
possibly even a conversation log for clients with speech-to-text engines (or at least I 
think this'd be a cool feature to implement).

(3) means that Jabber clients just execute an external voice chat program, passing it 
the remote client's IP address or some other connection data as a parameter, so the 
server learns nothing about how the chat went, etc.  (3) seems to be overlooked by, or 
unacceptable to, some users of this mailing list.

I believe that we do not necessarily have to decide between the two types of data 
transfer, client-server and peer-peer.  I think we can define client-server as a 
fallback in case peer-peer fails, under certain conditions, such as:
 (*) Client-server voice chats do not take up too much bandwidth (possibly limit the 
     audio bitrate?).
 (*) All attempts at initiating peer-peer voice chat have failed.
 (*) Server can give lower priority (QoS?) to transmitting voice data versus IM and 
     other data.
 (*) Server can still function with the additional bandwidth and resource strain of 
     multiple (hundreds?  thousands?) voice chats crossing its bus.

You probably already knew most of this, but I just wanted to make the point above: 
you do not necessarily need to choose one over the other, and we can use their 
advantages and disadvantages to produce some algorithm by which we choose between 
them.
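
One possible shape for such an algorithm, as a minimal Python sketch.  The names, the 
bitrate cap and the "spare bandwidth" figure are assumptions of mine for illustration; 
a real server would need its own way of measuring how much load it can still take.

```python
from enum import Enum


class Mode(Enum):
    PEER_TO_PEER = "peer-peer"
    CLIENT_SERVER = "client-server"
    UNAVAILABLE = "unavailable"


def choose_mode(p2p_succeeded: bool,
                bitrate_kbits: float,
                relay_cap_kbits: float,
                server_spare_kbits: float) -> Mode:
    """Prefer peer-peer; relay through the server only under the fallback conditions."""
    if p2p_succeeded:
        return Mode.PEER_TO_PEER
    # Fallback condition: the audio bitrate stays under the server's cap.
    within_cap = bitrate_kbits <= relay_cap_kbits
    # Fallback condition: the server can still carry both directions of the call.
    server_has_room = 2 * bitrate_kbits <= server_spare_kbits
    if within_cap and server_has_room:
        return Mode.CLIENT_SERVER
    return Mode.UNAVAILABLE


# Peer-peer failed, 16 kbit/s stream, 32 kbit/s cap, 1000 kbit/s spare on the server.
mode = choose_mode(False, 16, 32, 1000)
```

Giving voice data lower priority than IMs is not modelled here; that would live in the 
server's scheduling rather than in this decision.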

Ido

On Wed, 3 Mar 2004 10:00:41 -0000
"Richard Dobson" <[EMAIL PROTECTED]> wrote:

> > then here is the simple way for this �!&%�$% p2p case:
> 
> If, as I suspect, the symbols above represent a swearword, I suggest you calm
> down and rethink your posts or risk severely denting your credibility; such
> comments are not very professional and IMO are not appropriate here. If you
> do not want to make useful posts then I suggest you don't post; these rants
> are just a waste of bandwidth and people's time.
> 
> > 1) use JEP 95 to negotiate the voice session parameters
> >     * includes voice codec ID
> >     * includes frame size (i.e. 2600byte)
> >     * includes crunched size if applicable
> >     * includes channels (mono/stereo)
> >     * includes sample size (8bit/16bit/24bit/32bit)
> >     * includes sample rate (i.e. 8000hz/16000hz)
> > 2) do JEP65 stream negotiation.
> > go, implement it.
> 
> Seems a reasonable starting point.
> 
> >i really give favor to server based approaches.
> 
> As you have indicated previously, but just because you don't think it useful
> or a good idea doesn't mean it isn't and other people don't want it. I'm not
> against server based approaches (in fact it's a very good idea) but there is
> plenty of room and requirement for both options; you really shouldn't be
> required to go via a server for a quick couple of minute two person chat.
> 
> Richard
> 
> _______________________________________________
> jdev mailing list
> [EMAIL PROTECTED]
> https://jabberstudio.org/mailman/listinfo/jdev
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFARbcTmhQsAkXAJP0RApomAJ9c74BSNE6Uj1XuQDhkFzwIg5if0QCeMgnF
vkikBCqftEUtYuCBQf3vZR0=
=5dtJ
-----END PGP SIGNATURE-----
