Re: [JDEV] Videoconferencing with jabber / Re:[speex-dev]Videoconferencing with speex and jabber

Tijl Houtbeckers Mon, 01 Dec 2003 11:22:45 -0800

Let me start off by apologizing to everyone on the list who is following this discussion for my horrible spelling lately. This discussion takes up a little more time than I actually have right now, but I'd still like to sort things out a little before I go on vacation on Dec. 5th.

On Mon, 1 Dec 2003 10:55:20 -0000, Richard Dobson <[EMAIL PROTECTED]> wrote:

Having one user assume the role of server and the other the role of client is really no harder than a model in which you assume both are equal peers. It's simply a matter of different roles. If you can think of any reason why this is not true, please share it with the rest of us!
I don't claim that it is any harder (for one-to-one), simply that using a client/server model when a p2p model is more appropriate can, IMO, create more problems than it solves.

Then please point out those problems for me. I doubt you can think of any for person-to-person, at least not compared to a p2p solution. And *just* implementing that is enough to participate in conferences with more than 2 persons. You get this *for free*, so to speak.

Whether you choose to build an extension on this for conferencing over direct links is up to you. I won't stop you. I'd encourage you if I could :)

However, using a client/server model will allow you to participate in a conference on a server with more people *with no extra effort at all*. Yet you still state you don't believe it will be easier?
Yes, it is easier to implement because you don't need the extra p2p code, but IMO it's not really that much more to implement, as you will already have a large amount of the necessary code in place once you have created a client with a built-in server.

So why not implement a c/s-based solution for person-to-person and server conferencing (which will take about the same effort as implementing a p2p-based solution for person-to-person)? And then implement a direct-link based conferencing solution, where each node is a server as defined in the c/s spec (which will take about the same amount of effort as doing it based on a p2p spec).

Unless, for some reason, you think the c/s spec would raise issues, which you seemed to imply earlier in this email.

What I *am* saying is that an entirely p2p-based conferencing model (with more than 2 persons involved) is a lot more complex than a client/server model, even more so if you only have to implement the client portion. That's why this allows "thin" clients to still participate. It was you yourself who argued against the mixing and bandwidth requirements on thin clients such as a Pocket PC.
Yes, if you only implement the client portion,

You actually make a good point here. Implementing the client portion plus a server portion (one that's just suitable for talking to one person) takes about the same effort as implementing a p2p solution. But I suppose you could go for only implementing the client portion in extreme cases where resources are very limited :)

it will be a lot more work to add server or p2p support, but if everyone does that (to save time and effort), your proposed system will fall apart because there will be no servers for people to connect to.

If I work on client X, of course I'll implement the server portion (at least for single person-to-person chat). Otherwise client X can't talk with client X! That'd be kind of dumb. But I can imagine that if I had an assignment from a company to build a client capable of conferencing on the company servers (so they can log, etc.), I could drop the server part.

More advanced clients are likely to also implement a server that supports hosting a conference with more than 2 people. Or they'll implement a direct-link conferencing extension (still based on the same protocol, of course). Those two are complementary, not competitive. But as you pointed out, direct-link person-to-person is definitely needed most, and as others pointed out, server-based conferencing is needed most too.

That doesn't mean there are no use cases left for direct-link based conferencing, but IMHO not enough to justify a spec that will miss out on server-based conferencing, when you can get that practically for free, and that will complicate the spec and raise the requirements for conferencing. Again, it's not impossible.

As Mats Bengtsson suggests, I think you should take a look at http://www.skype.com/skype_p2pexplained.html; their solution looks rather good (although it goes further than I have been suggesting).

Skype uses UDP NAT traversal based on getting its IP from someone outside the NAT (at least, so it was suggested either here or on SJIG), which is currently being rejected by the Jabber server folks, and if that doesn't work it uses proxies on a p2p network: peers on the network basically act as proxies. So I don't quite see the relationship with your proposal.

This sort of functionality can be used with SI. For example, you could make an SI bytestream over a JXTA network. With just a bit of cheating you could probably even use the Skype network itself with SI.

Maybe what we really need to do, rather than concocting our own solution, is defer to the even greater experience of someone else and just try to integrate with an existing mechanism, just like we did with SOCKS5 for the bytestreams mechanism.

SOCKS5 is hardly integrated with an existing mechanism; it just uses part of the same spec. Using SI you can integrate other solutions almost transparently, and fall back on others if they don't work. That doesn't eliminate the need for a spec for setting these things up, and I see no good reason not to use a c/s architecture there.
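The "fall back on others if they don't work" idea can be sketched as trying each bytestream method in order of preference and keeping the first that connects. This is a minimal sketch under stated assumptions, not the SI negotiation protocol itself: the method names and the `direct_socks5` / `relay_proxy` functions are hypothetical stand-ins for real transports (a direct SOCKS5 link, a proxy, a JXTA-style overlay).

```python
from typing import Callable

# A transport is any zero-argument callable that returns a connection object
# or raises OSError. The names below are hypothetical stand-ins, not SI methods.
Transport = Callable[[], object]


def open_bytestream(methods: list[tuple[str, Transport]]) -> tuple[str, object]:
    """Try each (name, connect) pair in preference order; return the first that works."""
    errors: list[str] = []
    for name, connect in methods:
        try:
            return name, connect()
        except OSError as exc:
            # Record why this method failed, then fall back to the next one.
            errors.append(f"{name}: {exc}")
    raise ConnectionError("no bytestream method succeeded: " + "; ".join(errors))


def direct_socks5() -> object:
    raise OSError("peer is behind NAT")  # simulated failure


def relay_proxy() -> object:
    return "relay-connection"  # simulated success


chosen, conn = open_bytestream([("direct", direct_socks5), ("relay", relay_proxy)])
print(chosen, conn)  # relay relay-connection
```

The caller only sees which method won; whatever runs on top of the bytestream (c/s or otherwise) does not need to care how the link was made.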

I think from the discussion it's pretty obvious that what's needed/wanted most are 2 things:
- person-to-person over a direct link
- conferencing with multiple persons on a server
As you realise, I don't think you need to use a server to talk with a small group of people.

You're turning a blind eye to the issues with p2p, then. Other people have pointed them out, and so have I. I'm not ruling out direct-link conferencing at all, but after direct-link person-to-person, the second most needed thing is server-based conferencing: both as a fallback for direct-link person-to-person, and because in many cases (not ALL, I'm not suggesting that) it's the only *quality* way of having a conference with multiple persons. So why throw this away if we can get it almost for free?

Again, this does not rule out what you want at all.

Both can be handled, without overlap, with a simple JEP based on a c/s model. P2P won't cover this, nor will it be any simpler.

Sorry, but it can handle it, as I have clearly shown,

What you're talking about is simply a *different* problem. It's a solution, and a good one, but for a *different* problem. It can't handle it, and it doesn't cover it.

it won't be any simpler, but IMO it's not much harder if you already have client/server code in place, and it is far more reliable.

Well, exactly: if you have a c/s spec with c/s code in place, you can use that to implement your solution. You won't need anything p2p; it's all about the direct links.

Conferencing over individual direct links between persons is interesting too, but too complex to be included in the basic JEP, if you ask me.

I don't think it's really all that much harder, as you know.

Well, with a c/s spec, clients (and servers for person-to-person) have it very easy. Bandwidth requirements are low, CPU requirements are low, and you can talk to as many persons at once as you want. Of course, the requirements for the server are higher (when more than 2 persons are involved), but as I pointed out, not THAT much higher than for a node in a direct-link conference. In many use cases there WILL be more advanced implementations, on more advanced platforms with more resources, that can support being a server, and many clients that couldn't be a server. But that's no issue for those clients, since they only have to be a client. In many, MANY cases p2p/direct-link style conferencing isn't an alternative. You, too, have pointed to the dial-ups and the Pocket PCs, etc., I'm sure.
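The bandwidth comparison comes down to counting streams. Here is a back-of-the-envelope sketch, assuming every talker sends one encoded stream of equal bitrate, the host sends one mixed stream back to each client, and there is no silence suppression or multicast; the 16 kbit/s codec rate is an illustrative assumption.

```python
def direct_link_node(n: int, talking: int, kbps: float) -> tuple[float, float]:
    """(upstream, downstream) kbit/s for one talking node in an n-person direct-link mesh."""
    up = (n - 1) * kbps          # my stream is sent separately to every other participant
    down = (talking - 1) * kbps  # I receive every other talker's stream
    return up, down


def cs_client(kbps: float) -> tuple[float, float]:
    """(upstream, downstream) for a c/s client: one stream out, one mixed stream back."""
    return kbps, kbps


def cs_server(clients: int, talking_clients: int, kbps: float) -> tuple[float, float]:
    """(upstream, downstream) for a node hosting `clients` remote clients."""
    return clients * kbps, talking_clients * kbps


# Five people, everyone talking, with an assumed 16 kbit/s codec.
# The host is one of the five, so it serves the other four.
print(direct_link_node(5, 5, 16))  # (64, 64)
print(cs_client(16))               # (16, 16)
print(cs_server(4, 4, 16))         # (64, 64)
```

Under these assumptions a dial-up participant needs 16 kbit/s each way as a client instead of 64 as a mesh node, which matters because a V.90 modem tops out at 33.6 kbit/s upstream. The host's link carries about the same load as any node in the full mesh; what the count leaves out is the CPU cost of mixing on the host.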

Conferencing over direct links doesn't have to be p2p either. You can base it on the c/s JEP, with every individual participant acting as a server. That's not much more complex than doing it based on a p2p model.
But that is p2p, is it not?

Any node (JID) in the network can be a server. This is a role in the protocol. By having this role, you can support both direct-link person-to-person conversations and conferences on a server. That's my point. If the protocol instead uses the role of two equal "peers", that is disruptive.
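To make the role distinction concrete, here is a minimal sketch built on a hypothetical `Session` model (nothing here comes from an existing JEP): one JID holds the server role and everyone else is a client. A person-to-person call is then just the degenerate case of a conference with a single client, and the same code path covers both.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    SERVER = "server"  # receives every stream, mixes, sends one mix back per client
    CLIENT = "client"  # sends one stream, receives one mixed stream


@dataclass
class Session:
    """Hypothetical conference session: exactly one JID holds the server role."""
    server_jid: str
    client_jids: list[str] = field(default_factory=list)

    def role_of(self, jid: str) -> Role:
        if jid == self.server_jid:
            return Role.SERVER
        if jid in self.client_jids:
            return Role.CLIENT
        raise KeyError(f"{jid} is not in this session")

    def is_person_to_person(self) -> bool:
        # A one-to-one call is simply a session with a single client.
        return len(self.client_jids) == 1


# One-to-one: Alice's client doubles as the server.
call = Session("alice@example.com/home", ["bob@example.com/work"])
# Conference: a (hypothetical) dedicated component holds the server role.
conf = Session("conference.example.com", ["alice@example.com/home", "bob@example.com/work"])
```

A client that only ever takes the `CLIENT` role can still join `conf`, which is the "thin client" case argued for in this thread.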

[cut out some stuff where we pretty much agree I think]

So let's apply this to some real-world situations. In how many cases do all the clients have about the same available bandwidth, CPU, etc.? With Joe Consumer this is unlikely; it's a mix of dial-up and broadband users. If I wanted to talk to my mother, sister and brother at the same time: I have a 1 Mbit link, one of them will have a cheap DSL account, and the other two will most likely be on dial-up.
I can see that on dial-up this is a problem, but as I detail below, determining the correct machine to run the server on (available bandwidth, CPU speed, etc.) can be complex. This really needs to be automatic, or we will make it that much harder for normal users; they might well not bother and keep using MSN etc. instead. We must make sure we offer something that is at least as easy to use as MSN Messenger and the like, so whichever way we go, be it client/server, p2p or both, all of that needs to be hidden from the user. All they should need to do is select the people they wish to chat with and click "chat".

Does MSN even *do* conferencing with more than one person? (I don't know.) I think in most cases users will know who has the fastest connection, but I can imagine you'd prefer an automatic solution for this. That would be rather neat. Of course, when you host all this on a server component, the choice is clear.

Again, I don't think direct-link style conferencing is uninteresting or unneeded, but it's a much more specific application than c/s conferencing. And, *again*, a c/s style approach will not prevent it from being an extension.
Good, but once we have a client/server system in clients, we will have 90% of the code needed to implement it. It would be a mistake IMO, and could produce a messy protocol, if we don't consider from day one how to include p2p functionality in the protocol we create; otherwise, when we extend it later, it could end up messy or we will end up duplicating lots of effort.

Agreed: when creating such a spec based on c/s, attention should be paid to allowing a direct-link conference style solution from the start. For that matter, it should also allow for things such as distributed hosting of a conference (a sort of hybrid between direct links and c/s) or anything else people can come up with. It should just be as generic as possible.

And how's that? When 4 people talk at once, *every* client will have to mix 4 streams in the case of direct links. In the case of c/s, only the server will have to mix 4 streams. Explain.
Yes, but the server has to do more than simply mix the streams; it also has to re-encode the mixed stream. Also, if you want to remove echoes, as you suggest below, or be able to ignore participants, as someone has already suggested as useful functionality, you need to re-mix and re-encode every outgoing stream individually, which I would expect to be quite a CPU drain. In p2p mode, on the other hand, clients using available technologies (DirectX or the equivalent) don't even need to mix the streams, as you can play several WAVE streams simultaneously, and the client doesn't need to re-encode the stream to send it out again.

Well, I agree that, just like with the bandwidth requirements, demands on the server will be higher than on a node in a direct-link conference. Just not THAT much higher, unless you want some more advanced features. There are always trade-offs between the two solutions, and at times you could prefer yours over the other. But the point I'm making is that we can have *all* of them, relatively simply, with a c/s-based architecture, even if a p2p spec might be just a *little* easier to work with in your case, or at least sound more logical when reading the spec.

Of course, you still have to mix when you use DirectX ;) Servers can use existing technology too, of course. Servers (components) specializing in hosting this kind of thing for companies or paying customers could even use DSP hardware and such.

(The only thing I could think of is if you want to create a separate mix for each client, without their own channel in it, to prevent echo. Rather than mixing new streams for each client, you should just suppress the echo for each client. Admittedly, it increases demands on the server if you want this, but not as badly as having to mix a new stream for each client.)
Not sure how you would suppress the echo of what someone said without re-encoding the streams individually to exclude that person from their own incoming listening stream.

Well, aside from that, you can suppress it client side... (which would raise the requirements for our poor Pocket PC clients a little too much). I'm not an expert on audio technology, but I'd imagine some heavy optimizations are possible when making different mixes from the same streams? I could be wrong, of course.
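One optimization of this kind is what broadcast audio calls a mix-minus: sum all incoming frames once, then give each participant the total minus their own contribution. N personalised mixes then cost one full sum plus N subtractions, instead of N separate sums of N - 1 streams. A minimal sketch on decoded 16-bit PCM frames, assuming all streams are already decoded, time-aligned and of equal length; re-encoding each personalised mix, which is the CPU cost Richard raises, is not shown and still applies.

```python
def mix_minus(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """Return, for each participant, the mix of everyone else's audio.

    `frames` maps a participant id to one frame of decoded 16-bit PCM samples.
    All frames are assumed to be the same length and already time-aligned.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # Sum all streams once: N * length additions.
    total = [sum(f[i] for f in frames.values()) for i in range(length)]

    def clip(sample: int) -> int:
        # Clamp to the signed 16-bit range to avoid wrap-around distortion.
        return max(-32768, min(32767, sample))

    # Subtract each participant's own samples: another N * length operations,
    # instead of building N independent (N - 1)-way sums.
    return {pid: [clip(t - s) for t, s in zip(total, f)] for pid, f in frames.items()}


mixes = mix_minus({"a": [100, -200], "b": [50, 50], "c": [0, 1000]})
print(mixes)  # {'a': [50, 1050], 'b': [100, 800], 'c': [150, -150]}
```

Clipping is applied only after the subtraction, so the shared total stays exact and each personalised mix equals what a separate sum would have produced.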

Yes, when the server quits the conference the others will get booted. If this is a big issue for you, you could devise a fallback system to another server (one of the clients, for example) and still have a massively less complex system than direct-link based conferencing. Since servers are most likely to be the best machines with the best connections, this isn't such a big problem, but it's still easily solved if you want.
Good, this would have to be done if I were to support this. The problem, though, is that adding this sort of thing brings us even closer to the requirements of just using a p2p system,

Switching to a fallback server is *definitely* something different from using your direct-links system. Again, c/s and direct-link based conferencing are two different solutions to two different problems, except perhaps in the most general sense.

People on the list made it very clear that direct-link style conferencing with multiple persons will not fill their most basic needs. If your only problem were worrying about whether the host dies, I'd recommend you go with the solution I proposed rather than go direct-link style. But I doubt that's your only problem :)

we would also have to make it easy for normal users to start chats, so the system needs to automatically determine which machine in the group is best suited to be the server and set it up as such, without the user needing to do that themselves. There is also a problem with falling back in this situation: what if there is no machine left with enough bandwidth etc. to maintain the chat?

Of course this is a problem. If it won't work, it won't work. If your solution *would* work in that case, well, that's why I think it would be great to have. However, don't overestimate how often this will be the case. But it's definitely so on Xbox Live, which is still a brilliant example :)
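Picking the host automatically could be as simple as choosing the participant with the most upstream bandwidth, provided it can carry one mixed stream to every other participant; if nobody can, the conference cannot be hosted, which is the failure case raised above. Rerunning the same choice without the departed host gives a fallback server. A minimal sketch: how the capacity figures would be obtained (measured or self-reported) is left open, and the 128 and 33.6 kbit/s values for the family example are illustrative assumptions.

```python
from typing import Optional


def pick_host(upstream_kbps: dict[str, float], stream_kbps: float) -> Optional[str]:
    """Pick the participant with the most upstream bandwidth that can carry the conference.

    The host must send one mixed stream to every other participant.
    Returns None when nobody has enough upstream capacity.
    """
    needed = (len(upstream_kbps) - 1) * stream_kbps
    candidates = {jid: up for jid, up in upstream_kbps.items() if up >= needed}
    if not candidates:
        return None
    return max(candidates, key=candidates.__getitem__)


family = {"me": 1000, "sister": 128, "mother": 33.6, "brother": 33.6}
print(pick_host(family, 16))  # me
# Fallback after the host leaves: simply rerun the election without it.
print(pick_host({j: u for j, u in family.items() if j != "me"}, 16))  # sister
print(pick_host(family, 400))  # None
```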

It will go down, which it shouldn't in p2p, because all nodes will require the same amount of bandwidth to maintain it, so it should keep going.

When there are a few clients with bad connections in the conversation, reliability will probably improve a bit too. Bad connection <-> good connection <-> bad connection is generally more reliable than bad connection <-> bad connection, especially when you consider that bandwidth usage drops too.


Yup, but there is no real way, without user intervention, to make sure the server is on a reliable connection, and we need to make it as easy as possible, otherwise normal people would not know what to do.

You could automate this (and use a remote control protocol to set everything up transparently), but I don't think user intervention is *necessarily* a bad thing here. Even the most oblivious users know broadband is better than dial-up.

Latency is an interesting case, but in practice the results would probably surprise you. Because the bandwidth requirements on low-bandwidth nodes drop dramatically when they act as a client rather than as a node in the direct-link conference, latency will actually improve in many cases!
That's good, but do you have any real evidence of this?

I assume you have no problem with the idea that latency on a low-bandwidth connection is lower when less of its bandwidth is in use? If you do, just play an online game, then exit it, turn on some file-sharing network, and play the game again ;) That's just simple maths!

Even on my old "broadband" connection, where I had 15 KB/s upstream available, latency would jump from about 25-40 ms to 50-400 ms if I used only 10 KB/s of it for other purposes.
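The "simple maths" can be made a little more concrete with a textbook M/M/1 queue, where the mean time a packet spends in the system is 1/(mu - lambda). This is a deliberately crude model (a modem or DSL uplink is not a Poisson queue), and the 100-byte packet size is an assumption, but it shows why delay climbs steeply as a link fills up.

```python
def mm1_delay_ms(capacity_bytes_s: float, load_bytes_s: float,
                 packet_bytes: float = 100.0) -> float:
    """Mean time a packet spends queued plus being sent, for an M/M/1 queue, in ms."""
    mu = capacity_bytes_s / packet_bytes  # service rate, packets per second
    lam = load_bytes_s / packet_bytes     # arrival rate, packets per second
    if lam >= mu:
        return float("inf")               # the queue grows without bound
    return 1000.0 / (mu - lam)


# 15 KB/s uplink as in the example above (1 KB taken as 1000 bytes).
for load in (0, 10_000, 14_000):
    print(load, round(mm1_delay_ms(15_000, load), 1))
# 0 6.7
# 10000 20.0
# 14000 100.0
```

In this model, loading the 15 KB/s uplink with 10 KB/s triples the mean delay, and at 14 KB/s it is fifteen times the idle figure. The measured jump quoted above is larger, presumably because bulk transfers fill modem buffers in ways the model ignores; the shape of the curve, not the exact numbers, is the point.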

Gaming provides another example. In the old days when I played Quake, it would be a lot faster to play with someone on my ISP's server than for either of us to host the server (latency would be higher and less reliable there). Experience with the old ICQ protocol gave me the same impression, even though the amounts of data are *very* limited there.

If latency is your main reason for choosing direct-link conferencing, I'd be very careful if I were you, because the result might disappoint in many cases.

So you can have the situation where a node in a direct-link conference with 3 persons talking is barely able to keep up, with horrible latency, while a client with exactly the same quality of connection is enjoying a conference where 6 people are talking, with lower latency! (It wouldn't even be able to participate when 6 people are talking in a direct-link conference.)
You would have to have very low bandwidth not to be able to talk to those 6 people in p2p, though. But yes, that could be a problem; still, one of the people needs to be on a good connection.

If you dedicate all your bandwidth to voice chat, use low-quality codecs, and have at least a 56k6 modem on a decent ISP, then you can probably talk to more than just a few. But that's hardly always the case. And still, the more streams are active, the higher latency will get, and reliability will drop in some cases.

Now let's talk about out-of-sync mixing. With direct-link based conferences, every client will produce a different "mix" based on the latency/bandwidth of its connections and those of the other nodes. This means that when we're in a meeting, to me it can sound like 3 people were talking at once, while to you it can sound like they weren't at all. (That means I didn't hear what they said and I'll ask them to repeat it, while you'll be annoyed with me (even more ;) because to you it sounded like I could have heard perfectly.)
Sure, that could be a problem, but it's a problem people will be used to if they have ever made long distance phone calls; this sort of thing is the least of our worries IMO.

This problem doesn't occur when you make long distance phone calls! How could it? It doesn't even happen in a long distance *conference* call!

With a server-side solution, *everyone* will receive the same audio stream (with perhaps only their own stream omitted). With direct links, every client makes its own "mix".

Let's pretend you and I are in a conversation with persons A, B and C. We're using direct links for conferencing. Person A asks a question. His stream is broadcast individually to all nodes. Person B then starts to answer, and so does person C. When person B notices person C also wants to answer (they both have fast connections, so little latency), person B shuts up, and C answers. I, however, am on a bad link. I receive the question from A; my link with person B just went a little bad, so the start of his answer hasn't reached me yet, but I can already hear C starting to give his answer. Then the link with B clears up, and in the middle of what C is saying (well after he noticed B was going to let him do the talking) I suddenly hear B start to answer, and then stop. So I ask if C can repeat himself. But your links with B and C are just fine; you didn't hear B talk over C while he was answering at all. So you ask yourself whether I was sleeping during the meeting or something.

The more diverse the types of connection are (unlike Xbox Live, where they are all pretty much the same), the more of a problem this will be, especially if you use TCP sockets over an unreliable connection. This does not happen *at all* with server-based conferencing.

Of course there is a solution for this: syncing the mixes between nodes. But then you lose all latency advantages; you'll be as slow as the "weakest link" (and the weakest link will be a lot more stressed than it would be in a c/s model). Of course, compromises are possible.
Sure

That doesn't mean doing in-sync mixing in a direct-link conference isn't still a bitch to pull off. How do you determine what delay the faster nodes should add? You'll need control channels at least, and of course you don't want *those* to depend on a c/s architecture either. Good luck with that ;)
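Syncing the mixes amounts to every receiver buffering each incoming stream until its total delay matches the slowest path in the conference, so everyone hears the same sequence of events, and everyone inherits the weakest link's latency. A minimal sketch, assuming one-way latencies for every (sender, receiver) path are already known; measuring and distributing them is exactly the control-channel problem mentioned above. The node names and latency figures are illustrative.

```python
def playout_delays(latency_ms: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Extra buffering per (sender, receiver) path so every path matches the slowest one.

    `latency_ms` maps (sender, receiver) to a measured one-way latency; how those
    measurements are gathered is left open here.
    """
    worst = max(latency_ms.values())
    return {path: worst - lat for path, lat in latency_ms.items()}


paths = {("A", "you"): 30, ("A", "me"): 250, ("B", "you"): 40, ("B", "me"): 300}
delays = playout_delays(paths)
print(delays[("A", "you")], delays[("B", "me")])  # 270 0
```

Your fast link to A ends up with 270 ms of artificial delay just to stay in step with my bad link to B, which is the latency cost described above.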

But p2p chats should not need a server IMO, because they are short-lived sessions for which you will already have located the other members of the chat via another means (your Jabber session). Please bear in mind that client/server systems are not always the best solution: just think, if the file-sharing systems all went through central servers, the bandwidth use would be unsustainable for the server admins.

That's not anything like what I am proposing. To start with, practically all person-to-person communication would be over direct links. Secondly, conferences would not be held on some gigantic server; rather, there would be small clusters spread all over the place.

As you might know, many p2p networks have made this same change, relying more on the stronger, better clients and letting them take on some roles that were traditionally meant for servers: peer caches, supernodes, etc. At first this was just for control info, but Skype is the next step, using "peers" as proxy servers for data. (One could argue Skype is not the first to do it, though; there's Freenet, for example.)

I think it'd be great if we could take the same route with Jabber (I already named an SI/JXTA-based solution as an example), but without ruling out the reliable and needed c/s model either. And I think I pointed out fairly clearly how we could.

Although there is the fact that current audio chat systems are mostly p2p, e.g. Xbox Live, MSN Messenger, AIM, Yahoo Messenger, H.323, SIP. We need to be careful not to dismiss all the research, development and reasoning that went into these people's decision to go p2p.

With the exception of Xbox Live, perhaps, I wouldn't want to rely on any of them for conferencing with more than one person.

SIP and H.323 *can* depend on direct links for conferencing (as far as I know), but that doesn't mean they have to, or even do so in a lot of cases (especially SIP, which is often used just for replacing CSD channels!). If you're under the impression that SIP and H.323 are never used in conjunction with a "classic" phone conference, you'd be very wrong, I'm afraid. (As for AIM, Yahoo and MSN, I didn't even know they support conferencing, let alone how or what architecture they use for it at the protocol level.)

Solutions like Net2Phone definitely connect to some server implementation, even for non-conferencing calls.

Maybe what we actually need, to solve both the low-bandwidth problem of dial-up users and the reliability problem of having a single point of failure, is a hybrid client/server and p2p system, where the people with sufficient bandwidth run as servers and talk p2p among each other (like the idea of a supernode), and the low-bandwidth users connect to one of those servers. That solves the low-bandwidth user problem; it solves the reliability problem by having multiple servers users can switch to if one goes down; and it solves the CPU usage problem by not having too many people connected to one server.
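The hybrid layout described above could be sketched like this: participants above some bandwidth threshold become hosts ("supernodes"), and each remaining participant is attached to the host with the most spare upstream capacity. The threshold, the one-stream-per-client capacity rule and the participant names are hypothetical, and the traffic between supernodes is deliberately left out of the accounting.

```python
def assign_to_supernodes(upstream_kbps: dict[str, float], stream_kbps: float,
                         threshold_kbps: float) -> dict[str, list[str]]:
    """Map each supernode (host) to the low-bandwidth participants it serves."""
    hosts = [jid for jid, up in upstream_kbps.items() if up >= threshold_kbps]
    if not hosts:
        raise ValueError("no participant has enough bandwidth to act as a supernode")
    assignment: dict[str, list[str]] = {h: [] for h in hosts}

    def spare(h: str) -> float:
        # Upstream left after sending one stream to each client already assigned.
        return upstream_kbps[h] - len(assignment[h]) * stream_kbps

    for jid in upstream_kbps:
        if jid in assignment:
            continue  # supernodes do not need a host of their own
        host = max(hosts, key=spare)
        if spare(host) < stream_kbps:
            raise ValueError(f"not enough supernode capacity left for {jid}")
        assignment[host].append(jid)
    return assignment


print(assign_to_supernodes({"a": 40, "b": 36, "c": 10, "d": 10}, 16, 30))
# {'a': ['c'], 'b': ['d']}
```

If a supernode drops out, rerunning the assignment without it moves its clients to the remaining hosts, which is the reliability benefit claimed for the hybrid approach.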

In a previous email I already briefly touched on the subject, and on some of it in this email. I definitely think most of this could be handled in the SI layer, though (with a little cheating); a c/s-based spec will not rule this out at all.

_______________________________________________
jdev mailing list
[EMAIL PROTECTED]
http://mailman.jabber.org/listinfo/jdev
