On Friday, May 25, 2001, at 08:50 AM, David Waite wrote:
IMO, mailservers should break if you try to send to more than ~20 people
at a time. Not crash, but refuse to send.
What, all mailservers?! What about the ones that run mailing lists or [voluntary] announcements? I've been on music mailing lists with thousands of subscribers, and I get very useful "what's new" mailings from Amazon that must have tens of thousands of readers.
Jabber should probably work
the same way in this case (200 users max in a roster or something
configurable like that)
Why on earth should that be a requirement? First off, it penalizes people for something that's not their fault. What happens when I can't add a new buddy because I happen to be on 195 people's rosters already?
Secondly, it ignores very useful aspects of Jabber for information delivery. Stock price agents, auction agents, news agents, etc. Like I said before, companies are slavering over the potential of this and having a viable open IM network would make it much easier to do. These things are not spam, they're voluntary since you have to subscribe to them. And they could scale to zillions of users. Just read this article:
http://biz.yahoo.com/prnews/010424/nytu114.html
"Capitol Records and Radiohead Create First Instant Message 'Buddy' in Music History"
"... The Radiohead agent will reside on a user's Instant Messenger buddy contact list. The agent will be able to recognize and respond to natural language questions and requests for information about the band and Amnesiac. Tour dates, song lists, artists' bios, album credits, purchasing information, contact information, current web site information, and other album related material will be available."
So how many Radiohead fans do you think are going to subscribe to this bot? What happens when ActiveBuddy builds ones for N'Sync or Eminem? Are you saying this kind of thing is inherently wrong and should not be supported?
Hypothetically, if you had a 10,000 user roster, that would generate
about 5 MB of XML traffic through the server it was running on every time
the bot came online.
5MB is not really a lot of traffic for any site with a decent size pipe. It won't be happening that often since bots by design tend to stay online all the time.
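To put that figure in perspective, here is a rough back-of-envelope sketch. It assumes "5 MB" means 5,000,000 bytes, and the 1 Mbit/s link speed is purely a hypothetical number chosen for illustration, not something anyone in this thread measured.

```python
# Back-of-envelope numbers for the presence burst when a bot with a
# 10,000-entry roster comes online.
# Assumptions: "5 MB" = 5,000,000 bytes; the link speed is hypothetical.

ROSTER_SIZE = 10_000       # subscribers on the bot's roster (from the thread)
BURST_BYTES = 5_000_000    # total XML traffic per bot login (from the thread)


def bytes_per_subscriber(total_bytes: int, subscribers: int) -> float:
    """Average XML traffic generated per roster entry."""
    return total_bytes / subscribers


def transfer_seconds(total_bytes: int, link_bits_per_second: int) -> float:
    """Time to push the whole burst through a link of the given speed."""
    return total_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    print(bytes_per_subscriber(BURST_BYTES, ROSTER_SIZE))  # 500.0 bytes each
    print(transfer_seconds(BURST_BYTES, 1_000_000))        # 40.0 s at 1 Mbit/s
```

So that works out to about 500 bytes per subscriber, and even on a modest link the whole burst clears in well under a minute, once per bot login.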
Even if all of those users are on the same machine, that
would be 10,000 user rosters it would have to load up via XDB and parse
(since the roster is also basically the presence ACL).
Yes, but you're talking about a machine hosting 10,000 users, which is going to be hellaciously busy no matter what. Presumably if a user is online their roster is already parsed, and if they're not online you don't need to do anything (since presence packets are not stored/forwarded.)
Now imagine this is a portal with a quarter of a million users, and the
bot is added by default to everyone's roster. Not only would that roster
be about 25MB, there would be at least a 35MB memory image for the DOM
tree created.
Those numbers aren't specific to having a bot, only to the size of the portal itself. It makes no difference whether all 250,000 users have the same bot in their rosters or their rosters all contain different JIDs. In other words, you're saying that for a portal of this size, every roster entry the average user holds costs about 35MB of memory across the whole user base -- so if the average portal user has 20 people in their roster, that's 700MB just for rosters.
(That's a scary number but on the other hand 700MB of RAM is chump change for a company big enough to run a portal this size. Something like $400? And anyway, wouldn't the load be spread across a whole farm of servers, not just one?)
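The arithmetic behind those figures can be sketched as follows. This assumes decimal megabytes (1 MB = 1,000,000 bytes) and takes the 25MB and 35MB numbers from the quoted message at face value; the function names are mine, not anything from the Jabber server.

```python
# Memory arithmetic for a 250,000-user portal, using the figures quoted above.
# Assumption: 1 MB = 1,000,000 bytes.

MB = 1_000_000
PORTAL_USERS = 250_000
ROSTER_MB = 25   # raw roster size for one entry per user (from the thread)
DOM_MB = 35      # in-memory DOM size for the same data (from the thread)


def bytes_per_entry(total_mb: int, entries: int) -> float:
    """Average size of a single roster entry."""
    return total_mb * MB / entries


def total_roster_mb(mb_per_entry_slot: int, avg_roster_size: int) -> int:
    """Memory for all rosters if each user holds avg_roster_size entries."""
    return mb_per_entry_slot * avg_roster_size


if __name__ == "__main__":
    print(bytes_per_entry(ROSTER_MB, PORTAL_USERS))  # 100.0 bytes of XML
    print(bytes_per_entry(DOM_MB, PORTAL_USERS))     # 140.0 bytes in the DOM
    print(total_roster_mb(DOM_MB, 20))               # 700 MB for 20 entries each
```

In other words, the quoted numbers amount to roughly 140 bytes of DOM per roster entry, whoever that entry happens to point at.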
If what you're saying is that the present Jabber server is not scalable to a portal of this size, that sounds like bad news for Jabber, since no large-scale provider would adopt it.
In any case, this is a bogus scenario. Everyone seems to keep forgetting that Jabber is supposed to be a distributed system; while there will be large portals with large numbers of users, there will also be large numbers of smaller servers, as well as special-purpose servers for bots. The likely scenario is that a major bot would run on its own server (or perhaps there would be a small number of bots) hosted by the company that owns it. There would not be any appreciable number of actual users on this server. So there's 25MB for the bot's roster; that's about $20 worth of RAM, I think. The other side of the overhead is distributed among the host servers of all the subscribers, and has the same effect as each subscriber adding one more friend to their buddy list.
Moral of the story: if you try to solve every problem with a hammer and
a crowbar, you just end up breaking a lot of things ;-)
To be blunt, the real "hammer and crowbar" here seems to be the server's use of in-memory DOM structures rather than some kind of actual database engine. Commercial databases like Oracle have no problem with the kind of scale you're saying is impossible. (And for someone from jabber.com to be saying this sort of thing is impractical is sort of damning for the claims made about your server, btw.)
--Jens
