When you get that big, hundreds of thousands of users, I bet this article will apply: http://news.cnet.com/8301-1009_3-57428067-83/fbi-we-need-wiretap-ready-web-sites-now/
What's harder, 1M users talking to each other, or logging and transmitting those chats to a third party?

On Tue, May 8, 2012 at 5:36 PM, Micheil Smith <[email protected]> wrote:

> No worries, I am on twitter and github as "miksago".
>
> 1. Doing things pull based is a possibly new way of thinking about realtime communication. I haven't yet seen it proven, but I think it makes sense: it means that if a server starts getting overloaded, it can throttle incoming load and not kill the rest of the servers in your cluster (situation: broadcast messages).
>
> 2. I don't think msgpack is a protocol (in the sense of the word I was meaning). Internally, I would be using a more structured data format, such as Protobuf, which has a fairly strict declaration and parser of data. Msgpack is more akin to JSON, in that it's just a data format, not a data protocol; it's the way you use it that makes it a protocol.
>
> The protocols I was talking of were WebSocket sub-protocols, and pretty specific to your application or domain.
>
> 3. I would be going with a max of 25-75K concurrents per server in that case, which would mean 16 to 40 server processes. (Most likely you'd have that 16 segmented as 4 servers * 4 processes, assuming 4 cores.) Essentially, you want to keep the load on any single server from getting incredibly high; it's better to scale out horizontally a little bit more than you need, and then treat the high watermark on the servers as "burst capacity".
>
> That said, I would be surprised if anyone is really reaching close to 500K concurrents on a single application (that's a number I'd expect from a service provider of realtime).
>
> As for dealing with more servers, that's where something like Apache Kafka comes in; however, I'm still uncertain about using Kafka. You could also go the route of mesh networking with ZMQ, which does work fairly well, but the setup and development of it is more complex. So, every server would talk to every other server.
>
> You don't want to be using broadcast messages if possible. As in, if you go with the pull-based setup, then each server would have a mailbox per channel on your chat system, and servers would pull from only the servers and mailboxes that they are interested in. Likewise, if you go the route of central brokers (not that I recommend that), then you can structure your queues and their key spaces into segments representing something like "chats:{CHAT ID}", or perhaps even "{PID / SERVER ID}:chats:{CHAT ID}"; this would mean that servers would listen on only a subset of messages, and wouldn't get all the messages in the system.
>
> (Hopefully that last part makes sense, I'm a bit crammed for time to write it.)
>
> – Micheil
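A rough sketch of one way that mesh plus segmented key space could look in Node, assuming the zmq binding. The topic prefix follows the "chats:{CHAT ID}" convention from the email above; the hostnames, ports, and channel IDs are made up for illustration. Each server publishes on its own PUB socket and subscribes, by prefix, only to the channels it actually hosts, so it never sees the rest of the traffic.

    // every server talks to every other server, but only for the channels it cares about
    var zmq = require('zmq');

    // PUB socket: this server announces messages for the channels it owns
    var pub = zmq.socket('pub');
    pub.bindSync('tcp://*:5555');                      // port is illustrative

    // SUB socket: connect to every other server in the mesh
    var sub = zmq.socket('sub');
    ['tcp://chat-2:5555', 'tcp://chat-3:5555']         // peer list is illustrative
      .forEach(function (peer) { sub.connect(peer); });

    // prefix subscription = "listen on only a subset of messages"
    var myChannels = ['chats:42', 'chats:99'];         // channels hosted on this server
    myChannels.forEach(function (ch) { sub.subscribe(ch + ' '); });

    sub.on('message', function (frame) {
      var text = frame.toString();
      var channel = text.slice(0, text.indexOf(' '));
      var payload = text.slice(text.indexOf(' ') + 1);
      // hand the payload to the local sockets joined to `channel` here
    });

    // publishing a message for a channel this server owns
    pub.send('chats:42 ' + JSON.stringify({ from: 'user-1', body: 'hello' }));

The trailing space in the subscription filter keeps "chats:42" from also matching "chats:420". In a pull-based variant the same key names would become mailbox names that each server drains on its own schedule instead of receiving pushes.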
" I would recommend looking into using more servers with lower load > versus fewer servers with higher load; " I'd like talk more about this, > > > > we have to use more servers for scaling, but more servers means more > complex, unlike other web applications, realtime service need communication > between servers, we have 1M users dispatched on 1K servers, 1K user on > each server, 1 user send a message in a room, this message will send to > others users. In worst case the server for sender and server for reciever > on cover all 1K server, so this message will send to all 1K server. > > > > if 100 user(10%) on each 1K servers send worst message, each server will > recieve 100K messages in same time, it's horrible. > > > > How to prevent this happen? > > > > > > 2012/5/6 Micheil Smith <[email protected]> > > If you have millions of users on line, I think you'll be facing other > problems than just > > Socket.io, some old-ish benchmarks showed socket.io maxing out at > around 5-20K > > concurrents in a single process, other websocket servers performed > differently. If > > you're serious about scaling realtime infrastructure, then you should > probably have > > a look at talks from Keeping It Realtime Conference ( > http://2011.krtconf.com/), as well > > as looking into Autobahn Test Suite benchmarks. > > > > Things to be cautious of: > > > > - You'll need a way to do load balancing (Traditional load balancers > tend to fail > > pretty hard with WebSockets or persistent Connections) > > > > - I would NOT recommend using redis or any other centralised message > bus, this > > is by far the easiest way to do scaling across multiple servers, > however, it's also > > the easiest way to shoot yourself in the foot if the message bus > goes down > > (process crash, server network isolation, etc). > > > > - I would recommend looking into using more servers with lower load > versus fewer > > servers with higher load; This will enable you to scale much better > in short bursts. > > (experience tells me that generally you'll find that your > application or service will > > have peaks and troughs in usage, generally these match up well if > the three main > > timezone blocks (US, GMT, and East Asian / Oceanic) > > > > Those points aside, getting above 100K concurrent users tends to be > incredibly hard, > > some of the largest apps around that I've seen have only just been > pushing 250K (we're > > talking like big service providers that have 500K -> 2M users, I can't > name them due > > to legal reasons). > > > > As for storage of data, you will most likely need both realtime > communication between > > servers as well as some sort of key/value store for things like presence > information and > > authentication tokens. For the storage of data, I would actually > recommend redis, it tends > > to scale out really well for master / slave type stuff. As for message > communication, I'm > > beginning to think that pull-based may be better than push based, so > something like > > Apache Kafka (not that I've had personal experience with it.) > > > > You will most likely want to also define a transport protocol on top of > your connection, > > dependent on your type of application, there aren't many resources on > doing this, but > > if you want help with that, give me a shout, I've done a lot of research > into that area over > > the last two years. > > > > Alternatively, you could look at third party services for scaling your > realtime architecture. 
> > On 06/05/2012, at 4:26 PM, jason.桂林 wrote:
> >
> > > Thanks Roly, it's very useful for a single-machine app.
> > >
> > > I have a real app question. If we have millions of online users, how do we compute the system capacity, and how do we design an architecture to fit that capacity?
> > >
> > > 2012/5/6 Roly Fentanes <[email protected]>
> > >
> > > https://github.com/fent/socket.io-clusterhub
> > >
> > > On Sunday, May 6, 2012 4:04:30 AM UTC-7, Jason.桂林(Gui Lin) wrote:
> > >
> > > I just joined a hackathon party, and our team made a very cool chat web application in 24 hours.
> > >
> > > But I know it is a demo. It uses socket.io and redis; I think it is a little expensive on sessions, and it can't communicate between processes when you cluster it.
> > >
> > > What could node.js be used for? Frontend server? Core internal server?
> > >
> > > Somebody said ZMQ is a very fast message queue; would it help with this case?
> > >
> > > --
> > > Best regards,
> > >
> > > 桂林 (Gui Lin)
> > >
> > > guileen@twitter
> > > 桂林-V@weibo
> > > guileen@github
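The socket.io-clusterhub link above addresses exactly that cross-process problem. A minimal hand-rolled sketch of one common approach, relaying events through Redis pub/sub with socket.io 0.9 and node_redis, follows; the event and channel names are made up, and, as Micheil warns above, the central Redis becomes a single point of failure, so treat this as the simplest thing that works rather than a recommended end state.

    var io = require('socket.io').listen(8080);        // port differs per process
    var redis = require('redis');
    var pub = redis.createClient();
    var sub = redis.createClient();                     // a client in subscribe mode can't run other commands

    // every process listens on the same Redis channel...
    sub.subscribe('chat-messages');
    sub.on('message', function (channel, raw) {
      // ...and fans the message out to its own connected sockets
      io.sockets.emit('chat', JSON.parse(raw));
    });

    io.sockets.on('connection', function (socket) {
      socket.on('chat', function (msg) {
        // publish instead of emitting directly, so the other processes see it too
        pub.publish('chat-messages', JSON.stringify(msg));
      });
    });

Because the publishing process is also subscribed, it receives its own message back through Redis, so the sender's local sockets still get it without a separate local emit.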
