Randy Johnson - CFConcepts wrote:
> "If I had to make a site like Twitter that was scalable, how would I do 
> that?"

> But really that's only part of it.   My coldfusion site needs to be 
> coded to be scalable too..

> So writing code to be scalable doesn't seem to be all that difficult, 
> writing good efficient code in a modular way seems to be a good start.  

Modular is good, and decoupling is an important part of scalability.
Static websites scale because one webserver does not need to know
anything about another webserver to answer a request for static
content. Doubling the number of webservers quite literally doubles your
capacity. Transactional databases behave very differently: because of
their transactional nature, one server needs to know what the others
are doing, so for each request it has to communicate with its peers.
More peers means more communication, and in the end the number of
deadlocks goes up with the third power of the number of active nodes:
ftp://ftp.research.microsoft.com/pub/tr/tr-96-17.pdf
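
To get a feel for what "third power" means in practice, here is a tiny
Python sketch. It only encodes the cubic relationship stated above; the
function name is made up for illustration, and constant factors and
workload details are ignored.

```python
def relative_deadlock_rate(nodes: int, baseline_nodes: int = 1) -> float:
    """Deadlock rate relative to a baseline cluster size.

    Assumes pure cubic growth in the number of active nodes and nothing
    else; this is an illustration, not a model of any real database.
    """
    if nodes < 1 or baseline_nodes < 1:
        raise ValueError("node counts must be positive")
    return (nodes / baseline_nodes) ** 3


if __name__ == "__main__":
    # Compare a few cluster sizes against a single node.
    for n in (1, 2, 5, 10):
        print(f"{n:>2} nodes -> {relative_deadlock_rate(n):.0f}x the deadlocks")
```

Compare that with the static webservers, where doubling the nodes simply
doubles capacity.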


> The next component is databases.  I have read with Mysql that 
> replication is how you can use multiple databases.  I haven't done to 
> much research on this, my initial questions would you use a db server 
> for users, a db server for messages, db for each component??

If you were to do that, you would need to join the users in one database 
to the messages in another one. If those databases are on different 
servers, that is incredibly slow. Even a 1 Gbit/s network is 50 times 
slower than the connection between a CPU and RAM and has 100 (1000?) 
times more latency.

A good way to make your database scale is to make sure it remains small 
and local. Stick X users on each database server together with 
everything they need to answer any request from the database that may 
arrive from the application. And user X+1 goes on a new database on the 
next piece of hardware.
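
As a minimal sketch of the "X users per server" idea, assuming numeric
user ids that start at 1 and are handed out sequentially: the value of
USERS_PER_SHARD and the function name are hypothetical, picked only for
the example.

```python
USERS_PER_SHARD = 100_000  # hypothetical value for "X"


def shard_for_user(user_id: int, users_per_shard: int = USERS_PER_SHARD) -> int:
    """Return the zero-based index of the database server holding this user.

    Users 1..X live on shard 0, users X+1..2X on shard 1, and so on.
    """
    if user_id < 1:
        raise ValueError("user ids are assumed to start at 1")
    return (user_id - 1) // users_per_shard


if __name__ == "__main__":
    for uid in (1, 100_000, 100_001, 250_000):
        print(f"user {uid} -> shard {shard_for_user(uid)}")
```

The application computes the shard first and then sends every query for
that user to that one server, so no request has to touch two machines.
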
I don't know much more about Twitter than you or I can see when I click 
on some page, but I guess for your Twitter example that would mean each 
database would have:
- users table
- friends table
- followers table
Whenever user X adds friend Y, you write your application code so that 
there are actually 2 inserts: in the friends table on the database 
server of user X you add the name, image and URL of user Y, and in the 
followers table on the database of user Y you add the name, image and 
URL of user X. This is denormalized, double storage: you pay the price 
of having to run twice the updates (or hundreds of times the updates 
when somebody changes his image URL), but your thousands of gets for the 
latest update of a stream can use a simple, fast, in-memory local database.
Utility tables like languages, a list of countries etc. are present on 
all servers and are completely mirrored.
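
The double insert could be sketched like this in Python. Two in-memory
SQLite databases stand in for the database servers of user X and user
Y; the table layout, column names and the add_friend helper are
assumptions made for the example, not anything Twitter is known to use.

```python
import sqlite3


def new_shard() -> sqlite3.Connection:
    """Create one stand-in 'database server' with the per-user tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE friends (owner TEXT, name TEXT, image TEXT, url TEXT)")
    db.execute("CREATE TABLE followers (owner TEXT, name TEXT, image TEXT, url TEXT)")
    return db


def add_friend(shard_x, shard_y, user_x: dict, user_y: dict) -> None:
    """Record that user X added user Y as a friend: one insert per shard."""
    # Insert 1: on X's server, store a copy of Y's details as a friend.
    with shard_x:
        shard_x.execute(
            "INSERT INTO friends VALUES (?, ?, ?, ?)",
            (user_x["name"], user_y["name"], user_y["image"], user_y["url"]),
        )
    # Insert 2: on Y's server, store a copy of X's details as a follower.
    with shard_y:
        shard_y.execute(
            "INSERT INTO followers VALUES (?, ?, ?, ?)",
            (user_y["name"], user_x["name"], user_x["image"], user_x["url"]),
        )


if __name__ == "__main__":
    shard_x, shard_y = new_shard(), new_shard()
    x = {"name": "x", "image": "x.png", "url": "http://example.com/x"}
    y = {"name": "y", "image": "y.png", "url": "http://example.com/y"}
    add_friend(shard_x, shard_y, x, y)
    # Each user's page can now be built from a single local database.
    print(shard_x.execute("SELECT name FROM friends WHERE owner = 'x'").fetchall())
    print(shard_y.execute("SELECT name FROM followers WHERE owner = 'y'").fetchall())
```

Note that the two inserts are separate transactions on separate
databases. If the second one fails, the two servers disagree until
something repairs it.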

Obviously what you lose here, if you look at the system as a whole, are 
the A and C from ACID (atomicity and consistency). For a site like 
Twitter that would seem to be a reasonable price to pay, but nobody 
would want his bank to work this way.


> I know a few people on this list have setup Scalable websites, clusters, 
> load balancing etc. 
> 
> Where did you all learn how to do such things? 

Read a lot (I second http://highscalability.com/), ask yourself "how" or 
"why" often enough and at some point you start seeing patterns.

Jochem


Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:306769