Great to know about all this... Thanks!!

On Sun, Jun 20, 2010 at 10:13 AM, Bipin Gautam <[email protected]> wrote:
> Exploring the software behind Facebook, the world’s largest site
> Posted in Main on June 18th, 2010 by Pingdom
>
>
> At the scale that Facebook operates, a lot of traditional approaches
> to serving web content break down or simply aren’t practical. The
> challenge for Facebook’s engineers has been to keep the site up and
> running smoothly in spite of handling close to half a billion active
> users. This article takes a look at some of the software and
> techniques they use to accomplish that.
>
>
> Facebook’s scaling challenge
> Before we get into the details, here are a few factoids to give you an
> idea of the scaling challenge that Facebook has to deal with:
>
> * Facebook serves 570 billion page views per month (according to
>   Google Ad Planner).
> * There are more photos on Facebook than all other photo sites
>   combined (including sites like Flickr).
> * More than 3 billion photos are uploaded every month.
> * Facebook’s systems serve 1.2 million photos per second. This
>   doesn’t include the images served by Facebook’s CDN.
> * More than 25 billion pieces of content (status updates,
>   comments, etc) are shared every month.
> * Facebook has more than 30,000 servers (and this number is from last
>   year!)
>
>
> Software that helps Facebook scale
> In some ways Facebook is still a LAMP site (kind of), but it has had
> to change and extend its operation to incorporate a lot of other
> elements and services, and modify the approach to existing ones.
>
> For example:
>
> * Facebook still uses PHP, but it has built a compiler for it so
>   it can be turned into native code on its web servers, thus boosting
>   performance.
> * Facebook uses Linux, but has optimized it for its own purposes
>   (especially in terms of network throughput).
> * Facebook uses MySQL, but primarily as a key-value persistent
>   storage, moving joins and logic onto the web servers since
>   optimizations are easier to perform there (on the “other side” of the
>   Memcached layer).
>
>
> Then there are the custom-written systems, like Haystack, a highly
> scalable object store used to serve Facebook’s immense amount of
> photos, or Scribe, a logging system that can operate at the scale of
> Facebook (which is far from trivial).
>
> But enough of that. Let’s present (some of) the software that Facebook
> uses to provide us all with the world’s largest social network site.
>
>
> Memcached
> Memcached is by now one of the most famous pieces of software on the
> internet. It’s a distributed memory caching system which Facebook (and
> a ton of other sites) use as a caching layer between the web servers
> and MySQL servers (since database access is relatively slow). Through
> the years, Facebook has made a ton of optimizations to Memcached and
> the surrounding software (like optimizing the network stack).
>
> Facebook runs thousands of Memcached servers with tens of terabytes of
> cached data at any one point in time. It is likely the world’s largest
> Memcached installation.
>
>
> HipHop for PHP
> PHP, being a scripting language, is relatively slow when
> compared to code that runs natively on a server. HipHop converts PHP
> into C++ code which can then be compiled for better performance. This
> has allowed Facebook to get much more out of its web servers since
> Facebook relies heavily on PHP to serve content.
>
> A small team of engineers (initially just three of them) at Facebook
> spent 18 months developing HipHop, and it is now live in production.
>
>
> Haystack
> Haystack is Facebook’s high-performance photo storage/retrieval system
> (strictly speaking, Haystack is an object store, so it doesn’t
> necessarily have to store photos). It has a ton of work to do; there
> are more than 20 billion uploaded photos on Facebook, and each one is
> saved in four different resolutions, resulting in more than 80 billion
> photos.
>
> And it’s not just about being able to handle billions of photos,
> performance is critical.
> As we mentioned previously, Facebook serves
> around 1.2 million photos per second, a number which doesn’t include
> images served by Facebook’s CDN. That’s a staggering number.
>
>
> BigPipe
> BigPipe is a dynamic web page serving system that Facebook has
> developed. Facebook uses it to serve each web page in sections (called
> “pagelets”) for optimal performance.
>
> For example, the chat window is retrieved separately, the news feed is
> retrieved separately, and so on. These pagelets can be retrieved in
> parallel, which is where the performance gain comes in, and it also
> gives users a site that works even if some part of it would be
> deactivated or broken.
>
>
> Cassandra
> Cassandra is a distributed storage system with no single point of
> failure. It’s one of the poster children for the NoSQL movement and
> has been made open source (it’s even become an Apache project).
> Facebook uses it for its Inbox search.
>
> Other than Facebook, a number of other services use it, for example
> Digg. We’re even considering some uses for it here at Pingdom.
>
>
> Scribe
> Scribe is a flexible logging system that Facebook uses for a multitude
> of purposes internally. It’s been built to be able to handle logging
> at the scale of Facebook, and automatically handles new logging
> categories as they show up (Facebook has hundreds).
>
>
> Hadoop and Hive
> Hadoop is an open source map-reduce implementation that makes it
> possible to perform calculations on massive amounts of data. Facebook
> uses this for data analysis (and as we all know, Facebook has massive
> amounts of data). Hive originated from within Facebook, and makes it
> possible to use SQL queries against Hadoop, making it easier for
> non-programmers to use.
>
> Both Hadoop and Hive are open source (Apache projects) and are used by
> a number of big services, for example Yahoo and Twitter.
>
> Thrift
>
> Facebook uses several different languages for its different services.
> PHP is used for the front-end, Erlang is used for Chat, Java and C++
> are also used in several places (and perhaps other languages as well).
> Thrift is an internally developed cross-language framework that ties
> all of these different languages together, making it possible for them
> to talk to each other. This has made it much easier for Facebook to
> keep up its cross-language development.
>
> Facebook has made Thrift open source and support for even more
> languages has been added.
>
> Varnish
> Varnish is an HTTP accelerator which can act as a load balancer and
> also cache content which can then be served lightning-fast.
>
> Facebook uses Varnish to serve photos and profile pictures, handling
> billions of requests every day. Like almost everything Facebook uses,
> Varnish is open source.
>
> Other things that help Facebook run smoothly
>
> We have mentioned some of the software that makes up Facebook’s
> system(s) and helps the service scale properly. But handling such a
> large system is a complex task, so we thought we would list a few more
> things that Facebook does to keep its service running smoothly.
>
>
> Gradual releases and dark launches
> Facebook has a system they call Gatekeeper that lets them run
> different code for different sets of users (it basically introduces
> different conditions in the code base). This lets Facebook do gradual
> releases of new features, A/B testing, activate certain features only
> for Facebook employees, etc.
>
> Gatekeeper also lets Facebook do something called “dark launches”,
> which is to activate elements of a certain feature behind the scenes
> before it goes live (without users noticing since there will be no
> corresponding UI elements). This acts as a real-world stress test and
> helps expose bottlenecks and other problem areas before a feature is
> officially launched. Dark launches are usually done two weeks before
> the actual launch.
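The Gatekeeper idea described above (different code paths for different sets of users, plus dark launches) can be sketched in a few lines. This is a guess at the general technique only: the article does not describe Gatekeeper's real API or configuration, so `GATES`, `bucket`, `is_enabled`, `load_chat_history` and `render_page` are all hypothetical names, and hashing the user id into 100 buckets is just one common way to get a stable percentage rollout.

```python
import hashlib

# Hypothetical gate definitions; the article does not describe Gatekeeper's
# real configuration format or API.
GATES = {
    "new_chat": {"employees_only": False, "rollout_percent": 10},
    "timeline_beta": {"employees_only": True, "rollout_percent": 0},
}

dark_calls = []  # records which users triggered the hidden backend work


def bucket(feature, user_id):
    """Map (feature, user) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature, user_id, is_employee=False):
    """Decide whether this user should run the gated code path."""
    gate = GATES.get(feature)
    if gate is None:
        return False  # unknown features stay off
    if gate["employees_only"]:
        return is_employee
    return bucket(feature, user_id) < gate["rollout_percent"]


def load_chat_history(user_id):
    """Stand-in for the new feature's backend call."""
    dark_calls.append(user_id)
    return []


def render_page(user_id):
    """Dark launch: run the new backend work, but add no UI element for it."""
    page = ["news_feed"]
    if is_enabled("new_chat", user_id):
        load_chat_history(user_id)  # exercised for load, result discarded
    return page
```

Because the bucket depends only on the feature name and the user id, a given user gets the same answer on every request, and raising `rollout_percent` only ever adds users to the enabled set.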
>
> Profiling of the live system
>
> Facebook carefully monitors its systems (something we here at Pingdom
> of course approve of), and interestingly enough it also monitors the
> performance of every single PHP function in the live production
> environment. This profiling of the live PHP environment is done using
> an open source tool called XHProf.
>
>
> Gradual feature disabling for added performance
> If Facebook runs into performance issues, there are a large number of
> levers that let them gradually disable less important features to
> boost performance of Facebook’s core features.
>
> The things we didn’t mention.
>
> We didn’t go much into the hardware side in this article, but of
> course that is also an important aspect when it comes to scalability.
> For example, like many other big sites, Facebook uses a CDN to help
> serve static content. And then of course there is the huge data center
> Facebook is building in Oregon to help it scale out with even more
> servers.
>
> And aside from what we have already mentioned, there is of course a
> ton of other software involved. However, we hope we were able to
> highlight some of the more interesting choices Facebook has made.
>
>
> Facebook’s love affair with open source
> We can’t complete this article without mentioning how much Facebook
> likes open source. Or perhaps we should say, “loves”.
>
> Not only is Facebook using (and contributing to) open source software
> such as Linux, Memcached, MySQL, Hadoop, and many others, it has also
> made much of its internally developed software available as open
> source.
>
> Examples of open source projects that originated from inside Facebook
> include HipHop, Cassandra, Thrift and Scribe. Facebook has also
> open-sourced Tornado, a high-performance web server framework
> developed by the team behind FriendFeed (which Facebook bought in
> August 2009).
>
> (A list of open source software that Facebook is involved with can be
> found on Facebook’s Open Source page.)
>
> ...
> (Read More:
> http://royal.pingdom.com/2010/06/18/the-software-behind-facebook/ )
>
>
> -----------------------------------
> You received this message because you are subscribed to the Google
> Groups "NepSecure (Nepali computer security and hacking community )"
> group.
>
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/nepsecure?hl=en.
>
> --
> FOSS Nepal mailing list: [email protected]
> http://groups.google.com/group/foss-nepal
> To unsubscribe, e-mail:
> [email protected]
>
> Mailing List Guidelines:
> http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
> Community website: http://www.fossnepal.org/
