All nice points. I would add: avoid transactions like the plague. A single-threaded design will be easier to scale out (think HTTP). +1 on keeping things in memory. In fact, I have customers who no longer put disks in their machines, which surprisingly increases reliability (which really shouldn't be surprising).
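
To make the single-threaded, in-memory point concrete, here is a minimal sketch in Java. It assumes a toy in-memory ledger; the class and method names (SingleWriterLedger, deposit, transfer, balanceOf) are made up for illustration and are not taken from LMAX or any other product. All state is owned by one thread, so a multi-step update runs as a single queued task and needs neither locks nor a transaction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical example: state lives in memory and is touched by one thread only. */
final class SingleWriterLedger {
    // A plain HashMap is enough because only the writer thread reads or writes it.
    private final Map<String, Long> balances = new HashMap<>();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    /** Queue a deposit from any thread; the future completes with the new balance. */
    CompletableFuture<Long> deposit(String account, long amount) {
        return CompletableFuture.supplyAsync(
                () -> balances.merge(account, amount, Long::sum), writer);
    }

    /** Both legs run inside one task, so no other task can interleave with them. */
    CompletableFuture<Boolean> transfer(String from, String to, long amount) {
        return CompletableFuture.supplyAsync(() -> {
            long available = balances.getOrDefault(from, 0L);
            if (available < amount) {
                return false; // rejected before any change, so nothing to roll back
            }
            balances.put(from, available - amount);
            balances.merge(to, amount, Long::sum);
            return true;
        }, writer);
    }

    /** Reads go through the same queue so they see a consistent view. */
    CompletableFuture<Long> balanceOf(String account) {
        return CompletableFuture.supplyAsync(
                () -> balances.getOrDefault(account, 0L), writer);
    }

    /** Stop the writer thread so the JVM can exit. */
    void shutdown() {
        writer.shutdown();
    }

    public static void main(String[] args) {
        SingleWriterLedger ledger = new SingleWriterLedger();
        ledger.deposit("alice", 100).join();
        boolean ok = ledger.transfer("alice", "bob", 30).join();
        System.out.println(ok + " alice=" + ledger.balanceOf("alice").join()
                + " bob=" + ledger.balanceOf("bob").join());
        ledger.shutdown();
    }
}
```

Scaling out under this assumption means running more such single-writer instances and routing each account to exactly one of them, rather than adding threads to one instance.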

Regards,
Kirk

On 2012-01-06, at 1:25 PM, Kevin Wright wrote:

> It's hard to answer without more information, especially regarding data
> retention requirements (hint: things can be made a lot faster if you don't
> have to keep persisting them to disk)
>
> General principles though:
>
> - work in memory as much as possible, disks are SLOW
> - disk-based databases are also slow, though caching helps here. If you MUST
>   use a db, then consider optimised solutions. VoltDB has about the best
>   performance going for very fast writes, MonetDB is similarly impressive for
>   read performance with complex queries. A NoSQL solution (Redis, Cassandra,
>   etc.) may also be the best fit, depending on your use case.
> - architect things so you can scale by adding more machines
> - favour a stateless design, or enforce session affinity through your load
>   balancer
> - JSON is good, but also consider Protocol Buffers or MessagePack as a way of
>   countering serialisation overhead. Avoid XML like the plague
> - cache aggressively wherever it makes sense to do so. If you'll have
>   thousands of requests for the same resource then use Varnish, it works well
>   even with a sub-1s time to live.
> - Don't cache in the Java heap, garbage collection algorithms aren't
>   particularly good with such a usage pattern. memcached is a much nicer
>   choice. Better still, use Varnish if you're able to cache at the protocol
>   layer.
> - take a look at the actor paradigm, it's a very effective way to deal with
>   clustering and passing messages between machines. Akka 2.0 is shaping up to
>   be very powerful in this area.
> - don't lose track of the need to balance performance vs time to market. You
>   can always find a way to make things faster given an infinite time budget,
>   but that never happens in the real world.
>
> As for case studies...
>
> Facebook lean heavily on Cassandra and Hadoop to do much of their heavy
> lifting, they've also made massive investments in speeding up and compiling
> PHP, which suggests that it probably wasn't the best initial choice of
> language they could have made for their front-end.
>
> Twitter, famously, got a significant speed boost by dropping a lot of Ruby
> code from their performance-critical systems, replacing it with Scala instead.
> They also implemented their own graph database.
>
> and, yes, LMAX and the disruptor pattern are nothing short of amazing.
>
>
> On 6 January 2012 11:28, Rakesh <[email protected]> wrote:
> Hi guys,
>
> I was wondering if you guys could educate me or at least point me to
> some useful resources.
>
> Let's say I was tasked with architecting a web application where I was
> expecting huge volumes of transactions, circa millions of transactions
> in a small hour or so window at peak times.
>
> I could do a traditional n-tier architecture with the web at one end,
> business/service layer in the middle and a big database at the other
> end. Perhaps even do JMS between components (with ActiveMQ).
>
> Would that be up to the job? What if it wasn't? What are my choices?
> From what I know, there are 2 options:
>
> 1. optimise for the single threaded model - something like what LMAX
> has done (Martin Fowler has a post on his blog) and try and remove the
> DB from the loop. I (think) this also includes software transactional
> memory-type architectures?
>
> 2. explicitly move to a multi-threaded model.
>
> Are those roughly the options? What do Facebook and Twitter do to manage
> the huge load?
>
> All feedback welcome.
>
> Cheers
>
> R
>
> --
> You received this message because you are subscribed to the Google Groups
> "The Java Posse" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/javaposse?hl=en.
>
>
> --
> Kevin Wright
> mail: [email protected]
> gtalk / msn : [email protected]
> quora: http://www.quora.com/Kevin-Wright
> google+: http://gplus.to/thecoda
> twitter: @thecoda
> vibe / skype: kev.lee.wright
> steam: kev_lee_wright
>
> "My point today is that, if we wish to count lines of code, we should not
> regard them as "lines produced" but as "lines spent": the current
> conventional wisdom is so foolish as to book that count on the wrong side of
> the ledger" ~ Dijkstra
