On Fri, Feb 20, 2009 at 1:37 PM, Stefan Karpinski <[email protected]> wrote:
>
> That makes sense and it clarifies one of my questions about this topic. Is
> the goal of partitioned clustering to increase performance for very large
> data sets, or to increase reliability? It would seem from this answer that
> the goal is to increase query performance by distributing the query
> processing, and not to increase reliability.

Data redundancy is handled orthogonally to partitioning. Each node will be
able to maintain N hot-failover backups. Whether the database is hosted on a
single large node or partitioned among many small ones, the redundancy story
is the same.

Partitioning becomes useful when either the total update rate is greater
than the hard-disk throughput of a single node, or the stored capacity is
better managed by multiple disks. By spreading the write load across nodes
you can achieve greater throughput.

View queries must be sent to every node, so having docs partitioned also
allows views to be calculated in parallel. It will be interesting to see
whether it makes sense to partition small databases across hundreds of nodes
in the interest of performance.

Chris

--
Chris Anderson
http://jchris.mfdz.com
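
To make the partitioning idea above concrete, here is a minimal Python sketch. It is an illustration only, not the project's actual clustering code: the `Node` and `PartitionedDB` classes, the SHA-256-modulo placement rule, and the thread pool are all assumptions made for the example. Each write goes to exactly one node chosen by hashing the doc id, while a view query is sent to every node, computed in parallel, and merged. Hot-failover backups are left out, since redundancy is orthogonal to partitioning.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical stand-in for one database node holding a subset of docs."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def put(self, doc_id, doc):
        self.docs[doc_id] = doc

    def query_view(self, map_fn):
        # Run the view's map function over this node's documents only.
        rows = []
        for doc_id, doc in self.docs.items():
            rows.extend(map_fn(doc_id, doc))
        return rows


class PartitionedDB:
    """Hypothetical front end that spreads docs over several nodes."""

    def __init__(self, nodes):
        self.nodes = nodes

    def node_for(self, doc_id):
        # Assumed placement rule: stable hash of the doc id, modulo node count.
        digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, doc_id, doc):
        # A write touches a single node, so write load is spread across nodes.
        self.node_for(doc_id).put(doc_id, doc)

    def query_view(self, map_fn):
        # A view query goes to every node; partial results are computed in
        # parallel and then merged into one list ordered by key.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = list(pool.map(lambda n: n.query_view(map_fn), self.nodes))
        merged = [row for part in partials for row in part]
        return sorted(merged, key=lambda row: row[0])
```

A caller would build `PartitionedDB([Node("a"), Node("b"), Node("c")])`, call `put(doc_id, doc)` for each document, and pass `query_view` a map function that returns a list of `(key, value)` rows for a document. Modulo placement is the simplest rule to show; it reshuffles most docs whenever the node count changes, so a real system would likely use something else.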
