Having got the first stage of my client connector module working nicely against a single node, I'm now looking at how to make it cluster-aware, maintaining multiple connections for reliability and load-spreading. What are some good strategies to take here?
My current plan involves connecting to a seed node (chosen at random from a list, perhaps?) to query the list of peers in the cluster, then selecting some number of those to be "primary" nodes and some more to be "backup" nodes. The primary nodes will be used to spread the actual query load around; the backups sit idle, simply as a fast way to fail over to a known-working connection if a primary falls over. By registering an interest in topology and status change messages, the client can keep the list of available nodes up to date.

1. What is a good way to handle prepared statements here? Should they be prepared on several nodes (the primaries, or every node?), or on just one? I can imagine some applications having just a handful of heavily-used prepared statements, which would become a hotspot on one node if they weren't spread around. But then what should happen as new nodes are elected as primaries? Should the statements be prepared on them eagerly, at connection time, or lazily, at next use?

2. What are suggested ways to actually spread load among the primaries? I could imagine a simple round-robin, or something fancier, such as picking the node with the fewest outstanding requests, or the one on which we've been responsible for the least processing time recently, or something else... Do client libraries generally provide a selection of these mechanisms, or just pick one?

-- 
Paul "LeoNerd" Evans

leon...@leonerd.org.uk
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
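For concreteness, here is a rough sketch of the primary/backup bookkeeping described in the plan above. Everything in it is hypothetical: `NodePool`, `on_status_change` and `on_topology_change` are made-up names, nodes are plain strings, and no real driver or wire-protocol API is involved. The event handlers are assumed to be called by whatever code receives the topology and status change messages. On a failure, a backup is promoted first (it is already connected and known to work), and the backup list is then topped up from unused nodes chosen at random.

```python
import random
from typing import List, Optional, Set


class NodePool:
    """Hypothetical primary/backup pool; node names are opaque strings."""

    def __init__(self, peers: List[str], n_primary: int, n_backup: int,
                 rng: Optional[random.Random] = None) -> None:
        self.rng = rng or random.Random()
        self.n_primary = n_primary
        self.n_backup = n_backup
        self.known: Set[str] = set(peers)  # every node the cluster has reported
        self.down: Set[str] = set()        # nodes currently reported DOWN
        self.primaries: List[str] = []
        self.backups: List[str] = []
        self._refill()

    def _refill(self) -> None:
        usable = self.known - self.down
        # Drop any node that has gone away or been reported down.
        self.primaries = [n for n in self.primaries if n in usable]
        self.backups = [n for n in self.backups if n in usable]
        # Promote backups first: they are the fast, known-working failover path.
        while len(self.primaries) < self.n_primary and self.backups:
            self.primaries.append(self.backups.pop(0))
        # Fill any remaining slots from unused nodes, picked at random.
        spare = sorted(usable - set(self.primaries) - set(self.backups))
        self.rng.shuffle(spare)
        while len(self.primaries) < self.n_primary and spare:
            self.primaries.append(spare.pop())
        while len(self.backups) < self.n_backup and spare:
            self.backups.append(spare.pop())

    def on_status_change(self, node: str, up: bool) -> None:
        # Assumed hook for a "node went UP/DOWN" status message.
        if up:
            self.down.discard(node)
        else:
            self.down.add(node)
        self._refill()

    def on_topology_change(self, node: str, added: bool) -> None:
        # Assumed hook for a "node joined/left the cluster" topology message.
        if added:
            self.known.add(node)
        else:
            self.known.discard(node)
            self.down.discard(node)
        self._refill()
```

A node that comes back UP is simply returned to the spare set here; whether it should displace anything is a policy question this sketch leaves open.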
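To make question 1 concrete, this rough sketch shows the lazy option: a statement is prepared on a given node the first time it is used there, so any node promoted to primary picks it up on demand. The eager option is the same cache plus a call to `prepare_eagerly` at promotion time. `PreparedCache`, `prepare_fn` and the returned ID type are all hypothetical stand-ins for whatever the connector actually sends and receives. The assumption that a restarted node has forgotten its prepared statements is why `forget_node` exists.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical hook: prepare `cql` on `node` and return its statement ID.
PrepareFn = Callable[[str, str], bytes]


class PreparedCache:
    """Per-node prepared-statement IDs, filled lazily on first use."""

    def __init__(self, prepare_fn: PrepareFn) -> None:
        self._prepare_fn = prepare_fn
        self._ids: Dict[Tuple[str, str], bytes] = {}  # (node, cql) -> ID

    def get(self, node: str, cql: str) -> bytes:
        key = (node, cql)
        if key not in self._ids:
            # First use on this node: costs one extra round trip, once.
            self._ids[key] = self._prepare_fn(node, cql)
        return self._ids[key]

    def prepare_eagerly(self, node: str, statements: Iterable[str]) -> None:
        # Eager alternative: call when `node` is promoted to primary.
        for cql in statements:
            self.get(node, cql)

    def known_statements(self) -> Set[str]:
        # Every statement text prepared on at least one node so far.
        return {cql for _, cql in self._ids}

    def forget_node(self, node: str) -> None:
        # E.g. after a node restarts and may have lost its prepared state.
        for key in [k for k in self._ids if k[0] == node]:
            del self._ids[key]
```

With this shape, "lazy versus eager" becomes a per-promotion decision rather than something baked into the cache.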
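The first two load-spreading ideas from question 2 can be sketched as interchangeable policy objects sharing a `pick(primaries)` method, which is one way a library could offer a selection of mechanisms rather than just one. `RoundRobin`, `LeastOutstanding`, `pick` and `done` are hypothetical names, and the caller is assumed to invoke `done` when a reply or error for that node arrives. A "least recent processing time" policy could implement the same interface.

```python
import itertools
from typing import Dict, List


class RoundRobin:
    """Cycle through the current primaries in order."""

    def __init__(self) -> None:
        self._counter = itertools.count()

    def pick(self, primaries: List[str]) -> str:
        return primaries[next(self._counter) % len(primaries)]

    def done(self, node: str) -> None:
        pass  # Round-robin keeps no per-request state.


class LeastOutstanding:
    """Pick the primary with the fewest requests currently in flight."""

    def __init__(self) -> None:
        self.in_flight: Dict[str, int] = {}

    def pick(self, primaries: List[str]) -> str:
        # min() returns the first of several equal candidates, so ties go
        # to whichever node appears earliest in the list.
        node = min(primaries, key=lambda n: self.in_flight.get(n, 0))
        self.in_flight[node] = self.in_flight.get(node, 0) + 1
        return node

    def done(self, node: str) -> None:
        # Called when the reply (or error) for a request to `node` arrives.
        self.in_flight[node] -= 1
```

Because the primaries list is passed in on each call, either policy keeps working when the pool swaps a backup in.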