Hi everyone,

While prototyping ideas for the PHP CQL driver, I started thinking about client responsibilities that are independent of both the underlying transport (be it Thrift, Avro or an arbitrary binary protocol) and the client's interface (e.g. the Thrift API or CQL).
Drivers for most databases (e.g. MySQL, Mongo, Oracle etc.) connect to a single node, specified by hostname and port. Obviously, for a distributed database, the last thing we want is every client connecting to the same node when we could load balance them across the cluster more efficiently. At present, every high-level client has to tackle this problem, and each does so in its own way. But this has nothing to do with the client API - it's a configuration concern.

So my thought is this: we define a standard means for clients to be configured with information about the cluster topology; we then create language abstractions that use this configuration to select the most appropriate node to connect to on demand; and, most importantly, we expose the cluster to the high-level API as a single entity that it connects to. The high-level client can focus on its job, and the currently disparate boilerplate gets unified into something much improved.

We could also do something similar for connection fail-over policies, although that would likely be part of the selection policy.

I've already had some ideas on how best to implement such an abstraction: using a Strategy Pattern to define different selection policies (e.g. "Random" and "NetworkTopologyAware") would nicely parallel the internal ReplicaPlacementStrategy implementations in how they route client connections.

I haven't come to any concrete conclusions about this yet, but I wanted to open it up for discussion among the various client devs to see what you think. Would such an abstraction be useful? Is this something worth pursuing?

--
Nick Telford
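
P.S. To make the Strategy Pattern idea a bit more concrete, here is a rough sketch of what I have in mind. It is in Python purely for brevity, and every name in it (Node, SelectionPolicy, RandomPolicy, NetworkTopologyAwarePolicy, Cluster, pick_node, the datacenter label) is hypothetical and made up for illustration; none of it exists in any current client. The topology-aware policy assumes the simplest possible rule, "prefer nodes in my own datacenter", which may well not be the rule we would actually want.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One entry in the client's cluster configuration (hypothetical shape)."""
    host: str
    port: int
    datacenter: str = "dc1"  # assumed topology label, for illustration only


class SelectionPolicy(ABC):
    """Strategy interface: choose one node from the configured topology."""

    @abstractmethod
    def select(self, nodes):
        ...


class RandomPolicy(SelectionPolicy):
    """Pick any configured node uniformly at random."""

    def __init__(self, rng=None):
        self._rng = rng or random.Random()

    def select(self, nodes):
        return self._rng.choice(nodes)


class NetworkTopologyAwarePolicy(SelectionPolicy):
    """Prefer nodes in the client's own datacenter; otherwise use any node."""

    def __init__(self, local_datacenter, fallback=None):
        self._local = local_datacenter
        self._fallback = fallback or RandomPolicy()

    def select(self, nodes):
        local = [n for n in nodes if n.datacenter == self._local]
        # If no node matches the local datacenter, choose from the full list.
        return self._fallback.select(local or nodes)


class Cluster:
    """Presents the whole cluster to the high-level client as a single entity."""

    def __init__(self, nodes, policy):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._policy = policy

    def pick_node(self):
        """Return the node the high-level client should connect to next."""
        return self._policy.select(self._nodes)
```

A high-level client would then build a Cluster once from its configuration and call pick_node() whenever it needs a connection, without knowing which policy is in use. Fail-over could plausibly be handled by calling pick_node() again after excluding a failed node, but I have not sketched that here.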