Hi everyone,

While prototyping ideas for the PHP CQL driver, I started thinking about
client responsibilities that are independent of the underlying transport (be
it Thrift, Avro or an arbitrary binary protocol) and, indeed, of the
client's interface (e.g. the Thrift API or CQL).

Drivers for most databases (e.g. MySQL, Mongo, Oracle) connect to a single
node, specified by hostname and port. With a distributed database, though,
the last thing we want is every client connecting to the same node when we
could load balance them across the cluster. At present, every high-level
client has to tackle this problem, and each does so in its own way. But this
has nothing to do with the client API - it's a configuration concern.

So my thought is this: we define a standard means for clients to be
configured with information about the cluster topology. We then create
language abstractions that use this configuration to select the most
appropriate node to connect to on demand and, most importantly, expose the
cluster to the high-level API as a single entity that it connects to. The
high-level client can focus on its job, and the currently disparate
boilerplate gets unified into something much improved. We could do something
similar for connection fail-over policies, although that would likely be
part of the selection policy.
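
To make the "single entity" idea a bit more concrete, here is a minimal
sketch. It is in Python purely for illustration, and every name in it (Node,
Cluster, the connect callable) is hypothetical rather than part of any
existing driver. It assumes the topology is just a list of host/port pairs
and uses the simplest possible fail-over: try each node in turn.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class Node:
    """One cluster member, as it might appear in shared client config."""
    host: str
    port: int


class Cluster:
    """Presents the whole cluster to the high-level client as one endpoint.

    `connect` is whatever transport-specific function opens a connection
    to a single node (Thrift, Avro, ...); the cluster itself is
    transport-agnostic.
    """

    def __init__(self, nodes: Sequence[Node], connect: Callable[[Node], Any]):
        if not nodes:
            raise ValueError("cluster configuration contains no nodes")
        self._nodes = list(nodes)
        self._connect = connect

    def connection(self) -> Any:
        """Return a connection to the first node that accepts one."""
        failed = []
        for node in self._nodes:
            try:
                return self._connect(node)
            except ConnectionError:
                # Naive fail-over: remember the failure, try the next node.
                failed.append(node)
        raise ConnectionError(f"none of {len(failed)} configured nodes is reachable")
```

The high-level client would only ever call connection() on the cluster and
never deal with individual hostnames itself.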

I've already had some ideas on how best to implement such an abstraction:
using the Strategy pattern to define different selection policies (e.g.
"Random" and "NetworkTopologyAware") for routing client connections would
nicely parallel the internal ReplicaPlacementStrategy implementations.
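
A rough sketch of what such policies could look like follows. It is in
Python purely for illustration; the class names, the datacenter field and
the "prefer the local datacenter" rule are all assumptions made for the
example, not an agreed design.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    """One cluster member; `datacenter` is an assumed topology hint."""
    host: str
    port: int
    datacenter: str


class SelectionPolicy(ABC):
    """Strategy interface: pick the node a client should connect to."""

    @abstractmethod
    def select(self, nodes: Sequence[Node]) -> Node:
        ...


class RandomPolicy(SelectionPolicy):
    """Spread clients across the cluster uniformly at random."""

    def __init__(self, rng: Optional[random.Random] = None):
        self._rng = rng or random.Random()

    def select(self, nodes: Sequence[Node]) -> Node:
        return self._rng.choice(list(nodes))


class NetworkTopologyAwarePolicy(SelectionPolicy):
    """Prefer nodes in the client's own datacenter, else any node."""

    def __init__(self, local_datacenter: str, rng: Optional[random.Random] = None):
        self._local = local_datacenter
        self._rng = rng or random.Random()

    def select(self, nodes: Sequence[Node]) -> Node:
        local = [n for n in nodes if n.datacenter == self._local]
        # Fall back to the whole cluster if no local node is configured.
        return self._rng.choice(local or list(nodes))
```

Because both classes share the select() interface, the policy becomes a
configuration choice and the high-level client code stays the same whichever
one is used.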

I haven't really come to any concrete conclusions about this yet, but I
wanted to open this up to discussion among the various client devs to see
what you think. Would such an abstraction be useful? Is this something worth
pursuing?
--
Nick Telford
