Re: Some thoughts on client abstraction and connection pooling

Nate McCall Fri, 01 Apr 2011 10:47:22 -0700

Thanks for bringing this up, Nick. A cleaner way to deduce cluster and
node state would be great - what is there currently is clunky, and was
just not designed for real use by clients. For example, I opened this
issue a while back while adding in auto-discovery to Hector:
https://issues.apache.org/jira/browse/CASSANDRA-1777


We have plumbed out some of the abstractions you mention in Hector
already, and have found it helpful to be able to swap out node
selection policies as well as to let users contribute their own (e.g.
Vijay's recent patch set for a phi accrual load balancing policy).
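
To make the swappable-policy idea a bit more concrete, here is a rough
sketch. It is in Python only for brevity and is not Hector's actual API:
every class and method name below (SelectionPolicy, RandomPolicy,
RoundRobinPolicy, Cluster, pick_node, mark_down) is made up for
illustration, the host names are placeholders, and the round-robin
policy is just an assumed second strategy to show the swap.

```python
import random
from abc import ABC, abstractmethod


class SelectionPolicy(ABC):
    """Strategy interface (hypothetical): choose one node from the live set."""

    @abstractmethod
    def select(self, nodes):
        """Return one entry from the non-empty list `nodes`."""


class RandomPolicy(SelectionPolicy):
    """Pick a live node uniformly at random."""

    def __init__(self, rng=None):
        # An injectable RNG keeps the policy deterministic under test.
        self._rng = rng or random.Random()

    def select(self, nodes):
        return self._rng.choice(nodes)


class RoundRobinPolicy(SelectionPolicy):
    """Cycle through the live nodes in order (assumed example policy)."""

    def __init__(self):
        self._next = 0

    def select(self, nodes):
        node = nodes[self._next % len(nodes)]
        self._next += 1
        return node


class Cluster:
    """Presents the whole cluster to the high-level client as one entity."""

    def __init__(self, nodes, policy):
        self._nodes = list(nodes)
        self._down = set()
        self._policy = policy

    def mark_down(self, node):
        # Called by whatever failure detection the client uses.
        self._down.add(node)

    def mark_up(self, node):
        self._down.discard(node)

    def pick_node(self):
        # Only nodes not currently marked down are offered to the policy.
        live = [n for n in self._nodes if n not in self._down]
        if not live:
            raise RuntimeError("no live nodes available")
        return self._policy.select(live)


if __name__ == "__main__":
    cluster = Cluster(["cass1:9160", "cass2:9160", "cass3:9160"],
                      RoundRobinPolicy())
    cluster.mark_down("cass2:9160")
    print([cluster.pick_node() for _ in range(4)])
```

The high-level client only ever calls pick_node(); swapping the policy
object changes the load-balancing behaviour without touching the rest of
the client, and a user-contributed policy is just another subclass.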

In general though, we have the benefit of using the classes in the
Cassandra jar directly to minimize our work. I could see some of this
being a lot harder to implement in other languages without that option,
and would therefore be supportive of any efforts here.

On Fri, Apr 1, 2011 at 10:46 AM, Nick Telford <[email protected]> wrote:
> Hi everyone,
>
> While prototyping ideas for the PHP CQL driver, I started thinking about
> client responsibilities that are agnostic of the underlying transport (be it
> Thrift, Avro or an arbitrary binary protocol) and indeed the client's
> interface, e.g. Thrift API or CQL.
>
> Drivers for most databases (e.g. MySQL, Mongo, Oracle etc.) connect to a
> node, specified by hostname and port. Obviously, as a distributed database,
> the last thing we want is clients all connecting to the same node when we
> can more efficiently load balance them across the cluster. At present, all
> high-level clients have to tackle this problem and they do so in a variety
> of different ways. But this is something that has nothing to do with the
> client API - it's a configuration concern.
>
> So my thought is: we define a standard means for clients to be configured
> with information about the cluster topology, we then create language
> abstractions for using this configuration to select the most appropriate
> node to connect to on-demand and, most importantly, we expose the cluster to
> the high-level API as a single entity that it connects to. The high-level
> client can focus on its job, and the currently disparate boilerplate gets
> unified into something much-improved. We could also do similar things for
> connection fail-over policies, although that would likely be a part of the
> selection policy.
>
> I've already had some ideas on how best to implement such an abstraction:
> Using a Strategy Pattern to define different selection policies (e.g.
> "Random" and "NetworkTopologyAware") would parallel the internal
> ReplicaPlacementStrategy implementations nicely in how they route client
> connections.
>
> I haven't really come to any concrete conclusions about this yet, but I
> wanted to open this up to discussion among the various client devs to see
> what you think. Would such an abstraction be useful? Is this something worth
> pursuing?
> --
> Nick Telford
>
