Hi,

Here is a brief overview of the approach I am following for this proposal.

*Some Common Sharding Schemes:*

There are a number of different schemes one could use to decide how to break
up an application database into multiple smaller DBs. Below are four of the
most popular schemes used by various large scale Web applications today.

*1.Vertical Partitioning:* A simple way to segment an application database
is to move tables related to specific features onto their own servers. The
key benefit of this approach is that it is straightforward to implement and
has low impact on the application as a whole. The main problem is that if
the site experiences additional growth, it may become necessary to further
shard a feature-specific database across multiple servers.

*2.Range Based Partitioning:* In situations where the entire data set for a
single feature or table still needs to be further subdivided across
multiple servers, it is important to ensure that the data is split up in a
predictable manner. One approach to ensuring this predictability is to
split the data based on value ranges that occur within each entity. The
main problem with this approach is that if the value whose range is used
for partitioning isn't chosen carefully, the sharding scheme leads to
unbalanced servers.

*3.Key or Hash Based Partitioning:* This is often a synonym for user-based
partitioning for Web 2.0 sites. With this approach, each entity has a value
(what sort of value is chosen is described in "Determination of the optimum
method for sharding the data" below) that is used as input to a hash
function whose output determines which database server to use.

*4.Directory Based Partitioning:* A loosely coupled approach to this
problem is to create a lookup service which knows your current partitioning
scheme and abstracts it away from the database access code. This means a
GetDatabaseFor() method actually hits a web service or a database which
stores and returns the mapping between each entity key and the database
server it resides on. This loosely coupled approach means you can perform
tasks like adding servers to the database pool or changing your
partitioning scheme without having to impact your application.

Out of these, *I am going to choose the 3rd option*, as it is the most
popular and robust choice (also, Brian has worked along similar lines in
libhashkit).

*Determination of the optimum method for sharding the data:*

This is closely related to the selection of the database sharding scheme.
The following options need to be weighed to determine the optimum sharding
strategy:

*1.Shard by the primary key on a table:* This is the most straightforward
option, and the easiest to map to a given application. However, it is only
effective if your data is reasonably well distributed. For example, if you
elected to shard by customerID and most of your transactions come from a
handful of large customers, very little if anything will be gained by
sharding your db. On the other hand, if you can select a key that does
adequately and naturally distribute your transactions, great benefits can
be realized.

*2.Shard by the modulus of a key value:* This option works in a vast number
of cases by applying the modulus function to the key value and distributing
transactions based on the calculated value. In essence you can predetermine
any number of shards, and the modulus function effectively distributes new
key values across your shards on a “round-robin” basis, creating a very
even distribution (a minimal sketch of this appears after this list).

*3.Maintain a master shard index:* This technique involves using a single
master table that maps various values to specific shards. It is very
flexible and meets a wide variety of application situations. However, this
option often delivers lower performance, as it requires an extra lookup for
each sharded SQL statement.
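
As a quick illustration of option 2 above, the modulus-based mapping is
essentially a one-liner; the following is only a sketch, and the shard
count used is an arbitrary example value.

#include <stdint.h>

/* Sketch of option 2: distribute rows across a fixed number of shards by
   taking the key value modulo the shard count. Monotonically increasing
   keys then land on the shards in round-robin fashion. */
static const uint32_t SHARD_COUNT = 8;   /* example value */

static uint32_t shard_for_key(uint64_t key_value)
{
  return (uint32_t)(key_value % SHARD_COUNT);
}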

In my opinion, *determination of the optimum method for sharding data
should be left as a choice for clients*, since they understand their
transaction rates, table volumes, key distribution and other
characteristics of their application best. *And the 3rd option should be
kept as the default one.*

I would like to hear the community's views on the best approach to follow.

*Hashing Algorithm for key-based partitioning:*

There are basically *3 options* available to us:

*(a) The naive approach* (where the number of servers never changes): This
would do a modulo based on the size of the server set. For a given key,
this would return the same (pseudo-random) server from the set every time.

*server_index = serverlist[hash(key)%serverlist.length];*

*(b) Using ketama or weighted ketama:* A consistent hashing algorithm for
memcache clients. In this case one can add and remove servers from the
memcached pool without causing a complete remap of all keys (libhashkit is
based on a similar idea).

*(c) Using the concept of a vbucket style partitioning algorithm:* Here is
a nice article on the vbuckets implementation details and their advantages
[1]<http://techzone.couchbase.com/wiki/display/membase/vBuckets>

This concept doesn't involve any sort of proxies, location services, or
server-to-server knowledge. So, in short, a vbucket-aware request requires
no extra network operations to find the data on which to perform the
requested operations.

*server_index = serverlist[vbuckets[hash(key) % vbuckets.length]];*

Out of these three, *I am in favor of the 3rd option*. The advantages that
it has over the 2nd option (weighted ketama has already been implemented in
libhashkit) are:

   - Never service a request on the wrong server.
   - Servers refuse commands that they should not service.
   - Servers still do not know about each other.
   - We can hand data sets from one server to another atomically.
   - There are no temporal constraints involved.
   - Consistency is absolutely guaranteed.
   - Absolutely no network overhead is introduced in the normal case.

*I have started implementing vbucket style partitioning. In the vbucket
scheme, I will be borrowing code from libhashkit for hashing keys to
different vbuckets.* The vbuckets are then statically mapped to different
servers.
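
Below is an early, minimal C++ sketch of the vbucket mapping I have in
mind. The hash function here is a simple FNV-1a placeholder just to keep
the sketch self-contained (the real implementation would use one of
libhashkit's hash functions), and the vbucket count and server list are
arbitrary example values.

#include <stdint.h>
#include <string>
#include <vector>

/* Placeholder hash (FNV-1a); the real code would call into libhashkit. */
static uint32_t hash_key(const std::string &key)
{
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < key.size(); i++)
  {
    h ^= (unsigned char)key[i];
    h *= 16777619u;
  }
  return h;
}

static const size_t VBUCKET_COUNT = 1024;        /* example value */

/* Static vbucket -> server map: each entry holds the index of the server
   currently responsible for that vbucket. Handing a vbucket to another
   server only means updating one entry in this table. */
static std::vector<uint16_t> vbucket_map(VBUCKET_COUNT, 0);

/* Filled at startup from the configuration. */
static std::vector<std::string> serverlist;

/* server_index = serverlist[vbuckets[hash(key) % vbuckets.length]] */
static const std::string& server_for_key(const std::string &key)
{
  uint32_t vbucket = hash_key(key) % VBUCKET_COUNT;
  return serverlist[vbucket_map[vbucket]];
}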

*Steps for handling Failures:*

For this case we have *three different strategies:*

   - *1:n Replication:* The first strategy (1:n) refers to a master
   servicing multiple slaves concurrently
   [2]<http://dshrewsbury.blogspot.com/2011/03/simple-drizzle-replication-example.html>.


   - *Chained or Hierarchical Replication:* The second strategy (chained)
   refers to a single master servicing only a single slave, with that slave
   in turn having a further downstream slave of its own. This offers the
   advantage of having a single stream of mutation events coming out of a
   server, while still maintaining two copies of all records. It has the
   disadvantage of compounding replication latency as you traverse the
   chain [3]<http://www.8bitsofbytes.com/?p=28>.


   - *Multi-Master Replication:* This is out-of-the-box functionality
   available with Drizzle, though it is still in beta
   [4]<http://dshrewsbury.blogspot.com/2011/03/multi-master-support-in-drizzle.html>.

*I am in favour of the 1:n replication scheme*, because in this case we can
control the number of slaves, and keeping two copies of each server's data
is probabilistically reliable.

*Problems Common to all Sharding Schemes:*

Once a database has been sharded, new constraints are placed on the
operations that can be performed on it. These constraints primarily center
around the fact that operations across multiple tables, or across multiple
rows in the same table, will no longer run on the same server. Below are
some of the constraints and additional complexities introduced by sharding:

*1.Joins and Denormalization:* Prior to sharding a database, any queries
that require joins on multiple tables execute on a single server. Once a
database has been sharded across multiple servers, it is often not feasible
to perform joins that span database shards, both because of performance
constraints, since data has to be compiled from multiple servers, and
because of the additional complexity of performing such cross-server joins.

A common workaround is to denormalize the database so that queries that
previously required joins can be performed from a single table. Of course,
the service now has to deal with all the perils of denormalization, such as
data inconsistency.

*2.Referential integrity:* It is very difficult to enforce data integrity
constraints such as foreign keys in a sharded database. Most relational
database management systems do not support foreign keys across databases on
different database servers. This means that applications that require
referential integrity often have to enforce it in application code and run
regular SQL jobs to clean up dangling references once they move to using
database shards.

*3.Rebalancing:* In some cases, the sharding scheme chosen for a database
has to be changed. This could happen because the sharding scheme was
improperly chosen (e.g. partitioning users by zip code) or because the
application outgrows the database even after being sharded. In such cases,
the database shards will have to be rebalanced, which means changing the
partitioning scheme and moving all existing data to new locations. Doing
this without incurring downtime is extremely difficult.

I would really appreciate it if somebody from the Drizzle community is
willing to share ideas on handling these issues.

*Pseudocode Exhibiting Sharding:* (I will be extending this part soon)

/* Start by thinking of every kind of data in your application as associated
with some entity. Give each entity a name. Then, whenever you want to have
your system find the shard for a given entity, compose this name into a URL.
For example, let's say you have data about customers. A given customer would
have an entity URL that looked like this: */

customer://1234

/* Inside the API, we have a way to access data about a given customer. I'm
not talking about high-level abstractions (like an object-oriented view of a
customer's data). I'm talking about the actual data-fetching operation. At
some level in your system, there's probably code that looks something like
this: */


$connection = new_db_connection("host", port, "dbname");

$statement = $connection->prepare($sql_statement, $params);

$result = $statement->execute();


/* What I propose is that you make your sharding decisions directly at the
"new_db_connection" seam in the code. The new code would look like this: */


$connection = new_db_connection("customer://1234");

$statement = $connection->prepare($sql_statement, $params);

$result = $statement->execute();

/* All of the complexity of determining which shard you need to use (which
we'll get to in a moment) is hidden behind that simple API. Once a
shard-lookup decision is made, there's no additional overhead; your
application is talking directly to the right database. */
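
To give a feel for how the new_db_connection() seam could look in C++ (this
is only a hypothetical sketch of the API shape, not a final interface), the
factory would parse the entity URL, resolve the owning shard, and hand back
the connection details for the right server:

#include <stdexcept>
#include <string>

/* Connection details for one shard; a real implementation would return an
   actual connection object instead of this plain struct. */
struct ShardConnection
{
  std::string host;
  int port;
  std::string schema;
};

/* Trivial stand-in for shard resolution; the real code would consult the
   vbucket map (or lookup service) described earlier. */
static std::string resolve_shard(const std::string &id)
{
  unsigned sum = 0;
  for (size_t i = 0; i < id.size(); i++)
    sum += (unsigned char)id[i];
  return (sum % 2) ? "db-server-2" : "db-server-1";   /* example hosts */
}

/* Hypothetical shard-aware connection factory: parses an entity URL such
   as "customer://1234" and returns the details of the owning shard. */
static ShardConnection new_db_connection(const std::string &entity_url)
{
  std::string::size_type sep = entity_url.find("://");
  if (sep == std::string::npos)
    throw std::runtime_error("malformed entity URL: " + entity_url);

  std::string entity = entity_url.substr(0, sep);   /* "customer" */
  std::string id = entity_url.substr(sep + 3);      /* "1234"     */

  ShardConnection conn;
  conn.host = resolve_shard(id);
  conn.port = 4427;          /* example port */
  conn.schema = entity;      /* example: one schema per entity type */
  return conn;
}

Everything above new_db_connection() stays unchanged, which is exactly the
point of choosing this seam.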

I will shortly update the community regarding the API and some of the
initial C++ code samples in order to implement vbucket style partitioning.
Simultaneously I am working on the bug
[5]<https://bugs.launchpad.net/drizzle/+bug/644807>.

[1] http://techzone.couchbase.com/wiki/display/membase/vBuckets
[2] http://dshrewsbury.blogspot.com/2011/03/simple-drizzle-replication-example.html
[3] http://www.8bitsofbytes.com/?p=28
[4] http://dshrewsbury.blogspot.com/2011/03/multi-master-support-in-drizzle.html
[5] https://bugs.launchpad.net/drizzle/+bug/644807

Regards,

Abhishek Kumar Singh
http://www.mapbender.org/User:Abhishek
BE/1349/2007
Information Technology
8th SEMESTER
BIT MESRA

Skype: singhabhishek.bit
Mobile: +91-8002111189
irc-nick: sin8h