Hi,

Andrews, I went through the 3 blueprints that you have prepared for
libdrizzle native sharding.

The first blueprint (Phase 1 of the project) [1] is basically concerned with
choosing the hashing algorithm for mapping clients to shards. There you have
proposed using a consistent hashing algorithm similar to libketama for
memcached. I would like to propose a modification here: use vbucket style
partitioning instead, where keys are hashed onto vbuckets (I could re-use the
ketama code inside libhashkit for that) and the vbuckets are statically mapped
to shards. This scheme is much more robust than the only probabilistically
good ketama algorithm; a minimal lookup sketch follows the list below. Some of
the advantages are:

   - A service request is never sent to the wrong server.
   - Servers refuse to service commands for which they aren't responsible.
   - Servers do not know about each other.
   - Data sets can be handed from one server to another atomically.
   - No temporal constraints involved.
   - Absolute consistency is guaranteed.
   - Network overhead in the normal case is nil.
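
To make this concrete, here is a minimal C++ sketch of the lookup path I have
in mind; NUM_VBUCKETS, hash_key() and the server list below are placeholders
of mine, not a final design:

#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

const size_t NUM_VBUCKETS = 1024;                 // fixed number of vbuckets

struct Server { std::string host; uint16_t port; };

std::vector<Server> servers;                      // physical shards
std::vector<size_t> vbucket_map(NUM_VBUCKETS);    // vbucket -> server index (static mapping)

// Placeholder hash; the real implementation would reuse a hash from libhashkit.
uint32_t hash_key(const std::string &key)
{
    return static_cast<uint32_t>(std::hash<std::string>()(key));
}

// Two-step lookup: key -> vbucket -> server.
const Server &server_for_key(const std::string &key)
{
    size_t vbucket = hash_key(key) % NUM_VBUCKETS;
    return servers[vbucket_map[vbucket]];
}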

The second blueprint (Phase 2 of the project) [2] asks for "re-shuffling
sharded data between servers when servers are either added or removed". In my
opinion this phase is handled inside the 1st phase once we use vbuckets: the
vbucket scheme has a re-mapping step in which vbuckets are migrated to newly
added servers. A small sketch of that step follows.
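
Continuing the sketch above, the re-mapping step when a server is added could
look roughly like this; migrate_vbucket() is a placeholder for the actual data
transfer, not anything that exists yet:

// Placeholder: copy the vbucket's data from servers[from] to servers[to]
// and only then flip ownership, so requests never land on the wrong server.
void migrate_vbucket(size_t vbucket, size_t from, size_t to)
{
}

// Reassign a share of the vbuckets to the newly added server.
void add_server(std::vector<Server> &servers,
                std::vector<size_t> &vbucket_map,
                const Server &new_server)
{
    servers.push_back(new_server);
    size_t new_index = servers.size() - 1;

    // Hand over every Nth vbucket so ownership stays roughly balanced.
    for (size_t vb = new_index; vb < vbucket_map.size(); vb += servers.size())
    {
        migrate_vbucket(vb, vbucket_map[vb], new_index);
        vbucket_map[vb] = new_index;
    }
}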

The third blueprint (Phase 3 of the project) [3] asks for "support groups of
servers so that each shard could contain a master and multiple slaves for
round-robin read performance and redundancy".
So the work in this phase concerns failure handling. There are basically 3
ways to do master-slave replication in Drizzle:

   - 1:n Replication: The first strategy (1:n) refers to a master servicing
   multiple slaves concurrently [4].


   - Chained Replication: The second strategy (chained) refers to a single
   master servicing only a single slave, with that slave in turn having a
   further downstream slave of its own. This offers the advantage of having a
   single stream of mutation events coming out of a server, while still
   maintaining two copies of all records. The disadvantage is that replication
   latency compounds as you traverse the chain [5].


   - Multi-Master Replication: Out-of-the-box functionality available with
   Drizzle, though it is still in beta [6].

Andrews, I would like to ask one question in this regard: how would we
actually get the key related to each client? Clients aren't going to manually
supply a key every time they try to execute a sharded query. This key has to
be generated automatically on the basis of the user data, with the client then
redirected to the appropriate shard, based upon that key, using vbuckets.

With some googling around I got to know some of the basic techniques used in
this regard (a small sketch follows the list); they are:

   - Shard by the primary key value on a table: This is the most
   straightforward approach and the easiest to implement. Here we calculate an
   MD5 hash of the primary key and allocate a shard to the client based upon
   that. This is only effective when the data is reasonably well distributed.


   - Shard by the modulus of a key value: the modulus function effectively
   distributes across your shards on a “round-robin” basis, creating a very
   even distribution of new key values.


   - Maintain a master shard index: This technique involves using a single
   master table that maps various values to specific shards. It is very
   flexible, and meets a large variety of application situations.
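
For example, deriving the key automatically from a row's primary key could
look roughly like this; std::hash stands in here for the MD5 digest that the
real code would take from libhashkit:

#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Illustration only: the sharding key is derived from the primary key,
// so the client never has to supply a key by hand.
size_t shard_for_primary_key(uint64_t primary_key, size_t shard_count)
{
    // Technique 1: hash the primary key (MD5 in the real implementation)
    // and map the digest onto a shard.  Technique 2 would simply be
    // primary_key % shard_count.
    size_t digest = std::hash<std::string>()(std::to_string(primary_key));
    return digest % shard_count;
}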

I would appreciate suggestions in this regard.

[1]
https://blueprints.launchpad.net/drizzle/+spec/libdrizzle-sharding-phase1
[2]
https://blueprints.launchpad.net/drizzle/+spec/libdrizzle-sharding-phase2
[3]
https://blueprints.launchpad.net/drizzle/+spec/libdrizzle-sharding-phase3
[4]
http://dshrewsbury.blogspot.com/2011/03/simple-drizzle-replication-example.html
[5] http://www.8bitsofbytes.com/?p=28
[6]
http://dshrewsbury.blogspot.com/2011/03/multi-master-support-in-drizzle.html

Regards,

Abhishek Kumar Singh
http://www.mapbender.org/User:Abhishek
BE/1349/2007
Information Technology
8th SEMESTER
BIT MESRA

Skype: singhabhishek.bit
Mobile: +91-8002111189
irc-nick: sin8h (irc.freenode.net)



---------- Forwarded message ----------
From: Abhishek Singh <[email protected]>
Date: Wed, Apr 6, 2011 at 07:35
Subject: Regarding Libdrizzle Native Sharding
To: [email protected]


 Hi,

Here is a brief overview of the approach that I am following for this proposal.

*Some Common Sharding Schemes:*

There are a number of different schemes one could use to decide how to break
up an application database into multiple smaller DBs. Below are four of the
most popular schemes used by various large scale Web applications today.

*1.Vertical Partitioning:* A simple way to segment an application database is
to move tables related to specific features to their own server. The key
benefit of this approach is that it is straightforward to implement and has a
low impact on the application as a whole. The main problem with this approach
is that if the site experiences additional growth then it may be necessary to
further shard a feature-specific database across multiple servers.

*2.Range Based Partitioning:* In situations where the entire data set for a
single feature or table still needs to be further subdivided across multiple
servers, it is important to ensure that the data is split up in a
predictable manner. One approach to ensuring this predictability is to split
the data based on value ranges that occur within each entity. The main
problem with this approach is that if the value whose range is used for
partitioning isn't chosen carefully then the sharding scheme leads to
unbalanced servers.

*3.Key or Hash Based Partitioning:* This is often a synonym for user based
partitioning for Web 2.0 sites. With this approach, each entity has a value
(what sort of value is to be chosen is described in "Determination of the
optimum method for sharding the data") that can be used as input into a
hash function whose output is used to determine which database server to
use.

*4.Directory Based Partitioning:* A loosely coupled approach to this problem
is to create a lookup service which knows your current partitioning scheme and
abstracts it away from the database access code. This means the
GetDatabaseFor() method actually hits a web service or a database which
stores/returns the mapping between each entity key and the database server it
resides on. This loosely coupled approach means you can perform tasks like
adding servers to the database pool or changing your partitioning scheme
without having to impact your application.

So out of these *I am going to choose the 3rd option*, as this is the most
popular and robust option (also, Brian has worked along similar lines in
libhashkit).

*Determination of the optimum method for sharding the data:*

This is closely related to the selection of the database sharding scheme. The
following options are available for determining the optimum sharding strategy:

*1.Shard by the primary key on a table:* This is the most straightforward
option, and the easiest to map to a given application. However, it is only
effective if your data is reasonably well distributed. For example, if you
elect to shard by customer ID but most of your transactions come from only a
few customers, very little if anything will be gained by sharding your DB. On
the other hand, if you can select a key that adequately and naturally
distributes your transactions, great benefits can be realized.

*2.Shard by the modulus of a key value:* This option works in a vast number
of cases, by applying the modulus function to the key value, and
distributing transactions based on the calculated value. In essence you can
predetermine any number of shards, and the modulus function effectively
distributes across your shards on a “round-robin” basis, creating a very
even distribution of new key values.

*3.Maintain a master shard index:* This technique involves using a single
master table that maps various values to specific shards. It is very flexible,
and meets a wide variety of application situations. However, this option often
delivers lower performance, as it requires an extra lookup for each sharded
SQL statement (a small sketch of that lookup follows).
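
As a rough illustration of that extra lookup, the std::map below stands in for
the real master table, which would live in its own database and probably be
cached on the client side; the round-robin placement is just for the sketch:

#include <cstddef>
#include <cstdint>
#include <map>

std::map<uint64_t, size_t> master_shard_index;     // entity key -> shard id

// Naive placement used only for this sketch: new keys are spread round-robin.
size_t assign_new_shard(uint64_t entity_key, size_t shard_count)
{
    size_t shard = entity_key % shard_count;
    master_shard_index[entity_key] = shard;
    return shard;
}

// Every sharded statement pays one lookup against the master index.
size_t shard_for_entity(uint64_t entity_key, size_t shard_count)
{
    std::map<uint64_t, size_t>::const_iterator it = master_shard_index.find(entity_key);
    if (it != master_shard_index.end())
        return it->second;
    return assign_new_shard(entity_key, shard_count);
}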

In my opinion, *determination of the optimum method for sharding the data
should be given as a choice to clients*, since they understand their
transaction rates, table volumes, key distribution and other characteristics
of their application. *And the 3rd option should be kept as the default one.*

I would like to hear the community's response regarding the best approach to
follow.

*Hashing Algorithm for key based partitioning:*

There are basically *3 options* available for us:

*(a) The naive approach* (where the number of servers never changes): This
would do a modulo based on the size of the server set. For a given key, this
would return the same (effectively random) server from the set every time.

*server = serverlist[hash(key) % serverlist.length];*

*(b) Using ketama or weighted ketama:* A consistent hashing algorithm for
memcached clients. In this case one can add and remove servers from the
memcached pool without causing a complete remap of all keys (libhashkit is
based on a similar idea).

*(c) Using the concept of a vbucket style partitioning algorithm:* Here is a
nice article on the vbucket implementation details and their advantages [1].

This concept doesn't involve any sort of proxies, location services, or
server-to-server knowledge. So in gist, a vbucket-aware request requires no
extra network operations to find the data and perform the requested
operations.

*server = serverlist[vbuckets[hash(key) % vbuckets.length]];*

Out of these three *I am in favor of the 3rd option*. The advantages that it
has over the 2nd option (weighted ketama has already been implemented in
libhashkit) are:

   - Never service a request on the wrong server.
   - Servers refuse commands that they should not service.
   - Servers still do not know about each other.
   - We can hand data sets from one server to another atomically.
   - There are no temporal constraints involved.
   - Consistency is absolutely guaranteed.
   - Absolutely no network overhead is introduced in the normal case.

*I have started implementing vbucket style partitioning. For the vbucket
scheme, I will be borrowing code from libhashkit for hashing clients to
different vbuckets* (a rough sketch follows). The vbuckets are then statically
mapped to different servers.
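
A rough sketch of that hashing step is below. I am assuming libhashkit's
hashkit_create()/hashkit_set_function()/hashkit_value() interface here and
still need to verify the exact signatures against the headers:

// Assumption: hashkit_create(), hashkit_set_function() and hashkit_value()
// as provided by libhashkit; signatures to be double-checked.
#include <libhashkit/hashkit.h>

#include <cstddef>
#include <cstdint>
#include <string>

// Hash a client key onto a vbucket index; the vbucket is then mapped
// to a server through the static vbucket map.
size_t vbucket_for_key(const std::string &key, size_t num_vbuckets)
{
    hashkit_st hashkit;
    hashkit_create(&hashkit);
    hashkit_set_function(&hashkit, HASHKIT_HASH_MD5);

    uint32_t hash = hashkit_value(&hashkit, key.c_str(), key.size());

    hashkit_free(&hashkit);
    return hash % num_vbuckets;
}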

*Steps for handling Failures:*

For this case we have *three different strategies:*

   - *1:n Replication:* The first strategy (1:n) refers to a master servicing
   multiple slaves concurrently [2].


   - *Chained or Hierarchical Replication:* The second strategy (chained)
   refers to a single master servicing only a single slave, with that slave in
   turn having a further downstream slave of its own. This offers the advantage
   of having a single stream of mutation events coming out of a server, while
   still maintaining two copies of all records. The disadvantage is that
   replication latency compounds as you traverse the chain [3].


   - *Multi-Master Replication:* Out-of-the-box functionality available with
   Drizzle, though it is still in beta [4].

*I am in favour of the 1:n replication scheme* because in this case we can
control the number of slaves, and keeping 2 copies of each server's data is
probabilistically reliable.

*Problems Common to all Sharding Schemes:*

Once a database has been sharded, new constraints are placed on the
operations that can be performed on the database. These constraints
primarily center around the fact that operations across multiple tables or
multiple rows in the same table no longer will run on the same server. Below
are some of the constraints and additional complexities introduced by
sharding:

*1.Joins and Denormalization:* Prior to sharding a database, any queries that
require joins on multiple tables execute on a single server. Once a database
has been sharded across multiple servers, it is often not feasible to perform
joins that span database shards, both because of the performance cost of
compiling data from multiple servers and because of the additional complexity
of performing such cross-server joins.

A common workaround is to denormalize the database so that queries that
previously required joins can be performed from a single table. Of course,
the service now has to deal with all the perils of denormalization such as
data inconsistency.

*2.Referential integrity:* It is even harder to enforce data integrity
constraints such as foreign keys in a sharded database. Most relational
database management systems do not support foreign keys across databases on
different database servers. This means that applications that require
referential integrity often have to enforce it in application code and run
regular SQL jobs to clean up dangling references once they move to using
database shards.

*3.Rebalancing:* In some cases, the sharding scheme chosen for a database
has to be changed. This could happen because the sharding scheme was
improperly chosen (e.g. partitioning users by zip code) or the application
outgrows the database even after being sharded. In such cases, the database
shards will have to be rebalanced, which means the partitioning scheme is
changed and all existing data is moved to new locations. Doing this without
incurring downtime is extremely difficult.

I would really appreciate it if somebody from the Drizzle community is willing
to share ideas on handling these issues.

*PseudoCode Exhibiting Sharding:* (I will be extending this part soon)

/* Start by thinking of every kind of data in your application as associated
with some entity. Give each entity a name. Then, whenever you want to have
your system find the shard for a given entity, compose this name into a URL.
For example, let's say you have data about customers. A given customer would
have an entity URL that looked like this: */

customer://1234

/* Inside the API, we have a way to access data about a given customer. I'm
not talking about high-level abstractions (like an object-oriented view of a
customer's data). I'm talking about the actual data-fetching operation. At
some level in your system, there's probably code that looks something like
this: */


$connection = new_db_connection("host", port, "dbname");

$statement = $connection->prepare($sql_statement, $params);

$result = $statement->execute();


/* What I propose is that you make your sharding decisions directly at the
"new_db_connection" seam in the code. The new code would look like this: */


$connection = new_db_connection("customer://1234");

$statement = $connection->prepare($sql_statement, $params);

$result = $statement->execute();

/* All of the complexity of determining which shard you need to use (which
we'll get to in a moment) is hidden behind that simple API. Once a
shard-lookup decision is made, there's no additional overhead; your
application is talking directly to the right database. */
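
As a very first C++ sketch of that seam: parse_entity_url(), connect_to_shard()
and ShardConnection below are placeholders of mine, not libdrizzle's real API,
and vbucket_for_key() is the libhashkit helper sketched earlier:

#include <cstddef>
#include <string>

struct ShardConnection;                                   // opaque placeholder for a libdrizzle connection
ShardConnection *connect_to_shard(size_t vbucket);        // placeholder: connect to the server owning this vbucket
size_t vbucket_for_key(const std::string &key, size_t num_vbuckets);  // from the libhashkit sketch above

struct EntityRef { std::string kind; std::string id; };   // e.g. "customer" and "1234"

// Assumes a well-formed "<kind>://<id>" entity URL.
EntityRef parse_entity_url(const std::string &url)
{
    EntityRef ref;
    std::string::size_type sep = url.find("://");
    ref.kind = url.substr(0, sep);
    ref.id = url.substr(sep + 3);
    return ref;
}

// The sharding decision is made right here, hidden behind the seam:
// new_db_connection("customer://1234") resolves the entity to a vbucket
// and returns a connection to the shard that owns it.
ShardConnection *new_db_connection(const std::string &entity_url)
{
    EntityRef ref = parse_entity_url(entity_url);
    size_t vbucket = vbucket_for_key(ref.kind + ":" + ref.id, 1024);
    return connect_to_shard(vbucket);
}
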
I will shortly update the community regarding the API and some of the initial
C++ code samples in order to implement vbucket style partitioning.
Simultaneously I am working on the bug [5].

[1] http://techzone.couchbase.com/wiki/display/membase/vBuckets
[2]
http://dshrewsbury.blogspot.com/2011/03/simple-drizzle-replication-example.html
[3] http://www.8bitsofbytes.com/?p=28
[4]
http://dshrewsbury.blogspot.com/2011/03/multi-master-support-in-drizzle.html
[5] https://bugs.launchpad.net/drizzle/+bug/644807

Regards,

Abhishek Kumar Singh
http://www.mapbender.org/User:Abhishek
BE/1349/2007
Information Technology
8th SEMESTER
BIT MESRA

Skype: singhabhishek.bit
Mobile: +91-8002111189
irc-nick: sin8h
