Hi Andrews,

I went through the three blueprints that you have prepared for libdrizzle native sharding.
The first blueprint (Phase 1 of the project) [1] is basically concerned with choosing the hashing algorithm for mapping clients to shards. Here you have proposed a consistent hashing algorithm similar to libketama for memcached. I would like to propose a modification: vbucket style partitioning, which hashes each key to a vbucket (I could re-use the ketama code inside libhashkit for the hashing) and then statically maps vbuckets to shards. This scheme is more robust than the probabilistically good ketama algorithm. Some of its advantages are:

- A request is never serviced by the wrong server.
- Servers refuse to service commands for which they aren't responsible.
- Servers do not need to know about each other.
- Data sets can be handed from one server to another atomically.
- There are no temporal constraints involved.
- Absolute consistency is guaranteed.
- Network overhead in the normal case is nil.

The second blueprint (Phase 2 of the project) [2] asks for "re-shuffling sharded data between servers when servers are either added or removed". In my opinion this phase is already handled inside Phase 1 once we use vbuckets: when servers are added, vbuckets are simply re-mapped to the new servers during the replication phase.

The third blueprint (Phase 3 of the project) [3] asks for "support groups of servers so that each shard could contain a master and multiple slaves for round-robin read performance and redundancy", so the work in this phase concerns failure handling. There are basically three ways to do master-slave replication with Drizzle:

- 1:n replication: a single master services multiple slaves concurrently [4].
- Chained replication: a single master services only a single slave, but that slave has a further downstream slave of its own. This offers the advantage of a single stream of mutation events coming out of each server while still maintaining two copies of all records, with the disadvantage of compounding replication latency as you traverse the chain [5].
- Multi-master replication: out-of-the-box functionality available with Drizzle, still in beta [6].

Andrews, I would like to ask one question in this regard: how would we actually obtain the key associated with each client? Clients aren't going to supply a key manually every time they execute a sharded query. The key has to be generated automatically from the user data, and the client then redirected to the appropriate shard based on that key via the vbucket mapping. With some googling around I found the basic techniques used for this:

- Shard by the primary key value on a table: the most straightforward approach and the easiest to implement. We calculate an MD5 hash of the primary key and allocate a shard based on that. This is only effective when the data is reasonably well distributed.
- Shard by the modulus of a key value: the modulus function effectively distributes keys across the shards on a "round-robin" basis, creating a very even distribution of new key values.
- Maintain a master shard index: a single master table maps key values to specific shards. This is very flexible and covers a large variety of application situations.

I would appreciate suggestions in this regard; a minimal sketch of the first and third techniques follows below.
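To make the comparison concrete, here is a minimal C++ sketch of the two techniques I would start from: hashing a row's primary key to pick a shard, and a master shard index that can pin individual keys. This is illustration only, not Drizzle code; std::hash stands in for the MD5/ketama hashing we would actually take from libhashkit, and the shard count and key names are made up.

// Illustrative sketch only -- not libdrizzle code.  std::hash stands in for
// libhashkit's MD5/ketama hashing; the shard count and keys are made up.
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

static const std::size_t kNumShards = 4;

// Technique 1: shard by a hash of the primary key value.
std::size_t shard_by_hash(const std::string &primary_key) {
    return std::hash<std::string>{}(primary_key) % kNumShards;
}

// Technique 3: maintain a master shard index (key -> shard id).  A real
// deployment would keep this table in a database or a lookup service.
class MasterShardIndex {
public:
    void assign(const std::string &key, std::size_t shard) { index_[key] = shard; }

    // Unknown keys fall back to the hash-based placement above.
    std::size_t lookup(const std::string &key) const {
        auto it = index_.find(key);
        return it != index_.end() ? it->second : shard_by_hash(key);
    }

private:
    std::unordered_map<std::string, std::size_t> index_;
};

int main() {
    MasterShardIndex index;
    index.assign("customer:1234", 2);  // pinned to a specific shard
    std::cout << "hash placement : " << shard_by_hash("customer:1234") << "\n";
    std::cout << "index placement: " << index.lookup("customer:1234") << "\n";
    std::cout << "unpinned key   : " << index.lookup("customer:9999") << "\n";
    return 0;
}

Even in this toy version the trade-off is visible: the hash placement needs no extra state but cannot move a single key on its own, while the master index can move any key at the cost of an extra lookup per query.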
[1] https://blueprints.launchpad.net/drizzle/+spec/libdrizzle-sharding-phase1
[2] https://blueprints.launchpad.net/drizzle/+spec/libdrizzle-sharding-phase2
[3] https://blueprints.launchpad.net/drizzle/+spec/libdrizzle-sharding-phase3
[4] http://dshrewsbury.blogspot.com/2011/03/simple-drizzle-replication-example.html
[5] http://www.8bitsofbytes.com/?p=28
[6] http://dshrewsbury.blogspot.com/2011/03/multi-master-support-in-drizzle.html

Regards,
Abhishek Kumar Singh
http://www.mapbender.org/User:Abhishek
BE/1349/2007 Information Technology
8th SEMESTER, BIT MESRA
Skype: singhabhishek.bit
Mobile: +91-8002111189
irc-nick: sin8h (irc.freenode.net)

---------- Forwarded message ----------
From: Abhishek Singh <[email protected]>
Date: Wed, Apr 6, 2011 at 07:35
Subject: Regarding Libdrizzle Native Sharding
To: [email protected]

Hi,

Here is a brief overview of the approach that I am following for this proposal.

*Some Common Sharding Schemes:*

There are a number of different schemes one could use to decide how to break up an application database into multiple smaller DBs. Below are four of the most popular schemes used by various large-scale Web applications today.

*1. Vertical Partitioning:* A simple way to segment an application database is to move tables related to specific features onto their own server. The key benefit of this approach is that it is straightforward to implement and has low impact on the application as a whole. The main problem is that if the site experiences additional growth, it may become necessary to further shard a feature-specific database across multiple servers.

*2. Range Based Partitioning:* In situations where the entire data set for a single feature or table still needs to be further subdivided across multiple servers, it is important to ensure that the data is split up in a predictable manner. One approach is to split the data based on value ranges that occur within each entity. The main problem is that if the value whose range is used for partitioning isn't chosen carefully, the sharding scheme leads to unbalanced servers.

*3. Key or Hash Based Partitioning:* This is often a synonym for user-based partitioning on Web 2.0 sites. With this approach, each entity has a value (what sort of value to choose is described in "Determination of the optimum method for sharding the data" below) that is used as input to a hash function whose output determines which database server to use.

*4. Directory Based Partitioning:* A loosely coupled approach is to create a lookup service which knows the current partitioning scheme and abstracts it away from the database access code. This means a GetDatabaseFor() method actually hits a web service or a database which stores/returns the mapping between each entity key and the database server it resides on. This loose coupling means you can perform tasks such as adding servers to the database pool or changing the partitioning scheme without having to impact the application.

Out of these *I am going to choose the 3rd option*, as it is the most popular and robust option (also, Brian has worked on similar lines in libhashkit).

*Determination of the optimum method for sharding the data:*

This is closely related to the selection of the database shard scheme. The following options can be used to determine the optimum sharding strategy:

*1. Shard by the primary key on a table:* This is the most straightforward option, and the easiest to map to a given application.
However, this is only effective if your data is reasonably well distributed. For example, if you elect to shard by customerID but most of your transactions come from a small percentage of customers, very little if anything will be gained by sharding your database. On the other hand, if you can select a key that adequately and naturally distributes your transactions, great benefits can be realized.

*2. Shard by the modulus of a key value:* This option works in a vast number of cases, by applying the modulus function to the key value and distributing transactions based on the calculated value. In essence you can predetermine any number of shards, and the modulus function effectively distributes across your shards on a "round-robin" basis, creating a very even distribution of new key values.

*3. Maintain a master shard index:* This technique involves using a single master table that maps various values to specific shards. It is very flexible and meets a wide variety of application situations. However, this option often delivers lower performance as it requires an extra lookup for each sharded SQL statement.

In my opinion, *the determination of the optimum method for sharding the data should be left as a choice for the clients*, since they understand their transaction rates, table volumes, key distribution and other characteristics of their application. *The 3rd option should be kept as the default.* I would like to hear from the community which approach is best to follow.

*Hashing Algorithm for key based partitioning:*

There are basically *three options* available to us:

*(a) The naive approach* (where the number of servers never changes): Take a modulus based on the size of the server set. For a given key this returns the same (effectively random) server from the set every time.

server = serverlist[hash(key) % serverlist.length];

*(b) Using ketama or weighted ketama:* A consistent hashing algorithm for memcached clients. In this case one can add and remove servers from the memcached pool without causing a complete remap of all keys (libhashkit is based on a similar idea).

*(c) Using vbucket style partitioning:* There is a nice article on the vbuckets implementation details and their advantages [1]. This concept doesn't involve proxies, location services or server-to-server knowledge, so a vbucket-aware request requires no extra network operations to find the data on which to perform the requested operation.

server = serverlist[vbuckets[hash(key) % vbuckets.length]];

Out of these three *I am in favor of the 3rd option*. The advantages it has over the 2nd option (weighted ketama has already been implemented in libhashkit) are:

- A request is never serviced on the wrong server.
- Servers refuse commands that they should not service.
- Servers still do not know about each other.
- We can hand data sets from one server to another atomically.
- There are no temporal constraints involved.
- Consistency is absolutely guaranteed.
- Absolutely no network overhead is introduced in the normal case.

*I have started implementing vbucket style partitioning. For the vbucket scheme I will be borrowing code from libhashkit for hashing clients to the different vbuckets.* The vbuckets are then statically mapped to different servers; a small sketch of this two-level lookup follows.
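Here is a minimal C++ sketch of that two-level lookup. It is illustration only: std::hash stands in for the ketama/MD5 hashing that would actually come from libhashkit, and the server list, port numbers and hard-coded vbucket table are made-up placeholders for what would really be a static configuration.

// Minimal sketch of vbucket style partitioning -- not libdrizzle/libhashkit
// code.  std::hash stands in for libhashkit's hashing; the vbucket map is a
// hard-coded example of a static vbucket -> server configuration.
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct Server {
    std::string host;
    int port;
};

static const std::size_t kNumVBuckets = 8;  // fixed for the lifetime of the cluster

int main() {
    std::vector<Server> serverlist = {
        {"shard0.example.com", 4427},
        {"shard1.example.com", 4427},
        {"shard2.example.com", 4427},
    };

    // Static vbucket -> server mapping.  Rebalancing rewrites entries of this
    // table; it never changes how keys hash to vbuckets.
    std::vector<std::size_t> vbuckets = {0, 1, 2, 0, 1, 2, 0, 1};

    std::string key = "customer://1234";

    // Step 1: key -> vbucket (stable, independent of the server set).
    std::size_t vb = std::hash<std::string>{}(key) % kNumVBuckets;

    // Step 2: vbucket -> server (the only part that changes on rebalance).
    const Server &server = serverlist[vbuckets[vb]];

    std::cout << key << " -> vbucket " << vb << " -> "
              << server.host << ":" << server.port << "\n";
    return 0;
}

Adding a server then amounts to re-pointing some entries of the vbucket table at the new server and migrating those vbuckets' data, which is why the re-shuffling asked for in Phase 2 largely collapses into this scheme.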
*Steps for handling Failures:*

For this case we have *three different strategies*:

- *1:n Replication:* The first strategy (1:n) refers to a master servicing multiple slaves concurrently [2].
- *Chained or Hierarchical Replication:* The second strategy (chained) refers to a single master servicing only a single slave, with that slave having a further downstream slave of its own. This offers the advantage of a single stream of mutation events coming out of each server while still maintaining two copies of all records, with the disadvantage of compounding replication latency as you traverse the chain [3].
- *Multi-Master Replication:* Out-of-the-box functionality available with Drizzle, still in beta [4].

*I am in favour of the 1:n replication scheme*, because in this case we can control the number of slaves, and two copies of each server's data are probabilistically reliable. A minimal routing sketch for such a shard group appears after the list of common sharding problems below.

*Problems Common to all Sharding Schemes:*

Once a database has been sharded, new constraints are placed on the operations that can be performed on it. These constraints primarily center around the fact that operations across multiple tables, or across multiple rows in the same table, no longer run on the same server. Below are some of the constraints and additional complexities introduced by sharding:

*1. Joins and Denormalization:* Prior to sharding a database, any queries that require joins on multiple tables execute on a single server. Once a database has been sharded across multiple servers, it is often not feasible to perform joins that span database shards, both for performance reasons (data has to be compiled from multiple servers) and because of the additional complexity of performing such cross-server joins. A common workaround is to denormalize the database so that queries that previously required joins can be served from a single table. Of course, the service now has to deal with all the perils of denormalization, such as data inconsistency.

*2. Referential integrity:* It is even harder to enforce data integrity constraints such as foreign keys in a sharded database. Most relational database management systems do not support foreign keys across databases on different database servers. This means that applications that require referential integrity often have to enforce it in application code and run regular SQL jobs to clean up dangling references once they move to database shards.

*3. Rebalancing:* In some cases the sharding scheme chosen for a database has to be changed. This could happen because the sharding scheme was improperly chosen (e.g. partitioning users by zip code) or because the application outgrows the database even after being sharded. In such cases the database shards have to be rebalanced, which means changing the partitioning scheme and moving all existing data to new locations. Doing this without incurring downtime is extremely difficult.

I would really appreciate it if somebody from the Drizzle community is willing to share ideas on handling these issues.
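As promised above, here is a minimal sketch of how reads could be rotated round-robin across a shard's slaves while writes always go to the master under the 1:n scheme. The ShardGroup structure and the host names are assumptions made purely for illustration; they are not an existing libdrizzle API.

// Illustrative only -- the shard-group structure is an assumption, not part of
// libdrizzle.  Each shard is a small 1:n replication group: one master for
// writes, n slaves served round-robin for reads.
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

struct ShardGroup {
    std::string master;               // all writes go here
    std::vector<std::string> slaves;  // reads rotate over these
    std::size_t next_slave;           // round-robin cursor

    const std::string &write_host() const { return master; }

    // Pick the next slave for a read; fall back to the master when the shard
    // has no (healthy) slaves.
    const std::string &read_host() {
        if (slaves.empty())
            return master;
        const std::string &host = slaves[next_slave];
        next_slave = (next_slave + 1) % slaves.size();
        return host;
    }
};

int main() {
    ShardGroup shard{"master0.example.com",
                     {"slave0a.example.com", "slave0b.example.com"},
                     0};

    std::cout << "write -> " << shard.write_host() << "\n";
    for (int i = 0; i < 3; ++i)
        std::cout << "read  -> " << shard.read_host() << "\n";
    return 0;
}

Failure handling then reduces to dropping a dead slave from the list (or promoting a slave when the master fails) without touching the vbucket-to-shard mapping.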
*PseudoCode Exhibiting Sharding:* (I will be extending this part soon)

/* Start by thinking of every kind of data in your application as associated
   with some entity. Give each entity a name. Then, whenever you want the
   system to find the shard for a given entity, compose this name into a URL.
   For example, let's say you have data about customers. A given customer
   would have an entity URL that looks like this: */

customer://1234

/* Inside the API, we have a way to access data about a given customer. I'm
   not talking about high-level abstractions (like an object-oriented view of
   a customer's data), but about the actual data-fetching operation. At some
   level in your system, there is probably code that looks something like
   this: */

$connection = new_db_connection("host", port, "dbname");
$statement  = $connection->prepare($sql_statement, $params);
$result     = $statement->execute();

/* What I propose is that you make your sharding decision directly at the
   "new_db_connection" seam in the code. The new code would look like this: */

$connection = new_db_connection("customer://1234");
$statement  = $connection->prepare($sql_statement, $params);
$result     = $statement->execute();

/* All of the complexity of determining which shard to use is hidden behind
   that simple API. Once a shard-lookup decision is made, there is no
   additional overhead: the application is talking directly to the right
   database. */

I will shortly update the community with the API and some initial C++ code samples implementing vbucket style partitioning. Simultaneously I am working on bug [5].

[1] http://techzone.couchbase.com/wiki/display/membase/vBuckets
[2] http://dshrewsbury.blogspot.com/2011/03/simple-drizzle-replication-example.html
[3] http://www.8bitsofbytes.com/?p=28
[4] http://dshrewsbury.blogspot.com/2011/03/multi-master-support-in-drizzle.html
[5] https://bugs.launchpad.net/drizzle/+bug/644807

Regards,
Abhishek Kumar Singh
http://www.mapbender.org/User:Abhishek
BE/1349/2007 Information Technology
8th SEMESTER, BIT MESRA
Skype: singhabhishek.bit
Mobile: +91-8002111189
irc-nick: sin8h

