On Wed, Apr 6, 2011 at 13:56, Andrew Hutchings <[email protected]> wrote:
> Hi Abhishek,
>
> On Wed, 2011-04-06 at 12:38 +0530, Abhishek Singh wrote:
>
> > The first blueprint (Phase 1 of the project) [1] is basically concerned
> > with choosing the hashing algorithm for mapping clients to shards.
> > Here you have proposed to use a consistent hashing algorithm similar to
> > libketama for memcached. I would like to propose a modification in this
> > regard: vbucket-style partitioning, which still hashes keys (I could
> > re-use the ketama code inside libhashkit) but adds a static mapping of
> > vbuckets to shards. This algorithm is much more robust than the
> > probabilistically good ketama algorithm. Some of the advantages are
> > (a rough sketch follows this list):
> >     * A service request is never sent to the wrong server.
> >     * Servers refuse to service commands for which they aren't
> >       responsible.
> >     * Servers do not know about each other.
> >     * Data sets can be handed from one server to another atomically.
> >     * No temporal constraints are involved.
> >     * Absolute consistency is guaranteed.
> >     * Network overhead in the normal case is nil.
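A minimal Python sketch of the static-mapping idea (the table size and its
contents here are illustrative assumptions, not from any existing code):

    # A routing table, identical on every client and server.
    NUM_VBUCKETS = 64                                   # illustrative size
    vbucket_map = [i % 4 for i in range(NUM_VBUCKETS)]  # vbucket -> one of 4 servers

    def owner(vbucket):
        return vbucket_map[vbucket]

    # Server side: refuse commands for vbuckets this server isn't responsible for.
    def accepts(server_id, vbucket):
        return owner(vbucket) == server_id

Because every party ships the same static table, a request can never be
routed to, or serviced by, the wrong server.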
>
> Sounds great to me :)
>
> > The second blueprint (Phase 2 of the project) [2] asks for "re-shuffling
> > sharded data between servers, when servers are either added or
> > removed".
> > In my opinion this phase is already handled inside the first phase when
> > we use vbuckets: in the case of vbuckets we have a replication phase in
> > which vbuckets are re-mapped to newly added servers (a sketch follows
> > below).
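A rough sketch of that re-balancing step, re-using the illustrative
vbucket_map above (the share calculation and hand-over policy are
assumptions, not existing code):

    def rebalance(vbucket_map, new_server):
        """Re-assign an even share of vbuckets to a newly added server.

        Each vbucket whose owner changes must have its data handed over
        atomically before the new map is published to clients.
        """
        num_servers = max(vbucket_map) + 2       # old servers plus the new one
        share = len(vbucket_map) // num_servers  # vbuckets the new server takes
        new_map = list(vbucket_map)
        for i in range(share):
            # Illustrative policy: take every num_servers-th vbucket.
            new_map[i * num_servers] = new_server
        return new_map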
>
> Yes, I think you are correct there, as long as this is documented in a
> general use kind of way.
>
> > The third blueprint (Phase 3 of the project) [3] asks for "support
> > groups of servers so that each shard could contain a master and multiple
> > slaves for round-robin read performance and redundancy".
> > So the work in this phase concerns failure handling. There are
> > basically three ways to do master-slave replication in the case of
> > Drizzle (a round-robin read sketch follows the list):
> > * 1:n Replication: The first strategy (1:n) refers to a master
> > servicing multiple slaves concurrently [4].
> >     * Chained Replication: The second strategy (chained) refers to a
> >       single master servicing only a single slave, with that slave in
> >       turn having a further downstream slave of its own. This offers
> >       the advantage of having a single stream of mutation events
> >       coming out of a server, while still maintaining two copies of
> >       all records. It has the disadvantage of compounding
> >       replication latency as you traverse the chain [5].
> >     * Multi-Master Replication: out-of-the-box functionality
> >       available in Drizzle, though still in beta [6].
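Whichever topology we pick, the round-robin read side of this phase could
look roughly like the following (a sketch; ShardGroup and its methods are
hypothetical names, not an existing Drizzle API):

    import itertools

    class ShardGroup:
        """One shard: a master for writes, several slaves for reads."""
        def __init__(self, master, slaves):
            self.master = master
            self._read_cycle = itertools.cycle(slaves or [master])

        def write_host(self):
            return self.master

        def read_host(self):
            # Round-robin across the slaves; fall back to the master if none.
            return next(self._read_cycle)

    group = ShardGroup("master:4427", ["slave1:4427", "slave2:4427"])
    # group.read_host() alternates between slave1 and slave2.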
> > Andrew, I would like to ask one question in this regard. How exactly
> > would we get the key related to each client? Clients aren't going to
> > manually supply a key every time they execute a sharded query. This
> > key has to be generated from the user data automatically, with the
> > client then redirected to the appropriate shard based upon that key
> > using vbuckets.
>
> I'm pretty sure the only way this can work is if the client supplies the
> key manually. I don't think this is a big ask. If the client doesn't
> know the key for something, it will have to do a lookup on another
> table to find it, instead of using secondary indexes.
>
> > With some googling around I learned some of the basic techniques
> > used in this regard. They are:
> >     * Shard by the primary key value of a table: This is the most
> >       straightforward approach and the easiest to implement. Here we
> >       calculate the MD5 hash of the primary key and then allocate a
> >       shard based upon it. This is only effective when data is
> >       reasonably well distributed.
> >     * Shard by the modulus of a key value: The modulus function
> >       effectively distributes across your shards on a "round-robin"
> >       basis, creating a very even distribution of new key values.
> >     * Maintain a master shard index: This technique involves using a
> >       single master table that maps various values to specific
> >       shards. It is very flexible, and suits a large variety of
> >       application situations.
> > I would appreciate suggestions in this regard.
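For reference, the first two techniques are almost one-liners (a sketch,
with num_shards as an assumed parameter):

    import hashlib

    def shard_by_md5(primary_key, num_shards):
        # Technique 1: hash the primary key, then pick a shard from the hash.
        return int(hashlib.md5(str(primary_key).encode()).hexdigest(), 16) % num_shards

    def shard_by_modulus(key_value, num_shards):
        # Technique 2: modulus of the raw key ("round-robin" for sequential keys).
        return key_value % num_shards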
>
> I'm not a fan of the third solution for this version (maybe after GSoC
> we can look into that). For now, keep it simple so it can be built upon
> later.
>
> I was thinking of a combination of the first two approaches when I dreamt
> it up... so a modulus of the MD5. A modulus of the key value alone cannot
> always give a great distribution, for example if your key always jumps by
> some value X. I'll explain why:
>
> Imagine a situation where you have multiple clients writing to one
> sharded table. We can't use auto-increment because we need the key
> values to be predictable. So for an auto-number you would likely either
> have:
>
> 1. Clients pre-fetch a pre-defined batch of X keys from somewhere, and no
> other client can use that set of keys (this is what MySQL Cluster does).
> 2. Having a predictable increment jump (so with 32 clients, each client
> increments by 32, with client 1 starting at offset 1, client 2 at offset
> 2, etc...).
>
> For either of these I suspect a simple modulus of the value will not
> give a perfectly even distribution, whereas a modulus of the MD5 will
> give a much better distribution (and work better on strange data types).
Yeah, I definitely agree with you on this. The modulus of the MD5 will give
a much more even distribution.
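A quick Python sketch of that effect (the client count, shard count, and key
range are illustrative assumptions):

    import hashlib
    from collections import Counter

    # Option 2 above: one client's keys step by 32 (1, 33, 65, ...).
    keys = range(1, 100000, 32)

    raw = Counter(k % 8 for k in keys)                # modulus of the raw key
    md5 = Counter(int(hashlib.md5(str(k).encode()).hexdigest(), 16) % 8
                  for k in keys)                      # modulus of the MD5

    print(raw)   # all keys land on one shard: k % 8 is constant because
                 # the step (32) is a multiple of the shard count (8)
    print(md5)   # roughly even counts across all 8 shards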
In the case of vbucket-style partitioning, the modulus of the decimal value
of the MD5 hash of the client-supplied key is taken over the number of
vbuckets.
The image below gives a proper idea of this:
[image: vbucket.png]
h(k)  = the hashing function employed (e.g. MD5, CRC32).
vb[ ] = the array of vbuckets that a client consults in order to pick the
server that will serve its queries, out of a large number of servers.
s[ ]  = the set of servers.
So the h(k) => vb[index_of_vbucket] mapping involves the following
calculation:
index_of_vbucket = hex2dec( MD5_hash( key_supplied_by_client ) ) % vb.size()
So, probabilistically, this distribution is much more even.
As in the image above, the vb[vbucket_index] => s[server_index] mapping is a
many-to-one function; a sketch of the full two-level lookup follows.
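Putting both levels together in Python (the vbucket count and server list
are illustrative assumptions):

    import hashlib

    NUM_VBUCKETS = 1024
    servers = ["s0", "s1", "s2", "s3"]

    # Many-to-one vb[] => s[] mapping: each server owns an equal share of
    # vbuckets here, but any static assignment works.
    vb = [servers[i % len(servers)] for i in range(NUM_VBUCKETS)]

    def server_for(key):
        # index_of_vbucket = hex2dec(MD5(key)) % vb.size(), as above.
        index_of_vbucket = int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_VBUCKETS
        return vb[index_of_vbucket]

    print(server_for("user:42"))   # the same key always routes to the same server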
>
> Kind Regards
> --
> Andrew Hutchings - LinuxJedi - http://www.linuxjedi.co.uk/
>
>