Robert Stupp wrote:

Hi!

Heavy stuff ;)

Your thoughts are great.

It is not too early to think about load-balancing and fail-over. The
whole application server infrastructure has to be at least prepared for
it - both a concept and a "raw" API should be in place for
consideration at this early stage.


I agree. The Geronimo design should consider clustering from the beginning, so that it does not end up as a clumsy bolt-on.



Using "real" fail-over requires other components (e.g. database
repositories) to be fail-safe, too (Oracle clusters for example).

you could spend a lot of money :-)

Jules


Jules Gosnell wrote:
|
| I know this is a little early, but you don't have to read it :-)
|
| I've been playing around with some ideas about [web] clustering and distribution of session state that I would like to share...
|
| My last iteration on this theme (Jetty's distributable session manager) left me with an implementation in which every node in the cluster held a backup copy of every other node's state.
|
| This is fine for small clusters, but the larger the cluster and total state, the less it scales, because every node needs more memory and more bandwidth to hold these backups and keep them up to date.
|
| The standard way around this is to 'partition' your cluster, i.e. break it up into lots of subclusters which are small enough not to display this problem. This complicates load-balancing (as the load-balancer needs to know which nodes belong to which partitions, i.e. carry which state). Furthermore, in my implementation the partitions needed to be configured statically, which is a burden on the administrator in terms of time and understanding, and means that their layout will probably end up being suboptimal or, even worse, broken. You also need more redundancy, because a static configuration cannot react to runtime events (by, say, repartitioning if a node goes down), so you would be tempted to put more nodes in each partition, etc...
|
| So, I am now thinking about dynamically 'auto-partitioning' clusters with coordinated load-balancing.
|
| The idea is to build a reliable underlying substrate of partitions which reorganise dynamically as nodes leave and join the cluster.
|
| I have something up and running which does this OK - basically, nodes are arranged in a circle. Each node, together with the n-1 nodes before it (where n is however many nodes you want in a partition), forms a partition. Partitions therefore overlap around the clock. In fact, every node will belong to the same number of partitions as there are members in each one.
|
| As a node joins the cluster, the partitions are rearranged to accommodate it whilst maintaining this strategy. It transpires that the n-1 nodes directly behind the new arrival have to pass ownership of an existing partition membership to the new node, and a new partition has to be created encompassing these nodes and the new one. So for each new node a total of n-1 partition releases and n partition acquisitions occurs.
|
| As a node leaves the cluster, exactly the same happens in reverse. This applies until the total number of nodes in the cluster reaches and falls below n, at which point you can optimise the number of partitions down to 1.
|
| I'm happy (currently) with this arrangement as a robust underlying platform for state replication. The question is whether the 'buckets' that state is held in should map directly to these partitions (i.e. each partition maintains a single bucket), or whether their granularity should be finer.
|
| I'm still thinking about that at the moment, but thought I would throw my thoughts so far into the ring and see how they go..
|
| If you map 1 bucket : 1 partition, you will end up with a cluster that is very unevenly balanced in terms of state. Every time a node joins, although it will acquire n-1 'buddies' and copies of their state, you won't be able to balance up the existing state across the cluster, because to give the new node some state of its own you would need to take it from another partition. The mapping is 1:1, so the smallest unit of state that you can transfer from one partition to another is its entire contents/bucket.
|
| The other extreme is 1 bucket : 1 session (assuming we are talking about session state here) - but then why have buckets at all?
|
| My next thought was that the number of buckets should be the square of the number of nodes in the cluster. That way, every node carries n buckets (where n is the total number of nodes in the cluster). When you add a new node you can transfer a bucket from each existing node to it; each existing node then creates two empty buckets and the new node creates one. State is nicely balanced etc... However, this means that the total number of buckets grows quadratically, which smacks of something unscalable to me and was exactly the reason I added partitions in the first place. In the web world, I shall probably have to map a bucket to, e.g., a mod_jk worker (this entails appending the name of the bucket to the session-id and telling Apache/mod_jk to associate a particular host:port with this name; I can alter the node on which the bucket is held by informing Apache/mod_jk that this mapping has changed. Moving a session from one bucket to another is problematic and probably impossible, as I would have to change the session-id that the client holds. Whilst I could do this if I waited for a request from a single-threaded client, with a multi-threaded client it becomes much more difficult...). If we had 25 nodes in the cluster, this would mean 625 Apache/mod_jk workers... I'm not sure that mod_jk would like this :-)
|
| So, I am currently thinking about either a fixed number of buckets per node, which increases linearly with the total size of the cluster, or a 1:1 mapping, living with the inability to rebalance state immediately and instead dynamically altering the node weightings in the load-balancer (mod_jk) so that the cluster soon evens out again...
|
| I guess that's enough for now....
|
| If this is an area that interests you, please give me your thoughts and we can massage them all together :-)
|
| Jules
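To make the overlapping-partition ring described in the quoted post concrete, here is a minimal Java sketch. It is illustrative only (not Jetty or Geronimo code), and the class and method names are hypothetical. Each node's partition is taken to be the node itself plus the n-1 ring members immediately behind it, which is the reading consistent with the "n-1 releases / n acquisitions" bookkeeping above.

import java.util.ArrayList;
import java.util.List;

// Sketch of the overlapping-partition ring: each node, together with the
// n-1 nodes immediately before it on the ring, forms one partition of
// size n, so every node belongs to exactly n partitions.
public class PartitionRing {

    // Members of the partition "owned" by the node at ringIndex.
    static List<String> partitionOwnedBy(List<String> ring, int ringIndex, int n) {
        List<String> members = new ArrayList<>();
        for (int back = n - 1; back >= 0; back--) {
            // walk backwards around the ring, wrapping with modular arithmetic
            int idx = ((ringIndex - back) % ring.size() + ring.size()) % ring.size();
            members.add(ring.get(idx));
        }
        return members;
    }

    public static void main(String[] args) {
        List<String> ring = new ArrayList<>(List.of("A", "B", "C", "D", "E"));
        int n = 3; // nodes per partition

        for (int i = 0; i < ring.size(); i++) {
            System.out.println(ring.get(i) + " owns " + partitionOwnedBy(ring, i, n));
        }

        // After a join, simply recomputing memberships reflects the rearrangement
        // described above: the n-1 nodes directly behind the newcomer each hand it
        // one partition membership, and one brand-new partition (the newcomer plus
        // the n-1 nodes before it) appears - n-1 releases, n acquisitions in total.
        ring.add(3, "F");
        System.out.println("F owns " + partitionOwnedBy(ring, 3, n));
    }
}

With the five-node ring above and n = 3, node C, for instance, owns the partition [A, B, C], and after the join node F owns [B, C, F].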
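The "square of the number of nodes" bucket scheme can likewise be checked with a few lines of arithmetic (again only a sketch; the variable names are invented). With k nodes and k*k buckets, each node carries k buckets; on a join, each existing node hands one bucket to the newcomer and creates two fresh empty ones, while the newcomer creates one, restoring k+1 buckets per node and (k+1)^2 buckets overall.

// Illustrative check of the "square" bucket scheme described above.
public class SquareBuckets {
    public static void main(String[] args) {
        int k = 25;                        // nodes before the join (the 25-node example above)
        int bucketsBefore = k * k;         // 625 buckets, k per node
        int perNodeAfterTransfer = k - 1;  // each existing node gave one bucket to the newcomer
        int perNodeAfterCreate = perNodeAfterTransfer + 2; // ...then created two empty buckets
        int newcomerBuckets = k + 1;       // k transferred in, plus one it creates itself
        int bucketsAfter = k * perNodeAfterCreate + newcomerBuckets;

        System.out.println(bucketsBefore);     // 625 - already 625 mod_jk workers for 25 nodes
        System.out.println(bucketsAfter);      // 676
        System.out.println((k + 1) * (k + 1)); // 676, i.e. the (k+1)^2 invariant holds,
                                               // and the bucket count grows quadratically
    }
}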
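Finally, the session-id trick mentioned for Apache/mod_jk can be sketched as below. The assumption here is that the bucket name doubles as the mod_jk worker/route name and rides on the session id as a '.'-separated suffix, in the style of Tomcat's jvmRoute; the ids and bucket names are made up. It also shows why moving a session between buckets is awkward: the suffix lives in the id the client already holds.

// Sketch of carrying the bucket name as a routing suffix on the session id.
public class BucketRoute {

    static String tagSessionId(String rawId, String bucketName) {
        return rawId + "." + bucketName; // e.g. "A1B2C3D4.bucket07"
    }

    static String bucketOf(String taggedId) {
        int dot = taggedId.lastIndexOf('.');
        return dot < 0 ? null : taggedId.substring(dot + 1);
    }

    public static void main(String[] args) {
        String id = tagSessionId("A1B2C3D4", "bucket07"); // hypothetical id and bucket name
        System.out.println(id);           // A1B2C3D4.bucket07
        System.out.println(bucketOf(id)); // bucket07 - the load-balancer routes on this suffix
        // Re-homing the bucket only means re-mapping bucket07 to a different host:port;
        // moving a session to another bucket would mean rewriting the id the client
        // already holds, which is the difficulty described in the quoted post.
    }
}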




--
/*************************************
* Jules Gosnell
* Partner
* Core Developers Network (Europe)
* http://www.coredevelopers.net
*************************************/



