On Tue, 17 Jan 2006, Rajith Attapattu wrote:

> Can you guys talk more about locking mechanisms, pros and cons, wrt
> in-memory replication and storage-backed replication?
I don't know what you have in mind here by 'storage-backed'.

> Also what if a node goes down while the lock is acquired?? I assume
> there is a time out.

Which architecture do you have in mind here? I think the question is
relevant if you use a standalone lock server, but if you don't, then you
just keep the lock queue with the data item in question (see the first
sketch at the end of this mail).

> When it comes to partition (either network/power failure or virtual) or
> healing (same/new nodes coming up as well??), what are some of the
> algorithms and strategies that are widely used to handle those
> situations?? Any pointers will be great.

I believe the best strategy depends on what type of state the
application has. Clearly, if the state took zero time to transfer, you
could compare version numbers, transfer the state to the nodes that
happen to be out-of-date, and you are back in business (see the second
sketch below). OTOH, if the state is 1 GB you will take a different
approach. There is not much to look up here: think about it carefully
and you can come up with the best state transfer for your application.
Session state is easier than most because it consists of myriad small,
independent data items that do not support concurrent access.

> So if you are in the middle of filling in a 10-page application on the
> web, and the server goes down while you are on the 9th page, if you
> can restart again from the 7th or 8th page (a reasonable percentage of
> the data was preserved through merge/split/change), I guess it would
> be tolerable, if not excellent, on a very busy server.

Since this is a question about availability, consider a cluster of, say,
4 nodes with a minimum replication factor R=2, where all the sessions
are replicated on _each_ node. If you want to guarantee that the user's
work is _never_ lost, just send all session updates to yourself in a
totem-protocol 'safe' message, which is delivered only after the message
has been received (but not necessarily yet delivered) by all the nodes,
and wait for your own message to arrive (see the third sketch below).
This takes between 1 and 2 token rotations, which on 4 nodes I guess
would be between 10 and 20 milliseconds; not a lot as HTTP request
latencies go. As a result, after an HTTP request returns, the work done
is likely to survive up to 4 - R = 2 node crashes.
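Sketch 1, a minimal single-JVM illustration (in Java) of keeping the
lock queue with the data item itself, so no standalone lock server is
involved. The class and method names are made up; in a real cluster the
enqueue/dequeue operations would themselves be replicated messages, and
a membership change, rather than a plain timeout, would evict a dead
lock holder:

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.concurrent.TimeUnit;

    // The lock queue lives with the data item: waiters are granted the
    // lock in FIFO order, and a timeout covers the case where the
    // current holder never releases (e.g. its node went down).
    class LockedItem<T> {
        private T value;
        private final Queue<Thread> waiters = new ArrayDeque<>();
        private Thread holder;

        // Returns true once we hold the lock, false on timeout.
        public synchronized boolean lock(long timeoutMillis)
                throws InterruptedException {
            Thread me = Thread.currentThread();
            waiters.add(me);
            long deadline = System.nanoTime()
                    + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (holder != null || waiters.peek() != me) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0L) {
                    waiters.remove(me);
                    notifyAll();          // let the next waiter re-check
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
            waiters.remove(me);
            holder = me;
            return true;
        }

        public synchronized void unlock() {
            if (holder == Thread.currentThread()) {
                holder = null;
                notifyAll();
            }
        }

        public synchronized T get() { return value; }
        public synchronized void set(T v) { value = v; }
    }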
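Sketch 2, version-number reconciliation after a partition heals. The
names (VersionedValue, diffNewer) are made up for illustration; the
point is only that for small state you compare per-item versions and
ship just the items the other side lacks or holds stale copies of:

    import java.util.HashMap;
    import java.util.Map;

    // A value tagged with a monotonically increasing version number.
    class VersionedValue {
        final long version;
        final byte[] data;
        VersionedValue(long version, byte[] data) {
            this.version = version;
            this.data = data;
        }
    }

    class StateReconciler {
        // Returns the entries in 'mine' that the other side is missing
        // or holds at an older version; a real system would send this
        // diff to the out-of-date node instead of returning it.
        static Map<String, VersionedValue> diffNewer(
                Map<String, VersionedValue> mine,
                Map<String, Long> theirVersions) {
            Map<String, VersionedValue> updates = new HashMap<>();
            for (Map.Entry<String, VersionedValue> e : mine.entrySet()) {
                Long theirs = theirVersions.get(e.getKey());
                if (theirs == null || theirs < e.getValue().version) {
                    updates.put(e.getKey(), e.getValue());
                }
            }
            return updates;
        }
    }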
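Sketch 3, the "wait for your own safe message" trick. The GroupChannel
interface here is an assumption, not a real library API: sendSafe()
stands in for a totem 'safe'-order multicast, and the listener is
invoked when a message is delivered locally, including our own:

    import java.util.Arrays;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;
    import java.util.function.Consumer;

    class SessionReplicator {
        // Hypothetical group-communication channel: sendSafe() delivers
        // a message only once every node in the view has received it.
        interface GroupChannel {
            void sendSafe(byte[] payload);
            void setListener(Consumer<byte[]> onDeliver);
        }

        private final GroupChannel channel;

        SessionReplicator(GroupChannel channel) { this.channel = channel; }

        // Block the HTTP request thread until our own update is
        // delivered, i.e. until every node has received it.
        void replicateAndWait(byte[] sessionUpdate, long timeoutMillis)
                throws InterruptedException {
            CountDownLatch delivered = new CountDownLatch(1);
            // A real implementation would match on a unique message id
            // rather than comparing payload bytes.
            channel.setListener(payload -> {
                if (Arrays.equals(payload, sessionUpdate)) {
                    delivered.countDown();
                }
            });
            channel.sendSafe(sessionUpdate);
            if (!delivered.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException(
                    "safe delivery timed out; view change likely");
            }
        }
    }

Once replicateAndWait() returns, every node in the view has received the
update, so acknowledging the 9th page to the user is safe even if up to
4 - R = 2 nodes crash right afterwards.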
