On Thu, Mar 15, 2012 at 10:09 AM, Bela Ban <[email protected]> wrote:
> Sorry for the delay, been busy ...
>

No worries Bela, I'm late with my rewrite of the NBST document as well...

> Comments inline.
>
> On 3/9/12 7:20 PM, Dan Berindei wrote:
>> On Mar 9, 2012 4:19 PM, "Bela Ban" <[email protected]> wrote:
>>
>>> My understanding (based on my changes in 4.2) is that state transfer
>>> moves/deletes keys based on the diff between 2 subsequent views:
>>> - Each node checks all of the affected keys
>>> - If a key should be stored in additional nodes, the key is pushed there
>>> - If a key shouldn't be stored locally anymore, it is removed
>>
>> That's fine if we block all writes during state transfer, but once we
>> start allowing writes during state transfer we need to log all changes
>> and send them to the new owners at the end (the approach in 4.2
>> without your changes) or redirect all commands to the new owners.
>
> OK
>
>> In addition to that, we have to either block all commands on the new
>> owners until they receive the entire state, or forward get commands to
>> the old owners as well. The two options apply for lock commands as
>> well.
>
> Why do we have to lock at all if we use queueing of requests during a
> state transfer ?

I'm not sure what you mean by queuing of requests...

The current approach:
* If the cache is sync, the old owners will throw a
StateTransferInProgressException while a state transfer is running, and
the originator will then wait for the state transfer to end before
sending the command again to the new owners (a rough sketch of this
retry loop follows the list).
* If the cache is async, we block write commands on the target and
unblock them after the state transfer has finished. I don't think we do
anything to make sure all the new owners get all the commands that were
targeted to the old owners, so we can lose consistency between owners.
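To make the sync case concrete, here is a minimal sketch of the
originator-side retry loop. Dispatcher, StateTransferTracker and the
method names are hypothetical stand-ins for this example, not the actual
Infinispan or JGroups API:

class StateTransferInProgressException extends RuntimeException {
}

interface Dispatcher {
   // Sends the command to the current owners and returns the response.
   Object invoke(Object command);
}

interface StateTransferTracker {
   // Blocks until the in-progress state transfer has finished and the
   // new cache view (with its new owners) is installed.
   void awaitStateTransferEnd() throws InterruptedException;
}

class SyncRetrySketch {
   private final Dispatcher dispatcher;
   private final StateTransferTracker tracker;

   SyncRetrySketch(Dispatcher dispatcher, StateTransferTracker tracker) {
      this.dispatcher = dispatcher;
      this.tracker = tracker;
   }

   Object invokeWithRetry(Object command) throws InterruptedException {
      while (true) {
         try {
            return dispatcher.invoke(command);
         } catch (StateTransferInProgressException e) {
            // An old owner rejected the command mid-rebalance: wait for
            // the state transfer to end, then retry on the new owners.
            tracker.awaitStateTransferEnd();
         }
      }
   }
}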
The new design will always reject commands with the old cache view id, on
any owner. If the cache view id is ok but we don't have the new CH or the
lock information yet, the command will block on the target (a rough
sketch of this check follows the list below).
* If the cache is sync, we can use the same approach as before and tell
the originator to retry the command on the new owners.
* If the cache is async, instead of doing nothing like before, I was
thinking that the target itself could forward the commands (especially
write commands) to the new owners. We didn't discuss this in London, I
added it afterwards as I realized that without forwarding we'll be
losing data. I didn't realize at the time, however, that the current
design has the same problem.
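As an illustration of that target-side check, assuming the cache view id
is a plain integer, the logic could look roughly like this
(CacheViewGuard and everything in it is invented for the sketch; in the
sync case the rejection would feed the StateTransferInProgressException
retry path described above):

class CacheViewGuard {
   private int installedViewId;
   private final Object monitor = new Object();

   // Called when a new cache view, together with its CH and lock
   // information, has been installed locally.
   void installView(int viewId) {
      synchronized (monitor) {
         installedViewId = viewId;
         monitor.notifyAll();
      }
   }

   // Called on the target before executing a command.
   void checkCommand(int commandViewId) throws InterruptedException {
      synchronized (monitor) {
         if (commandViewId < installedViewId) {
            // Old cache view id: always reject, so the originator can
            // retry the command on the new owners.
            throw new IllegalStateException("command carries an old cache view id");
         }
         while (commandViewId > installedViewId) {
            // The command is ahead of us: we don't have the new CH or
            // the lock information yet, so block it on the target.
            monitor.wait();
         }
      }
   }
}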
I think I see a problem with my request forwarding plan: the forwarded
requests will have a different source, so JGroups will not ensure an
ordering between them and the originator's subsequent requests. This
means we'll have inconsistencies anyway, so perhaps it would be better
if we stuck to the current design's limitations and removed the
requirement for old targets to forward commands to new ones.

>> I'm not trying to make merges more complicated on purpose :)
>
> I didn't imply that; but I thought the London design was pretty simple,
> and I'm trying to figure out why we have such a big (maybe perceived)
> diff between London and what's in the document.

True, there are many complications that were not apparent during our
London discussion.

>>> Also, why do we need to transfer ownership information ? Can't
>>> ownership be calculated purely on local information ?
>>
>> The current ownership information can be calculated based solely on
>> the members list. But the ownership in the previous cache view(s)
>> cannot be computed by joiners based only on their own information, so
>> it has to be broadcast by the coordinator.
>
> OK
>
>> One particularly nasty problem with the existing, blocking, state
>> transfer is that before iterating the data container we need to wait
>> for all the pending commands to finish.
>
> Can't we queue the state transfer requests *and* the regular requests ?
> When ST is done, we apply the ST requests first, then the queued
> regular requests.

That was basically what we did in the blocking design: the ST commands
could execute during ST, but regular commands would block until the end
of the ST. With async caches, that meant we would use JGroups' one queue
per sender (so not a global queue, but close).

The problem was not with the regular commands that arrived after the
start of the ST, but with the commands that had already started
executing when ST started. This is the classic example:

1. A prepare command for Tx1 locks k1 on node A
2. A prepare command for Tx2 tries to acquire the lock on k1 on node A
3. State transfer starts up and blocks all write commands
4. The Tx1 commit command, which would unlock k1, arrives but can't run
until state transfer has ended
5. The Tx2 prepare command times out on the lock acquisition after 10
seconds (by default)
6. State transfer can now proceed and push or receive data
7. The Tx1 commit can now run and unlock k1. It's too late for Tx2,
however.

The solution I had in mind for the old design was to add some kind of
deadlock detection to the LockManager and throw a
StateTransferInProgressException when a deadlock with the state
transfer is detected.

With the new design I thought it would be simpler not to acquire a big
lock for the entire duration of the write command that would prevent
state transfer. Instead I would acquire different locks for much shorter
amounts of time, and at the beginning of each lock acquisition we would
just check that the command's view id is still the correct one.

I guess I may have been wrong on the "simpler" assertion :)
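To show what I mean by the per-acquisition check, here is a minimal
sketch; ShortLockSketch and the names in it are invented for
illustration, and the real view id bookkeeping would of course live
elsewhere:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;

class ShortLockSketch {
   // The view id of the currently installed cache view, updated by the
   // state transfer machinery whenever a rebalance installs a new view.
   private final AtomicInteger currentViewId = new AtomicInteger();

   boolean acquire(Lock keyLock, int commandViewId, long timeoutMillis)
         throws InterruptedException {
      // Re-validate the command's view id at every lock acquisition: if
      // a newer view has been installed, fail fast so the command can be
      // retried on the new owners instead of holding up state transfer.
      if (commandViewId != currentViewId.get()) {
         throw new IllegalStateException("cache view changed; retry on the new owners");
      }
      // Hold each individual lock only for a short time, instead of one
      // big lock for the whole duration of the write command.
      return keyLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
   }
}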
_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev