On Tue, Mar 19, 2013 at 11:55 PM, Pedro Ruivo <[email protected]> wrote:
> On 03/19/2013 08:41 PM, Dan Berindei wrote:
> > On Mon, Mar 18, 2013 at 6:09 PM, Pedro Ruivo <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > To solve ISPN-2808 (avoid blocking JGroups threads in order to allow
> > > them to deliver the request responses), I've created another thread
> > > pool to move the possibly blocking commands to (i.e. the commands
> > > that may block until some state is achieved).
> > >
> > > Problem description:
> > >
> > > With this solution, the new thread pool should be large in order to
> > > be able to handle the remote commands without deadlocks. The problem
> > > is that all the threads can be blocked processing commands while the
> > > command that would unblock them is still in the thread pool queue.
> > >
> > > Example: a bunch of commands are blocked waiting for a new topology
> > > ID, and the command that will increment the topology ID is in the
> > > thread pool queue.
> > >
> > > Solution:
> > >
> > > Use a smart command dispatcher, i.e., keep the command in the queue
> > > until we are sure that it will not wait for other commands. I've
> > > already implemented some kind of executor service
> > > (ConditionalExecutorService, in the ISPN-2635 and ISPN-2636
> > > branches, Total Order stuff) that only puts the Runnable (more
> > > precisely, a new interface called ConditionalRunnable) in the thread
> > > pool when it is ready to be processed. Creative guys, it may need a
> > > better name :)
> > >
> > > The ConditionalRunnable has a new method (boolean isReady()) that
> > > should return true when the runnable will not block.
> > >
> > > Example of how to apply this to ISPN-2808:
> > >
> > > Most of the commands wait for a particular topology ID and/or for
> > > lock acquisition.
> >
> > Well, the original problem description was about the
> > DistributionInterceptor forwarding the command from the primary owner
> > to the backup owners and waiting for a response from them :)
> > The forwarding done by StateTransferInterceptor is also synchronous
> > and can block.
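For illustration, here is a minimal sketch of what the ConditionalRunnable contract and the "keep it in the queue until it's ready" dispatch could look like. The class names and structure are hypothetical, not the actual code from the ISPN-2635/2636 branches:

```java
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;

// A task that knows whether it can run without blocking.
interface ConditionalRunnable extends Runnable {
    boolean isReady();
}

// Keeps tasks out of the real thread pool until isReady() returns true,
// so pool threads never park waiting for a topology change or a lock.
class ConditionalDispatcher {
    private final Queue<ConditionalRunnable> pending = new ConcurrentLinkedQueue<>();
    private final Executor pool;

    ConditionalDispatcher(Executor pool) {
        this.pool = pool;
    }

    void submit(ConditionalRunnable task) {
        if (task.isReady()) {
            pool.execute(task);   // safe: it will not block a pool thread
        } else {
            pending.add(task);    // parked until a state change re-checks it
        }
    }

    // Call on every relevant state change (topology update, lock release):
    // schedule any parked task that has since become ready.
    void stateChanged() {
        for (Iterator<ConditionalRunnable> it = pending.iterator(); it.hasNext(); ) {
            ConditionalRunnable task = it.next();
            if (task.isReady()) {
                it.remove();
                pool.execute(task);
            }
        }
    }
}
```

With this shape, a small pool suffices: no thread ever sits blocked inside a command waiting for a state change that another queued command would produce.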
> > It's true that you can't check how long a remote call will take
> > beforehand...
> >
> > > In this way, the isReady() implementation can be something like:
> > >
> > > isReady()
> > >   return commandTopologyId <= currentTopologyId &&
> > >          (for all keys; do if !lock(key).tryLock(); return false; done)
> >
> > Shouldn't you release the locks you managed to lock already if one of
> > the lock acquisitions failed?
>
> You are right. I have to release the locks in pessimistic mode. In
> optimistic mode, the locks are only released by the rollback command.
>
> > Actually you may want to release the locks even if lock acquisition
> > did succeed... In non-tx mode, the locks are owned by the current
> > thread, so you can't lock a key on one thread and unlock it on
> > another (though you could skip the check completely in non-tx mode).
> > And in transactional mode, you could have a deadlock because 2 txs
> > lock the same keys in a different order.
>
> Non-tx caches cannot use this optimization. I saw that problem earlier
> today when I started debugging it.
>
> > Which leads me to a different point: how would you handle deadlocks?
> > With pessimistic mode, if tx1 holds lock k1 and wants to acquire k2,
> > but tx2 holds k2 and wants to acquire k1, will the LockCommands
> > tx1:k2 and tx2:k1 ever be scheduled? In general, can we make the time
> > that a Lock/PrepareCommand spends in the ConditionalExecutorService
> > queue count against lockAcquisitionTimeout?
>
> If I get a DeadlockDetectedException, I will send it back immediately
> and release all the locks.

Hmmm, the default LockManager doesn't throw DeadlockDetectedExceptions; you have to enable deadlockDetection explicitly in the configuration. I'm guessing that's for performance reasons...

It's true that you would need a new LockManager.tryLock method anyway, so you could implement deadlock detection in LockManagerImpl.tryLock. But probably the same performance considerations would apply.
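As a sketch only, the all-or-nothing acquisition behind that isReady() pseudocode, with the release-on-failure fix discussed above, could look like the following. KeyLockTable is a hypothetical stand-in for LockManager, not the real Infinispan API, and it ignores lock ownership/reentrancy details that the real LockManagerImpl would have to handle:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical per-key lock table (stand-in for the LockManager).
class KeyLockTable {
    private final Map<Object, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(Object key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    // All-or-nothing acquisition: try every key's lock without blocking,
    // and on the first failure release everything acquired so far, so a
    // "not ready" command holds no locks while it sits in the queue.
    boolean tryLockAll(Iterable<Object> keys) {
        List<ReentrantLock> acquired = new ArrayList<>();
        for (Object key : keys) {
            ReentrantLock lock = lockFor(key);
            if (lock.tryLock()) {
                acquired.add(lock);
            } else {
                for (ReentrantLock held : acquired) {
                    held.unlock();
                }
                return false;
            }
        }
        return true;
    }

    void unlockAll(Iterable<Object> keys) {
        for (Object key : keys) {
            lockFor(key).unlock();
        }
    }
}
```

Note this sketch also illustrates the deadlock point: since a failed attempt releases everything, two transactions locking the same keys in different orders keep failing and retrying rather than deadlocking, but without a timeout budget they could retry forever.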
> The lockAcquisitionTimeout is a problem that I haven't solved yet.
>
> > > With this, I believe we can keep the number of threads low and
> > > avoid the thread deadlocks.
> > >
> > > Now, I have two possible implementations:
> > >
> > > 1) put a reference to StateTransferManager and/or LockManager in
> > > the commands, and invoke the methods directly (a little dirty)
> > >
> > > 2) add a new method in the CommandInterceptor like: boolean
> > > preProcess<command>(Command, InvocationContext). Each interceptor
> > > will check if the command would block on it (returning false) or
> > > not (invoking the next interceptor). For example, the
> > > StateTransferInterceptor returns false immediately if the
> > > commandTopologyId is higher than the currentTopologyId, and the
> > > *LockingInterceptor will return false if it cannot acquire some
> > > lock.
> > >
> > > Any other suggestions? If I was not clear let me know.
> >
> > TBH I would only check for the topology id before scheduling the
> > commands to the Infinispan thread pool, because checking the locks is
> > very complicated with all the configuration options that we have.
> >
> > This is how I was imagining solving the lock problem: the OOB threads
> > would execute directly the commands that do not acquire locks
> > (CommitCommand, TxCompletionNotificationCommand) and submit the
> > others to the ISPN thread pool; if the command's topology id was
> > higher than the current topology id, the OOB thread would just stick
> > it in a queue. A separate thread would read only the commands with
> > the current topology id from the queue, and process them just like an
> > OOB thread.
> >
> > However... the CommitCommands may very well have a *lower* topology
> > id by the time they are executed, so the OOB/executor thread may well
> > block while waiting for the results of the forwarding RPC.
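The topology-id gate described above could be sketched roughly as follows. TopologyGate and Cmd are illustrative names only, not Infinispan classes, and the sketch ignores the forwarding-RPC complication:

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.concurrent.Executor;

// Hypothetical sketch: the OOB thread runs lock-free commands directly,
// hands possibly-blocking ones to the ISPN pool, and parks commands whose
// topology id is ahead of this node's until the topology catches up.
class TopologyGate {
    static class Cmd {
        final int topologyId;
        final boolean acquiresLocks;
        final Runnable body;

        Cmd(int topologyId, boolean acquiresLocks, Runnable body) {
            this.topologyId = topologyId;
            this.acquiresLocks = acquiresLocks;
            this.body = body;
        }
    }

    private int currentTopologyId;
    private final PriorityQueue<Cmd> delayed =
            new PriorityQueue<>(Comparator.comparingInt(c -> c.topologyId));
    private final Executor pool;

    TopologyGate(int initialTopologyId, Executor pool) {
        this.currentTopologyId = initialTopologyId;
        this.pool = pool;
    }

    // Called from the OOB (JGroups) thread for each incoming command.
    synchronized void deliver(Cmd cmd) {
        if (cmd.topologyId > currentTopologyId) {
            delayed.add(cmd);          // park until the topology catches up
        } else if (cmd.acquiresLocks) {
            pool.execute(cmd.body);    // may block: keep it off the OOB thread
        } else {
            cmd.body.run();            // e.g. CommitCommand: run directly
        }
    }

    // Called when a topology update is installed: drain every parked
    // command whose topology id is now current or older.
    synchronized void topologyUpdated(int newTopologyId) {
        currentTopologyId = newTopologyId;
        while (!delayed.isEmpty() && delayed.peek().topologyId <= currentTopologyId) {
            deliver(delayed.poll());
        }
    }
}
```

The priority queue drains parked commands in topology order, matching the idea of a separate thread processing only commands at the current topology id.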
> > Maybe we could work around it by having StateTransferInterceptor
> > submit only the command forwarding as a task to the thread pool, but
> > again it gets quite complicated. So for the first phase I recommend
> > handling only the topology id problem.
>
> I have a similar implementation, but in your example, the CommitCommand
> will be scheduled to the thread pool only if the command topology ID is
> lower than or equal to the current topology ID (like all
> topology-affected commands).
>
> Currently, I only have 3 tests failing in
> org.infinispan.replication.FlagsReplicationTest (2 assertions and 1
> timeout) that I'm checking now...

I guess it's time to look at your code :)
_______________________________________________ infinispan-dev mailing list [email protected] https://lists.jboss.org/mailman/listinfo/infinispan-dev
