Igniters,

We are currently working on transactional SQL, and distributed deadlocks are a serious problem for us. It looks like the current deadlock detection mechanism has several deficiencies:
1) It transfers keys! That is a no-go for SQL, as we may have millions of keys.
2) By default we wait for a minute. Way too long, IMO.
What if we change it as follows:
1) While obtaining a lock, collect the XIDs of all preceding transactions within the current transaction object. This way we will always have the list of TXes we wait for.
2) Define a TX deadlock coordinator node.
3) Periodically (e.g. once per second), iterate over active transactions and detect those that have been waiting for a lock for too long (e.g. >2-3 sec). Timeouts could be adaptive, depending on the workload and the rate of false-positive alarms.
4) Send info about those long-running guys to the coordinator in the form Map[XID -> List<XID>].
5) Rebuild the global wait-for graph on the coordinator and search it for deadlocks.
6) Choose a victim and send the problematic part of the wait-for graph to it.
7) The victim collects the necessary info (e.g. keys, SQL statements, thread IDs, cache IDs, etc.) and throws an exception.

Advantages:
1) We ignore short transactions. So if there are tons of short TXes in a typical OLTP workload, we will never touch many of them.
2) Only a minimal set of data is sent between nodes, so we can exchange it often without losing performance.

Thoughts?

Vladimir.
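Steps 3 to 6 of the proposal can be illustrated with a small, single-process Python sketch. It is only a model under stated assumptions, not Ignite code: XIDs are plain integers, and `LocalTx`, `collect_long_waiters`, `merge_reports`, `find_cycle` and `choose_victim` are hypothetical names. The victim policy (abort the transaction with the largest XID, i.e. the "youngest" one) is also an assumption; the proposal does not say how the victim is chosen.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

XID = int  # assumption: transaction IDs modeled as plain integers


@dataclass
class LocalTx:
    """Hypothetical per-node view of an active transaction."""
    xid: XID
    wait_started: Optional[float] = None  # time it started waiting for a lock; None if not waiting
    waits_for: List[XID] = field(default_factory=list)  # XIDs collected while obtaining the lock (step 1)


def collect_long_waiters(txs: List[LocalTx], now: float, threshold: float) -> Dict[XID, List[XID]]:
    """Steps 3-4: report only TXes waiting longer than `threshold` seconds, as XID -> [XID]."""
    return {tx.xid: list(tx.waits_for) for tx in txs
            if tx.wait_started is not None and now - tx.wait_started > threshold}


def merge_reports(reports: List[Dict[XID, List[XID]]]) -> Dict[XID, Set[XID]]:
    """Step 5 (part 1): the coordinator rebuilds the global wait-for graph."""
    graph: Dict[XID, Set[XID]] = {}
    for report in reports:
        for xid, deps in report.items():
            graph.setdefault(xid, set()).update(deps)
    return graph


def find_cycle(graph: Dict[XID, Set[XID]]) -> Optional[List[XID]]:
    """Step 5 (part 2): depth-first search; a back edge to a node on the current path is a deadlock."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / fully explored
    color: Dict[XID, int] = {}
    path: List[XID] = []

    def visit(node: XID) -> Optional[List[XID]]:
        color[node] = GREY
        path.append(node)
        for nxt in sorted(graph.get(node, ())):  # XIDs with no report have no known outgoing edges
            state = color.get(nxt, WHITE)
            if state == GREY:
                return path[path.index(nxt):]  # the cycle is the path suffix starting at nxt
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in sorted(graph):
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


def choose_victim(graph: Dict[XID, Set[XID]], cycle: List[XID]) -> Tuple[XID, Dict[XID, List[XID]]]:
    """Step 6: pick a victim (assumed policy: largest XID) and the wait-for subgraph to send it."""
    members = set(cycle)
    victim = max(cycle)
    subgraph = {x: sorted(graph[x] & members) for x in cycle}
    return victim, subgraph


# Usage: TX 1 (node A) and TX 2 (node B) wait for each other; TX 3 has waited only 0.5 s and is ignored.
node_a = [LocalTx(1, 0.0, [2]), LocalTx(3, 9.5, [4])]
node_b = [LocalTx(2, 1.0, [1])]
reports = [collect_long_waiters(n, now=10.0, threshold=2.0) for n in (node_a, node_b)]
graph = merge_reports(reports)
cycle = find_cycle(graph)
if cycle:
    print(choose_victim(graph, cycle))
```

Note that the per-node reports are taken at slightly different moments, so a cycle found on the coordinator may already be resolved by the time the victim is notified; this sketch does not address such false positives, which is one reason adaptive timeouts (step 3) matter.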