Dima,

What is wrong with the coordinator approach? All it does is analyze a small
number of TXes which have been waiting for locks for too long.
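
To make that concrete, below is a rough sketch of what the coordinator-side
check could look like once it has the merged Map[XID -> List<XID>] reports. The
class and method names are made up for illustration and this is not actual
Ignite code, just a plain DFS cycle search over the wait-for graph:

import java.util.*;

/**
 * Illustration only, not Ignite code: merge the per-node Map[XID -> List<XID>]
 * reports into one wait-for graph and look for a cycle with a plain DFS.
 */
public class DeadlockCoordinatorSketch {
    /** @param waitsFor Merged reports: XID -> XIDs this transaction is blocked by. */
    public static List<UUID> findCycle(Map<UUID, List<UUID>> waitsFor) {
        Set<UUID> visited = new HashSet<>();

        for (UUID start : waitsFor.keySet()) {
            List<UUID> cycle = dfs(start, waitsFor, visited, new ArrayList<>());

            if (cycle != null)
                return cycle; // A wait-for cycle is a distributed deadlock.
        }

        return null; // No deadlock among the reported long-waiting TXes.
    }

    private static List<UUID> dfs(UUID xid, Map<UUID, List<UUID>> g, Set<UUID> visited, List<UUID> path) {
        int idx = path.indexOf(xid);

        if (idx >= 0)
            return new ArrayList<>(path.subList(idx, path.size())); // Cycle: path from xid back to xid.

        if (!visited.add(xid))
            return null; // Seen before and not on the current path: fully explored, no cycle through it.

        path.add(xid);

        for (UUID next : g.getOrDefault(xid, Collections.emptyList())) {
            List<UUID> cycle = dfs(next, g, visited, path);

            if (cycle != null)
                return cycle;
        }

        path.remove(path.size() - 1);

        return null;
    }
}

Whatever the search returns is the "problematic wait-for graph" that would be
sent to the victim, so the coordinator never touches anything beyond the small
set of long-waiting TXes reported to it.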

On Tue, Nov 21, 2017 at 1:16 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:

> Vladimir,
>
> I am not sure I like it, mainly due to some coordinator node doing some
> periodic checks. For the deadlock detection to work effectively, it has to
> be done locally on every node. This may require that every tx request carry
> information about up to N previous keys it accessed, but the detection will
> happen locally on the destination node.
>
> What do you think?
>
> D.
>
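
If I follow the local scheme correctly (every lock request carries up to N keys
its transaction already holds), the destination node's check might look roughly
like the sketch below; again, the names are made up and this is not a real
implementation:

import java.util.*;

/**
 * Illustration only, not Ignite code: a purely local check the destination node
 * could do if every lock request carried the keys its transaction already holds.
 */
public class LocalDeadlockCheckSketch {
    /** Keys each locally known transaction is currently blocked on. */
    private final Map<UUID, Set<Object>> txWaitsForKeys = new HashMap<>();

    /** Current lock owner per key on this node. */
    private final Map<Object, UUID> lockOwner = new HashMap<>();

    /**
     * Called when remote tx {@code requesterXid} asks for {@code key} and the
     * request also carries up to N keys the requester already holds.
     *
     * @return XID of the local owner if granting the wait would close a
     *     requester -> owner -> requester cycle, otherwise null.
     */
    public UUID wouldDeadlock(UUID requesterXid, Object key, Collection<Object> keysHeldByRequester) {
        UUID owner = lockOwner.get(key);

        if (owner == null || owner.equals(requesterXid))
            return null; // Key is free or already owned by the requester.

        Set<Object> ownerWaits = txWaitsForKeys.getOrDefault(owner, Collections.emptySet());

        for (Object held : keysHeldByRequester) {
            if (ownerWaits.contains(held))
                return owner; // Owner waits for a key the requester holds: 2-TX cycle.
        }

        return null; // No cycle visible from this node alone.
    }
}

The sketch only covers the case where the requested key's owner and that
owner's wait state are both known to the destination node.
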
> On Mon, Nov 20, 2017 at 11:50 AM, Vladimir Ozerov <voze...@gridgain.com>
> wrote:
>
> > Igniters,
> >
> > We are currently working on transactional SQL, and distributed deadlocks are
> > a serious problem for us. It looks like the current deadlock detection
> > mechanism has several deficiencies:
> > 1) It transfers keys! No go for SQL, as we may have millions of keys.
> > 2) By default we wait for a minute. Way too much IMO.
> >
> > What if we change it as follows:
> > 1) Collect the XIDs of all preceding transactions while obtaining a lock and
> > keep them in the current transaction object. This way we will always have
> > the list of TXes we wait for.
> > 2) Define TX deadlock coordinator node
> > 3) Periodically (e.g. once per second), iterate over active transactions and
> > detect ones waiting for a lock for too long (e.g. >2-3 sec). Timeouts could
> > be adaptive depending on the workload and the false-positive alarm rate.
> > 4) Send info about those long-waiting transactions to the coordinator in the
> > form Map[XID -> List<XID>]
> > 5) Rebuild the global wait-for graph on the coordinator and search for
> > deadlocks
> > 6) Choose a victim and send the problematic wait-for graph to it
> > 7) Victim collects necessary info (e.g. keys, SQL statements, thread IDs,
> > cache IDs, etc.) and throws an exception.
> >
> > Advantages:
> > 1) We ignore short transactions. So if there are tons of short TXes on a
> > typical OLTP workload, we will never check many of them.
> > 2) Only a minimal set of data is sent between nodes, so we can exchange data
> > often without losing performance.
> >
> > Thoughts?
> >
> > Vladimir.
> >
>
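
For completeness, the per-node side of steps 1-4 of the proposal quoted above
stays small as well. Something along these lines (made-up names again, just to
show the shape of the data that travels):

import java.util.*;
import java.util.concurrent.*;

/**
 * Illustration only, not Ignite code: each node remembers the XIDs a local
 * transaction waits for (step 1) and once per second reports only the
 * transactions that have been waiting longer than a threshold (steps 3-4).
 */
public class TxWaitReporterSketch {
    /** Fixed here for brevity; the proposal makes this adaptive. */
    private static final long LONG_WAIT_MS = 2_000;

    /** Per local tx: XIDs it currently waits for plus when the wait started. */
    private final ConcurrentMap<UUID, WaitInfo> activeTxs = new ConcurrentHashMap<>();

    static class WaitInfo {
        final List<UUID> waitsFor = new CopyOnWriteArrayList<>();
        volatile long waitStartMs;
    }

    /** Step 1: remember the blocking tx's XID when a local tx has to wait. */
    public void onBlocked(UUID xid, UUID blockedByXid) {
        WaitInfo info = activeTxs.computeIfAbsent(xid, k -> new WaitInfo());

        if (info.waitsFor.isEmpty())
            info.waitStartMs = System.currentTimeMillis();

        info.waitsFor.add(blockedByXid);
    }

    /** Forget the tx once it commits or rolls back. */
    public void onFinished(UUID xid) {
        activeTxs.remove(xid);
    }

    /** Steps 3-4: build the Map[XID -> List<XID>] report for long waiters only. */
    public Map<UUID, List<UUID>> buildReport() {
        long now = System.currentTimeMillis();
        Map<UUID, List<UUID>> report = new HashMap<>();

        for (Map.Entry<UUID, WaitInfo> e : activeTxs.entrySet()) {
            WaitInfo info = e.getValue();

            if (!info.waitsFor.isEmpty() && now - info.waitStartMs > LONG_WAIT_MS)
                report.put(e.getKey(), new ArrayList<>(info.waitsFor));
        }

        return report; // Only XIDs travel to the coordinator, never keys.
    }
}

The coordinator then merges these maps and runs the cycle search over them; the
keys themselves never leave the node.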
