Vladimir Ozerov created IGNITE-11498:
----------------------------------------
Summary: SQL: Rework DML data distribution logic
Key: IGNITE-11498
URL: https://issues.apache.org/jira/browse/IGNITE-11498
Project: Ignite
Issue Type: Task
Components: sql
Reporter: Vladimir Ozerov
Fix For: 2.8
Current DML implementation has a number of problems:
1) We fetch the whole data set to the originator node. There is a
"skipDmlOnReducer" flag to avoid this in some cases, but it is still
experimental and is not enabled by default.
2) Updates are deadlock-prone: we update entries in batches of
{{SqlFieldsQuery.pageSize}} entries, so we can easily deadlock with concurrent
cache operations.
3) We have very strange retry logic. It is not clear why it is needed in the
first place, given that DML is not transactional and no guarantees are needed.
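
To illustrate problem 2, here is a minimal sketch (plain Java, not Ignite code)
of why batched updates are deadlock-prone. It assumes that a batch acquires
entry locks in the order its keys are listed and holds them until the whole
batch is applied; the class and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical illustration only; this is not an Ignite API. */
public class LockOrderSketch {
    /**
     * Returns true if the two batches lock some pair of shared keys in
     * opposite orders, which is the precondition for a lock-order deadlock.
     * Assumption: each batch locks keys in list order and holds all of them
     * until the batch completes.
     */
    public static boolean canDeadlock(List<?> batchA, List<?> batchB) {
        for (int i = 0; i < batchA.size(); i++) {
            for (int j = i + 1; j < batchA.size(); j++) {
                int bi = batchB.indexOf(batchA.get(i));
                int bj = batchB.indexOf(batchA.get(j));

                // Both keys are shared, and B acquires them in reverse order.
                if (bi >= 0 && bj >= 0 && bj < bi)
                    return true;
            }
        }

        return false;
    }
}
```

Under this model a batch holding a single entry can never take part in such a
cycle, which is the motivation for applying updates one by one.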
Proposal:
# Implement proper routing logic: if a request can be executed on data nodes,
bypassing the reducer, do this. Otherwise fetch all data to the reducer. This
decision should be made in exactly the same way as for MVCC (see
{{GridNearTxQueryEnlistFuture}} as a starting point).
# Distribute updates to primary data nodes in batches, but apply them one by
one, similar to the data streamer with {{allowOverwrite=false}}. Do not do any
partition state or {{AffinityTopologyVersion}} checks, since DML is not
transactional. Return update counts and aggregate them.
# Remove or at least rethink retry logic. Why do we need it in the first place?
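
The second proposal item could look roughly like the sketch below. It is plain
Java with no Ignite dependencies, and every name in it is hypothetical: the
"primary node" is chosen by a simple hash modulo that stands in for real
affinity mapping, and each node is modelled as an in-memory map. It only shows
the shape of the flow: group rows by primary node, cut them into batches, apply
each batch entry by entry, and sum the update counts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names are Ignite APIs. */
public class DmlDistributionSketch {
    /** Stand-in for affinity mapping: picks a "primary node" index for a key. */
    public static int primaryNodeFor(Object key, int nodeCnt) {
        return Math.floorMod(key.hashCode(), nodeCnt);
    }

    /** Groups rows by primary node and cuts each group into batches of at most batchSize rows. */
    public static Map<Integer, List<Map<Object, Object>>> toBatches(
        Map<Object, Object> rows, int nodeCnt, int batchSize) {
        Map<Integer, List<Map<Object, Object>>> res = new HashMap<>();

        for (Map.Entry<Object, Object> e : rows.entrySet()) {
            List<Map<Object, Object>> batches =
                res.computeIfAbsent(primaryNodeFor(e.getKey(), nodeCnt), n -> new ArrayList<>());

            // Start a new batch when there is none yet or the last one is full.
            if (batches.isEmpty() || batches.get(batches.size() - 1).size() >= batchSize)
                batches.add(new LinkedHashMap<>());

            batches.get(batches.size() - 1).put(e.getKey(), e.getValue());
        }

        return res;
    }

    /** Applies a batch one entry at a time and returns the number of updates. */
    public static long applyBatch(Map<Object, Object> store, Map<Object, Object> batch) {
        long cnt = 0;

        for (Map.Entry<Object, Object> e : batch.entrySet()) {
            store.put(e.getKey(), e.getValue()); // Only one entry is touched at a time.
            cnt++;
        }

        return cnt;
    }

    /** Sends every batch to its node's store and aggregates the update counts. */
    public static long update(List<Map<Object, Object>> nodeStores, Map<Object, Object> rows, int batchSize) {
        long total = 0;

        for (Map.Entry<Integer, List<Map<Object, Object>>> e :
            toBatches(rows, nodeStores.size(), batchSize).entrySet()) {
            for (Map<Object, Object> batch : e.getValue())
                total += applyBatch(nodeStores.get(e.getKey()), batch);
        }

        return total;
    }
}
```

The sketch deliberately performs no partition state or topology version checks,
mirroring the proposal; what a real implementation should do when the topology
changes mid-update is left open here.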