Tian Jiang created IOTDB-5883:
---------------------------------
Summary: Refactor redirection and dispatching target
Key: IOTDB-5883
URL: https://issues.apache.org/jira/browse/IOTDB-5883
Project: Apache IoTDB
Issue Type: Improvement
Components: Core/Cluster
Reporter: Tian Jiang
Assignee: Tian Jiang
Fix For: master branch
Attachments: image-2023-05-16-12-22-29-563.png,
image-2023-05-16-12-42-50-397.png
The current redirection mechanism has the following issues:
1. A redirection from a lower level (e.g., the consensus layer) is
overwritten by QueryExecution. Even if the TSStatus from the lower level is
already REDIRECTION_RECOMMEND, QueryExecution still recalculates the
redirection. Worse, the recalculated redirection may point to the wrong node
(see the second issue for an explanation), even though the client may already
be sending to the right node.
!image-2023-05-16-12-22-29-563.png|thumbnail!
2. The dispatching target and redirection target can be stale. For each
FragmentInstance, both targets are derived from the PartitionCache: the very
first node in the associated ReplicaSet is chosen as the dispatching target
and the redirection target.
However, because the PartitionCache is not updated after a leadership change,
the first node in a ReplicaSet may not be the leader/primary/master node.
As a result, the FragmentInstance may be dispatched/redirected to a non-leader
node, which incurs a further redirection.
Solutions:
1. QueryExecution will calculate the redirection only when the TSStatus from the
lower level is REDIRECTION_RECOMMEND and it does not include a redirection node.
Such a situation is fairly rare, since most REDIRECTION_RECOMMEND statuses
returned by the lower level include a redirection node.
2. In each ReplicaSet, an optional preferred location is recorded. When the
preferred location is set, it will be chosen as the dispatching target and
redirection target.
When REDIRECTION_RECOMMEND is returned from the lower level and a redirection
node is included, the preferred location of the ReplicaSet will be updated to
that node.
Furthermore, if the node that generates the MPP plan is in the ReplicaSet, the
FragmentInstance will not be dispatched to another node. This is because the
consensus layer is more likely than the PartitionCache to know who the leader
is. Consequently, a consensus layer redirection is more accurate than an
MPP-level redirection.
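The rule proposed in solution 1 can be sketched as follows. This is only an
illustration: RedirectionSketch, LowerLevelStatus, resolveRedirect and the
"host:port" strings are hypothetical stand-ins, not the actual IoTDB classes,
and the fallback target stands for whatever QueryExecution would compute on
its own.

```java
import java.util.Optional;

// Hypothetical, simplified stand-ins; NOT the real IoTDB TSStatus/QueryExecution.
final class RedirectionSketch {
    enum StatusCode { SUCCESS, REDIRECTION_RECOMMEND }

    /** Simplified status: a code plus an optional redirection node ("host:port"). */
    static final class LowerLevelStatus {
        final StatusCode code;
        final Optional<String> redirectNode;

        private LowerLevelStatus(StatusCode code, Optional<String> redirectNode) {
            this.code = code;
            this.redirectNode = redirectNode;
        }

        static LowerLevelStatus of(StatusCode code) {
            return new LowerLevelStatus(code, Optional.empty());
        }

        static LowerLevelStatus withNode(StatusCode code, String node) {
            return new LowerLevelStatus(code, Optional.of(node));
        }
    }

    /**
     * Returns the node the client should be redirected to, if any.
     * fallbackTarget is the target the query layer would calculate itself.
     */
    static Optional<String> resolveRedirect(LowerLevelStatus status, String fallbackTarget) {
        if (status.code != StatusCode.REDIRECTION_RECOMMEND) {
            // The lower level did not ask for a redirection: do not invent one.
            return Optional.empty();
        }
        if (status.redirectNode.isPresent()) {
            // The lower level already named a node: keep it, never overwrite it.
            return status.redirectNode;
        }
        // Rare case: redirection recommended but no node given, so calculate one.
        return Optional.of(fallbackTarget);
    }
}
```

The point of the ordering is that the query layer's own calculation is used
only as a last resort, so a node named by the consensus layer always wins.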
!image-2023-05-16-12-42-50-397.png|thumbnail!
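The target selection proposed in solution 2 might look roughly like the
sketch below. ReplicaSetSketch, onRedirection and chooseTarget are
hypothetical names, nodes are plain strings for brevity, and two details are
assumptions rather than part of the proposal: a redirection node outside the
ReplicaSet is ignored, and the local node takes precedence over the preferred
location.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; NOT the real IoTDB ReplicaSet class.
final class ReplicaSetSketch {
    private final List<String> nodes;          // replicas, in PartitionCache order
    private volatile String preferredLocation; // null until a redirection is observed

    ReplicaSetSketch(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("a replica set needs at least one node");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    /** Called when the lower level returns REDIRECTION_RECOMMEND with a node. */
    void onRedirection(String redirectNode) {
        // Assumption: only members of this replica set may become preferred.
        if (nodes.contains(redirectNode)) {
            preferredLocation = redirectNode;
        }
    }

    /** Picks the dispatching target for a plan generated on localNode. */
    String chooseTarget(String localNode) {
        if (nodes.contains(localNode)) {
            // The planner is itself a replica: stay local and let the
            // consensus layer redirect if it is not the leader.
            return localNode;
        }
        if (preferredLocation != null) {
            return preferredLocation; // learned from an earlier redirection
        }
        return nodes.get(0); // previous behaviour: first node in the set
    }
}
```

For example, with replicas a, b and c and a planner on some other node, the
target is a until a redirection to b is observed, after which it is b; a
planner running on c always keeps the instance on c.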