[ https://issues.apache.org/jira/browse/IGNITE-24995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Pereslegin reassigned IGNITE-24995:
-----------------------------------------

    Assignee: Pavel Pereslegin

> Sql. Rework correlates serialization and propagation to another node.
> ---------------------------------------------------------------------
>
>                 Key: IGNITE-24995
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24995
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>    Affects Versions: 3.0
>            Reporter: Andrey Mashenkov
>            Assignee: Pavel Pereslegin
>            Priority: Major
>              Labels: ignite-3, performance, tech-debt
>
> *Motivation.*
> Currently, the SharedState class stores correlates in the execution context
> and is used by the CorrelatedNestedLoopJoinNode (CNLJN) execution node.
> It seems that CorrelatedNestedLoopJoinNode was designed to batch correlate
> variables, transferring many rows at a time, but it was implemented
> incorrectly, and batching simply doesn't work.
> There are a few related issues:
> 1. The class implements the Serializable interface and can be transferred to
> another node.
> As a result, the messaging system serializes it with
> DefaultUserObjectMarshaller. Although the SharedState class contains
> BinaryTuple objects, they are not converted to byte[] during serialization,
> which is inefficient.
> Making it Externalizable might mitigate the issue.
> 2. We don't need to put a whole SQL row into a correlate variable, only the
> required row columns (a projection), to reduce network pressure.
> It is important that all nodes create the same projection for the same
> correlate.
> 3. We should fix the SharedState class to make batching possible by allowing
> multiple rows to be set for the same correlate id.
> Most likely, when there is more than one correlate, we must keep the
> correlate hierarchy order to preserve the CNLJN collation (correlate id
> numbers don't provide this guarantee).
> It may turn out that passing batches for parent correlates is useless,
> because we can spool only one child batch at a time to preserve the
> collation.
> Thus, SharedState may be split or restructured to separate the correlates
> received from the parent fragment from the current correlates to be passed
> to the child fragment.
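
The batching, ordering and projection points above could look roughly like the following minimal Java sketch. It is not Ignite's SharedState: the class name `BatchedCorrelates`, its methods, and the use of plain `Object[]` rows instead of the engine's row types and BinaryTuple are all hypothetical. It assumes correlates are registered outermost-first, so registration order stands in for the hierarchy order that correlate ids do not guarantee, and that each node derives the same `requiredColumns` for a given correlate from the plan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical batched correlate storage (not an Ignite API): several rows per
 * correlate id, iterated in registration (hierarchy) order rather than id order,
 * and holding only the projected columns of each row.
 */
final class BatchedCorrelates {
    // LinkedHashMap keeps insertion order, i.e. the order correlates were registered.
    private final Map<Integer, List<Object[]>> rowsByCorrelate = new LinkedHashMap<>();
    // Column indexes each correlate actually needs (its projection).
    private final Map<Integer, int[]> projections = new LinkedHashMap<>();

    /** Registers a correlate with the columns it requires; assumed to be called outermost-first. */
    void register(int correlateId, int[] requiredColumns) {
        if (rowsByCorrelate.containsKey(correlateId)) {
            throw new IllegalStateException("Correlate already registered: " + correlateId);
        }
        projections.put(correlateId, requiredColumns.clone());
        rowsByCorrelate.put(correlateId, new ArrayList<>());
    }

    /** Appends only the projected columns of a full row to the correlate's batch. */
    void add(int correlateId, Object[] fullRow) {
        int[] cols = projections.get(correlateId);
        if (cols == null) {
            throw new IllegalArgumentException("Unknown correlate: " + correlateId);
        }
        Object[] projected = new Object[cols.length];
        for (int i = 0; i < cols.length; i++) {
            projected[i] = fullRow[cols[i]];
        }
        rowsByCorrelate.get(correlateId).add(projected);
    }

    /** Read-only view of the current batch for a correlate (empty if unknown). */
    List<Object[]> batch(int correlateId) {
        List<Object[]> rows = rowsByCorrelate.get(correlateId);
        if (rows == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(rows);
    }

    /** Correlate ids in registration (hierarchy) order. */
    List<Integer> correlateOrder() {
        return new ArrayList<>(rowsByCorrelate.keySet());
    }

    /** Drops the current batch, e.g. after the child fragment has consumed it. */
    void clear(int correlateId) {
        List<Object[]> rows = rowsByCorrelate.get(correlateId);
        if (rows != null) {
            rows.clear();
        }
    }
}
```

For example, registering correlate 5 (outer, needs column 0) before correlate 2 (inner, needs columns 1 and 2) makes `correlateOrder()` return `[5, 2]` even though 2 < 5, and every stored row carries only its projected columns. Whether batches for parent correlates are worth keeping at all remains the open question raised in item 3.
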
> *Suggestion*
> Let's improve the SharedState class structure to support batching by
> allowing multiple rows for the same correlate, and resolve the ordering
> issue (if it exists).
> Let's resolve the serialization issue by adding a message class for this
> (or at least by using Externalizable).
> Let's avoid transferring whole rows.
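
For the serialization point, a dedicated message class is the preferred option; as the fallback, an Externalizable batch that writes each row as length-prefixed raw bytes could look like the minimal sketch below. It assumes each row is already available as a byte[] (for instance, the bytes backing a BinaryTuple). `CorrelateBatch` and `roundTrip` are hypothetical names, and `roundTrip` uses plain java.io object streams only as a stand-in for Ignite's messaging layer and DefaultUserObjectMarshaller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical batch of rows for one correlate id. Each row is assumed to be
 * pre-encoded as a byte[], so only raw bytes are written instead of row objects.
 */
final class CorrelateBatch implements Externalizable {
    private static final long serialVersionUID = 1L;

    private int correlateId;
    private List<byte[]> rows;

    /** Public no-arg constructor, required by Externalizable. */
    public CorrelateBatch() {
        this.rows = new ArrayList<>();
    }

    CorrelateBatch(int correlateId, List<byte[]> rows) {
        this.correlateId = correlateId;
        this.rows = new ArrayList<>(rows);
    }

    int correlateId() {
        return correlateId;
    }

    List<byte[]> rows() {
        return rows;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(correlateId);
        out.writeInt(rows.size());
        for (byte[] row : rows) {
            out.writeInt(row.length); // length prefix, so rows can be split on read
            out.write(row);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        correlateId = in.readInt();
        int count = in.readInt();
        rows = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] row = new byte[in.readInt()];
            in.readFully(row);
            rows.add(row);
        }
    }

    /** Serializes and deserializes via java.io streams (stand-in for the messaging layer). */
    static CorrelateBatch roundTrip(CorrelateBatch batch) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(batch);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (CorrelateBatch) in.readObject();
        }
    }
}
```

Writing an explicit row count and per-row length keeps the payload to the row bytes plus a few ints, instead of letting default serialization walk each row object's fields; a dedicated message class could use the same layout.
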



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
