HanumathRao commented on PR #3640:
URL: https://github.com/apache/calcite/pull/3640#issuecomment-1935259456

   > > The trimUnusedField ,in the prepareSql function call path, is called 
conditionally based on the configHolder's isTrimUnusedFields flag and due to 
this it results in **trimUnusedFields getting called before decorrelateQuery** 
is called.
   > 
   > Maybe I'm missing something, but isn't `decorrelateQuery` called before 
`trimUnusedFields` in `prepareSql` ? (so either we trim after decorrelate, or 
we don't trim at all, depending on the configHolder)
   > 
   > ```
   >     if (this.context.config().forceDecorrelate()) {
   >       // Sub-query decorrelation.
   >       root = root.withRel(decorrelate(sqlToRelConverter, sqlQuery, 
root.rel));
   >     }
   > 
   >     if (configHolder.get().isTrimUnusedFields()) {
   >       // Trim unused fields.
   >       root = trimUnusedFields(root);
   > 
   >       Hook.TRIMMED.run(root.rel);
   >     }
   > ```
   
   Yeah, your observation is right, and there's a subtle issue that requires 
attention in this case. 
   
   Before the decorrelate function is invoked, the plan (for the added test in 
the PR) appears as follows:
   ```
   LogicalAggregate(group=[{}], INVAMOUNT=[SUM($0)], INVCOUNT=[COUNT()])
     LogicalProject($f0=[*($2, $SCALAR_QUERY({
   LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
     LogicalProject($f0=[+($0, $1)])
       LogicalFilter(condition=[AND(IS NOT NULL($2), IS NOT NULL($3), =($4, 
$cor0.C5633_503), =($5, $SCALAR_QUERY({
   LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
     LogicalProject(C5633_2186=[$5])
       LogicalFilter(condition=[AND(=($4, $cor1.C5633_2187), =($2, 
$cor1.C5633_2184), =($3, $cor1.C5633_2185), <=($5, $cor0.C5633_485))])
         JdbcTableScan(table=[[BASEJDBC, T1704]])
   })))], variablesSet=[[$cor1]])
         JdbcTableScan(table=[[BASEJDBC, T1704]])
   }))])
       LogicalFilter(condition=[AND(<($2, 10), >($2, 0))])
         JdbcTableScan(table=[[BASEJDBC, INVOICE]])
    ```
    
   The decorrelate function before the trimFields operation is essentially a 
no-op. This is because decorrelation only occurs when a Correlate node is 
created, which is determined by a triggering check.
    
    ```
        if (!corelMap.hasCorrelation()) {
         return rootRel;
       }
   ```
   The hasCorrelation function returns true only if mapCorToCorRel is nonEmpty. 
However, in this scenario, mapCorToCorRel is empty because no Correlate node 
exists. Therefore, the decorrelate function essentially does nothing.
   
   However, after the trimUnusedFields operation is executed on the tree, it 
prunes certain fields that are essential for correlated variables. 
Consequently, a new node, LogicalProject(C5633_505=[$2]), is introduced.
   
   ```
   LogicalAggregate(group=[{}], INVAMOUNT=[SUM($0)], INVCOUNT=[COUNT()])
     LogicalProject($f0=[*($0, $SCALAR_QUERY({
   LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
     LogicalProject($f0=[+($0, $1)])
       LogicalFilter(condition=[AND(IS NOT NULL($2), IS NOT NULL($3), =($4, 
$cor0.C5633_503), =($5, $SCALAR_QUERY({
   LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
     LogicalProject(C5633_2186=[$5])
       LogicalFilter(condition=[AND(=($4, $cor1.C5633_2187), =($2, 
$cor1.C5633_2184), =($3, $cor1.C5633_2185), <=($5, $cor0.C5633_485))])
         JdbcTableScan(table=[[BASEJDBC, T1704]])
   })))], variablesSet=[[$cor1]])
         JdbcTableScan(table=[[BASEJDBC, T1704]])
   }))])
       LogicalFilter(condition=[SEARCH($0, Sarg[(0..10)])])
         LogicalProject(C5633_505=[$2]). --- a new node introduced by 
trimUnusedFields
           JdbcTableScan(table=[[BASEJDBC, INVOICE]])
   ```
   
   The subsequent optimization of this tree leads to an exception because the 
tree is no longer correct.
   
   This issue is specific to the prepareSql path and is not observed during the 
plannerImpl rel path. 
[link](https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/prepare/PlannerImpl.java#L265).
   
   Two potential solutions have been identified:
   
   1. Fix the RelTrimmer to avoid generating the problematic node.
   2. Ensure consistency between both code paths by refraining from calling the 
RelTrimmer in the prepareSql path.
   I chose the second option, as it seems logical to call the RelTrimmer only 
after the SubQueryRemovalRule has been applied.
   
   Please let me know if it is clear and thank you for reviewing the changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to