gruuya commented on issue #8942:
URL: https://github.com/apache/arrow-datafusion/issues/8942#issuecomment-1904175896

   It seems like when the entering plan's innermost projection:
   ```sql
   Projection: ?table?.id, t, CASE WHEN ?table?.id = Int32(1) THEN Int32(10) ELSE t END AS t2
     Projection: ?table?.id, CASE WHEN ?table?.id = Int32(1) THEN Int32(10) ELSE t END AS t
       Projection: ?table?.id, Int32(NULL) AS t
         TableScan: ?table?
   ```
   is being rewritten, this check:
   
https://github.com/apache/arrow-datafusion/blob/2b218be67a6c412629530b812836a6cec76efc32/datafusion/optimizer/src/optimize_projections.rs#L867-L871
   concludes that the projection's schema and its input's schema (the bottom-most projection) are identical, so it simply discards the projection (`proj` and its `exprs_used`), even though the projection performs non-trivial computation (the `CASE` expression) on top of its input.
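To make the failure mode concrete, here is a minimal Rust model of a schema-only check. The types (`Field`, `DataType`, `Expr`) and functions (`output_schema`, `discards_projection`) are hypothetical simplifications, not DataFusion's API; real schemas also carry qualifiers and nullability, which this sketch ignores. It shows that the middle projection's output schema (`id: Int32`, `t: Int32`) is indistinguishable from its input's, so a comparison on schemas alone cannot detect the `CASE`.

```rust
// Hypothetical, heavily reduced model; NOT DataFusion's real types or API.

#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int32,
}

#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

enum Expr {
    /// Reference to an input column by name.
    Column(String),
    /// Any computed expression (e.g. a CASE), reduced to its output type.
    Computed(DataType),
    /// `expr AS name`
    Alias(Box<Expr>, String),
}

/// Output field of one projection expression, resolved against the input schema.
fn output_field(expr: &Expr, input: &[Field]) -> Field {
    match expr {
        Expr::Column(name) => input
            .iter()
            .find(|f| &f.name == name)
            .cloned()
            .expect("column not found in input schema"),
        Expr::Computed(dt) => Field { name: "expr".to_string(), data_type: dt.clone() },
        Expr::Alias(inner, name) => Field {
            name: name.clone(),
            data_type: output_field(inner, input).data_type,
        },
    }
}

/// Schema produced by projecting `exprs` over `input`.
fn output_schema(input: &[Field], exprs: &[Expr]) -> Vec<Field> {
    exprs.iter().map(|e| output_field(e, input)).collect()
}

/// Schema-only decision: drop the projection whenever its output schema equals its input's.
fn discards_projection(input: &[Field], exprs: &[Expr]) -> bool {
    output_schema(input, exprs).as_slice() == input
}

fn main() {
    // Schema of the bottom-most projection: `?table?.id, Int32(NULL) AS t`
    let input = vec![
        Field { name: "id".into(), data_type: DataType::Int32 },
        Field { name: "t".into(), data_type: DataType::Int32 },
    ];
    // Middle projection: `?table?.id, CASE WHEN ... THEN Int32(10) ELSE t END AS t`
    let exprs = vec![
        Expr::Column("id".into()),
        Expr::Alias(Box::new(Expr::Computed(DataType::Int32)), "t".into()),
    ];
    // The CASE is real computation, yet the schemas match, so it would be dropped.
    assert!(discards_projection(&input, &exprs));
    println!("projection discarded: {}", discards_projection(&input, &exprs));
}
```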
   
   Trying out a naive solution like
   ```diff
   @@ -867,7 +867,7 @@ fn rewrite_projection_given_requirements(
        return if let Some(input) =
            optimize_projections(&proj.input, config, &required_indices)?
        {
   -        if &projection_schema(&input, &exprs_used)? == input.schema() {
   +        if &projection_schema(&input, &exprs_used)? == input.schema() && exprs_used.iter().all(is_expr_trivial) {
                Ok(Some(input))
            } else {
                Projection::try_new(exprs_used, Arc::new(input))
   ```
   does solve this particular problem, but it then fails to eliminate unneeded projections in some other test cases (notably `test_infinite_source_partition_by`, which ends up with a number of interleaved projections).

