gruuya commented on issue #8942:
URL:
https://github.com/apache/arrow-datafusion/issues/8942#issuecomment-1904175896
It seems that when the innermost projection of the incoming plan:
```sql
Projection: ?table?.id, t, CASE WHEN ?table?.id = Int32(1) THEN Int32(10) ELSE t END AS t2
  Projection: ?table?.id, CASE WHEN ?table?.id = Int32(1) THEN Int32(10) ELSE t END AS t
    Projection: ?table?.id, Int32(NULL) AS t
      TableScan: ?table?
```
is being rewritten, this evaluation:
https://github.com/apache/arrow-datafusion/blob/2b218be67a6c412629530b812836a6cec76efc32/datafusion/optimizer/src/optimize_projections.rs#L867-L871
concludes that its schema and its input's schema (that of the bottom-most
projection) are identical, and so it just discards the projection (`proj` and
its `exprs_used`) even though it performs a non-trivial computation on top.
Trying out a naive solution like
```diff
@@ -867,7 +867,7 @@ fn rewrite_projection_given_requirements(
     return if let Some(input) =
         optimize_projections(&proj.input, config, &required_indices)?
     {
-        if &projection_schema(&input, &exprs_used)? == input.schema() {
+        if &projection_schema(&input, &exprs_used)? == input.schema() && exprs_used.iter().all(is_expr_trivial) {
             Ok(Some(input))
         } else {
             Projection::try_new(exprs_used, Arc::new(input))
```
does solve this particular problem, but it then fails to eliminate unneeded
projections in some other test cases (notably in
`test_infinite_source_partition_by`, which ends up with a bunch of interleaved
projections).
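
To make the failure mode concrete, here is a minimal toy model of the two checks. It is a sketch, not DataFusion code: the `Expr` enum, the name-only `projection_schema`, and the `can_drop_*` helpers are hypothetical stand-ins, and I am assuming `is_expr_trivial` accepts only bare columns and literals.

```rust
// Toy model only: these types are hypothetical stand-ins, not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(Option<i32>),
    // Simplified CASE: WHEN <when> THEN <then> ELSE <otherwise> END
    Case { when: Box<Expr>, then: Box<Expr>, otherwise: Box<Expr> },
    Alias(Box<Expr>, String),
}

/// Output column name an expression contributes to a projection's schema.
fn output_name(expr: &Expr) -> String {
    match expr {
        Expr::Column(name) => name.clone(),
        Expr::Alias(_, name) => name.clone(),
        Expr::Literal(value) => format!("{:?}", value),
        Expr::Case { .. } => "CASE".to_string(),
    }
}

/// Schema reduced to column names (every column is Int32 in this toy).
fn projection_schema(exprs: &[Expr]) -> Vec<String> {
    exprs.iter().map(output_name).collect()
}

/// Assumption: only bare columns and literals count as trivial.
fn is_expr_trivial(expr: &Expr) -> bool {
    matches!(expr, Expr::Column(_) | Expr::Literal(_))
}

/// Schema-only check: drop the projection if its schema matches its input's.
fn can_drop_schema_only(exprs: &[Expr], input_schema: &[String]) -> bool {
    projection_schema(exprs).as_slice() == input_schema
}

/// Guarded check from the diff: additionally require trivial expressions.
fn can_drop_guarded(exprs: &[Expr], input_schema: &[String]) -> bool {
    can_drop_schema_only(exprs, input_schema) && exprs.iter().all(is_expr_trivial)
}

/// The middle projection from the plan: `id, CASE ... ELSE t END AS t`.
fn middle_projection() -> Vec<Expr> {
    let case = Expr::Case {
        when: Box::new(Expr::Column("id".to_string())),
        then: Box::new(Expr::Literal(Some(10))),
        otherwise: Box::new(Expr::Column("t".to_string())),
    };
    vec![
        Expr::Column("id".to_string()),
        Expr::Alias(Box::new(case), "t".to_string()),
    ]
}
```

With an input schema of `["id", "t"]`, `can_drop_schema_only(&middle_projection(), ..)` returns `true` because the alias `t` shadows the input column of the same name, while `can_drop_guarded` returns `false`. The toy does not reproduce the regression in `test_infinite_source_partition_by`; it only shows why schema equality alone is not enough.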