gatesn commented on issue #2581:
URL: https://github.com/apache/datafusion/issues/2581#issuecomment-2574525606
Apologies, I should have checked the example value. 10_000 shows what I mean:
```
explain select x = cast(10000 AS int) from '/tmp/foo.parquet';
+---------------+---------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                              |
+---------------+---------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: CAST(/tmp/foo.parquet.x AS Int32) = Int32(10000) AS /tmp/foo.parquet.x = Int64(10000) |
|               |   TableScan: /tmp/foo.parquet projection=[x]                                                      |
| physical_plan | ProjectionExec: expr=[CAST(x@0 AS Int32) = 10000 as /tmp/foo.parquet.x = Int64(10000)]            |
|               |   ParquetExec: file_groups={1 group: [[tmp/foo.parquet]]}, projection=[x]                         |
|               |                                                                                                   |
+---------------+---------------------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.004 seconds.
```
As a side note, perhaps we're missing a rule somewhere to know that x can
never `= 10000` when it started out as a u8? Perhaps my change in #13736,
which preserves min/max stats through cast expressions, would cover this?
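To make the kind of rule I mean concrete, here is a rough sketch in plain Rust. It is not DataFusion's optimizer API, and `eq_can_match_u8` is a made-up name for illustration. The idea is only that a literal outside the range of the column's original type can never compare equal to the widened column:

```rust
/// Hypothetical helper (not a DataFusion function): for a predicate of the
/// form `CAST(col AS Int32) = literal`, where `col` started out as a u8,
/// report whether any row could possibly satisfy it.
///
/// Widening a u8 to Int32 only ever yields values in 0..=255, so the
/// predicate can match only if the literal itself fits in a u8.
fn eq_can_match_u8(literal: i32) -> bool {
    u8::try_from(literal).is_ok()
}
```

With this, `eq_can_match_u8(10_000)` is `false`, so under these assumptions the comparison could be folded to a constant without reading any data.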
But the physical plan shows DataFusion casting `x` to `Int32`, even though x
is stored as an Int32 inside Parquet, read back into an Int32 Arrow array, and
then down-cast to an Int8 Arrow array, all before being returned to DataFusion
to be cast back up to Int32.
Admittedly, this could be solved by providing a "target type" in the
projection mask, short of full generic projection expression push-down. But it
remains interesting that many file formats have the ability to optimize some
subset of projection expressions. Even the Parquet reader could push down
projection expressions over dictionary values prior to a full dictionary decode.
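The round trip described above, and what a "target type" would save, can be sketched in plain Rust. This uses no Arrow or Parquet APIs, both function names are hypothetical, and it assumes every stored value fits in an Int8, as the column's logical type implies:

```rust
/// Path as described above: physical Int32 values are narrowed to Int8 by
/// the reader, then widened back to Int32 by the CAST in the projection.
/// Hypothetical function, for illustration only.
fn read_then_cast(physical: &[i32]) -> Vec<i32> {
    // Reader step: down-cast to the column's logical type (Int8).
    let narrowed: Vec<i8> = physical
        .iter()
        .map(|&v| i8::try_from(v).expect("value assumed to fit in Int8"))
        .collect();
    // Projection step: cast back up to Int32.
    narrowed.iter().map(|&v| i32::from(v)).collect()
}

/// Hypothetical "target type" path: the reader is told the projection wants
/// Int32, so it hands back the physical values without either conversion.
fn read_with_target_type(physical: &[i32]) -> Vec<i32> {
    physical.to_vec()
}
```

For in-range input the two functions return the same values. The second simply skips the narrowing and widening passes, which is the saving a target type in the projection mask would buy.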