liukun4515 opened a new issue, #7254:
URL: https://github.com/apache/arrow-datafusion/issues/7254
### Is your feature request related to a problem or challenge?
Currently, when I run `EXPLAIN`, I get a physical plan like this:
```
❯ explain select test1.id,test2.int_col from test1 join test2 on test1.id = test2.bigint_col;
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: test1.id, test2.int_col |
|               |   Inner Join: CAST(test1.id AS Int64) = test2.bigint_col |
|               |     TableScan: test1 projection=[id] |
|               |     TableScan: test2 projection=[int_col, bigint_col] |
| physical_plan | ProjectionExec: expr=[id@0 as id, int_col@1 as int_col] |
|               |   ProjectionExec: expr=[id@0 as id, int_col@2 as int_col, bigint_col@3 as bigint_col] |
|               |     CoalesceBatchesExec: target_batch_size=8192 |
|               |       HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(CAST(test1.id AS Int64)@1, bigint_col@1)] |
|               |         ProjectionExec: expr=[id@0 as id, CAST(id@0 AS Int64) as CAST(test1.id AS Int64)] |
|               |           ParquetExec: file_groups={1 group: [[Users/kliu3/Documents/ebay/arrow-ballista/target/debug/alltypes_plain.parquet]]}, projection=[id] |
|               |         ParquetExec: file_groups={1 group: [[Users/kliu3/Documents/ebay/arrow-ballista/target/debug/alltypes_plain.parquet]]}, projection=[int_col, bigint_col] |
|               | |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.136 seconds.
```
However, the physical plan output does not include any `Statistics` information.
Can we add the `Statistics` to the physical plan format?
The `Statistics` struct is:
```
pub struct Statistics {
    /// The number of table rows
    pub num_rows: Option<usize>,
    /// total bytes of the table rows
    pub total_byte_size: Option<usize>,
    /// Statistics on a column level
    pub column_statistics: Option<Vec<ColumnStatistics>>,
    /// If true, any field that is `Some(..)` is the actual value in the data provided by the operator (it is not
    /// an estimate). Any or all other fields might still be None, in which case no information is known.
    /// if false, any field that is `Some(..)` may contain an inexact estimate and may not be the actual value.
    pub is_exact: bool,
}
```
We could print just `num_rows`, `total_byte_size`, and `is_exact`, and omit `column_statistics`.
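
To make the idea concrete, here is a rough, self-contained sketch of what such a one-line summary might look like. It does not use DataFusion itself: the `Statistics` struct below is a local stand-in that mirrors only the three fields mentioned above, and the names `StatsSummary`, `fmt_opt`, and `with_statistics`, as well as the `statistics=[...]` text layout, are hypothetical choices for illustration, not an existing or agreed-upon format.

```rust
use std::fmt;

/// Local stand-in with the three fields to print.
/// ASSUMPTION: this is not the DataFusion type; `column_statistics` is left out on purpose.
#[derive(Debug, Clone, Default)]
pub struct Statistics {
    pub num_rows: Option<usize>,
    pub total_byte_size: Option<usize>,
    pub is_exact: bool,
}

/// Hypothetical helper: render an optional count, using "None" when it is unknown.
fn fmt_opt(value: Option<usize>) -> String {
    value.map_or_else(|| "None".to_string(), |n| n.to_string())
}

/// Hypothetical wrapper that renders a compact, single-line summary.
pub struct StatsSummary<'a>(pub &'a Statistics);

impl fmt::Display for StatsSummary<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let stats = self.0;
        write!(
            f,
            "statistics=[rows={}, bytes={}, exact={}]",
            fmt_opt(stats.num_rows),
            fmt_opt(stats.total_byte_size),
            stats.is_exact
        )
    }
}

/// Hypothetical helper: append the summary to an operator's existing display line.
pub fn with_statistics(plan_line: &str, stats: &Statistics) -> String {
    format!("{}, {}", plan_line, StatsSummary(stats))
}
```

With this sketch, calling `with_statistics` on a line such as `CoalesceBatchesExec: target_batch_size=8192` would add a trailing `statistics=[...]` segment to it, so operators with unknown statistics still print explicitly as `None` rather than silently dropping the field.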
### Describe the solution you'd like
Append the `Statistics` to each operator's output in its `DisplayAs` implementation.
cc @alamb @jackwener
### Describe alternatives you've considered
_No response_
### Additional context
This will change the expected plan output in many test cases.