phillipleblanc commented on PR #11989:
URL: https://github.com/apache/datafusion/pull/11989#issuecomment-2360601240
@jayzhan211 I'm in the process of upgrading spiceai to use DataFusion 42 and
I'm running into the schema mismatch error from this PR:
`Internal("Physical input schema should be the same as the one converted
from logical input schema.")`
I have a custom TableProvider and I get the error when running a `SELECT
COUNT(1) FROM my_table`. This is what the explain plan looked like on DF 41:
```rust
let expected_plan = [
    "+---------------+--------------------------------------------------------------------------------+",
    "| plan_type     | plan                                                                           |",
    "+---------------+--------------------------------------------------------------------------------+",
    "| logical_plan  | Aggregate: groupBy=[[]], aggr=[[count(Int64(1))]]                              |",
    "|               |   BytesProcessedNode                                                           |",
    "|               |     TableScan: non_federated_abc projection=[]                                 |",
    "| physical_plan | AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]                      |",
    "|               |   CoalescePartitionsExec                                                       |",
    "|               |     AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]                |",
    "|               |       BytesProcessedExec                                                       |",
    "|               |         SchemaCastScanExec                                                     |",
    "|               |           RepartitionExec: partitioning=RoundRobinBatch(3), input_partitions=1 |",
    "|               |             SqlExec sql=SELECT \"id\", \"created_at\" FROM non_federated_abc       |",
    "|               |                                                                                |",
    "+---------------+--------------------------------------------------------------------------------+",
];
```
(`BytesProcessedNode` and `BytesProcessedExec` are custom operators that we
inject to track the number of bytes processed. I don't think they're relevant
to this bug, but I initially had a similar schema check for them that I ended
up removing for the reason below.)
My assumption is that, logically, no columns are required to count the number
of rows, so the logical `TableScan` has `projection=[]` and an empty schema.
The TableProvider, however, still returns all of the columns (`id` and
`created_at` in the plan above) because it needs the rows to perform the count
aggregation. Those columns are then thrown away, since they get erased in the
aggregation. Thus the check that the physical input schema and the logical
input schema are equal is not strictly needed for this plan.
Does that sound right?
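
To make the suspected mismatch concrete, here is a minimal std-only sketch. It does not use DataFusion's actual types or its real check. `Schema`, `logical_scan_schema`, `physical_scan_schema`, `check_schemas_match`, and `count_output_schema` are all hypothetical names made up for illustration, and schemas are modeled as plain lists of column names.

```rust
/// Hypothetical stand-in for a schema: an ordered list of column names.
type Schema = Vec<String>;

/// Logical scan: exposes exactly the projected columns, so `projection=[]`
/// yields an empty schema.
fn logical_scan_schema(table: &[&str], projection: &[usize]) -> Schema {
    projection.iter().map(|&i| table[i].to_string()).collect()
}

/// Assumed provider behavior: with an empty projection it still returns every
/// column, because it needs to produce rows for the count.
fn physical_scan_schema(table: &[&str], projection: &[usize]) -> Schema {
    if projection.is_empty() {
        table.iter().map(|c| c.to_string()).collect()
    } else {
        projection.iter().map(|&i| table[i].to_string()).collect()
    }
}

/// A strict equality check, similar in spirit to the one that raises the error.
fn check_schemas_match(logical: &Schema, physical: &Schema) -> Result<(), String> {
    if logical == physical {
        Ok(())
    } else {
        Err(format!(
            "schema mismatch: logical={:?}, physical={:?}",
            logical, physical
        ))
    }
}

/// The output of `COUNT(1)` without GROUP BY is a single column, whatever the
/// input columns were.
fn count_output_schema(_input: &Schema) -> Schema {
    vec!["count(Int64(1))".to_string()]
}
```

Under these assumptions, calling both scan functions with the table `["id", "created_at"]` and an empty projection gives an empty logical schema and a two-column physical schema, so `check_schemas_match` returns an error even though `count_output_schema` is identical for both inputs.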
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]