alamb commented on issue #9011:
URL:
https://github.com/apache/arrow-datafusion/issues/9011#issuecomment-1912686214
The original plan you show has a `TableScan` at the top -- is this a
projection? Or is it a view definition somehow?
My reading of the plan:
```
TableScan: ?table? projection=[project_id, user_id, created_at, event_id, event, str_0] <---- this says it needs all columns
PartitionedAggregate: ...
Filter: project_id = Int64(1) AND created_at >= TimestampNanosecond(1705419428144118000, None) AND created_at <= TimestampNanosecond(1706283428144118000, None) AND event = UInt16(13)
Sort: project_id ASC NULLS LAST, user_id ASC NULLS LAST
Repartition: Hash(project_id, user_id) partition_count=12
Projection: project_id, user_id, created_at, event
TableScan: ?table? projection=[project_id, user_id, created_at, event_id, event, str_0]]
```
As for the repartitioning, I think that is happening to try to run the
filter in parallel. It is somewhat messy but not obviously wrong to me.
The optimizer is going to try to satisfy the requirements stated by the
topmost `ExecutionPlan` in your tree. Thus I think figuring out what the
topmost plan node is, and what it is requesting of its input, is the best way
to understand what DataFusion is doing in this case.
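To make that idea concrete, here is a toy sketch in Python of a node stating a requirement on its input and a pass inserting a repartition to satisfy it. This is not DataFusion's API or its actual optimizer: `Node`, `enforce`, `render`, and the `TopNode` requirement are all hypothetical names invented for illustration, and the real rules handle many more cases (ordering, partition counts, and so on).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Toy plan node (hypothetical; not a DataFusion type)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    # Hash keys this node requires of its input (None = no requirement).
    required_input_hash: Optional[Tuple[str, ...]] = None
    # Hash keys this node's output is partitioned on (None = unknown).
    output_hash: Optional[Tuple[str, ...]] = None


def enforce(node: Node) -> Node:
    """Insert a Repartition under any node whose input does not
    already satisfy the hash partitioning that node asks for."""
    fixed = []
    for child in node.children:
        child = enforce(child)
        req = node.required_input_hash
        if req is not None and child.output_hash != req:
            label = "Repartition: Hash(" + ", ".join(req) + ")"
            child = Node(label, [child], output_hash=req)
        fixed.append(child)
    node.children = fixed
    return node


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the plan as indented lines, one per node."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Assumption for illustration only: the top node wants its input
# hash-partitioned on (project_id, user_id).
scan = Node("TableScan")
top = Node("TopNode", [scan], required_input_hash=("project_id", "user_id"))
plan = enforce(top)
lines = render(plan)
print("\n".join(lines))
```

In this sketch the repartition appears only because the top node asked for it, which is why inspecting the topmost node's stated requirements is a good starting point.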