alamb commented on issue #15343: URL: https://github.com/apache/datafusion/issues/15343#issuecomment-2741964039
For example, here is the plan from a recent query from https://github.com/apache/datafusion/issues/15177 (I actually had to trim it to fit in the 65k limit): ```sql > explain SELECT * from 'hits_partitioned' WHERE "WatchID" IN (SELECT "WatchID" FROM 'hits_partitioned' WHERE "URL" LIKE '%google%' ORDER BY "EventTime" LIMIT 10); +---------------+--------- | plan_type | plan | +---------------+---------- | logical_plan | LeftSemi Join: hits_partitioned.WatchID = __correlated_sq_1.WatchID | | | TableScan: ... | | | SubqueryAlias: __correlated_sq_1 | | | Projection: hits_partitioned.WatchID | | | Sort: hits_partitioned.EventTime ASC NULLS LAST, fetch=10 | | | Projection: hits_partitioned.WatchID, hits_partitioned.EventTime | | | Filter: CAST(hits_partitioned.URL AS Utf8View) LIKE Utf8View("%google%") | | | TableScan: hits_partitioned projection=[WatchID, EventTime, URL], partial_filters=[CAST(hits_partitioned.URL AS Utf8View) LIKE Utf8View("%google%")] | | physical_plan | CoalesceBatchesExec: target_batch_size=8192 | | | HashJoinExec: mode=Partitioned, join_type=RightSemi, on=[(WatchID@0, WatchID@0)] | | | CoalesceBatchesExec: target_batch_size=8192 | | | RepartitionExec: partitioning=Hash([WatchID@0], 16), input_partitions=16 | | | RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1 | | | ProjectionExec: expr=[WatchID@0 as WatchID] | | | SortPreservingMergeExec: [EventTime@1 ASC NULLS LAST], fetch=10 | | | SortExec: TopK(fetch=10), expr=[EventTime@1 ASC NULLS LAST], preserve_partitioning=[true] | | | CoalesceBatchesExec: target_batch_size=8192 | | | FilterExec: CAST(URL@2 AS Utf8View) LIKE %google%, projection=[WatchID@0, EventTime@1] | | | DataSourceExec: file_groups={16 groups: ... | | | CoalesceBatchesExec: target_batch_size=8192 | | | RepartitionExec: partitioning=Hash([WatchID@0], 16), input_partitions=16 | | | DataSourceExec: file_groups={16 groups: ...file_type=parquet | | | | +---------------+-- 2 row(s) fetched. Elapsed 0.072 seconds. ``` ```sql EXPLAIN FORMAT tree ... ``` I think the `tree` explain format for the same query is much easier to understand: ```sql > explain format tree SELECT * from 'hits_partitioned' WHERE "WatchID" IN (SELECT "WatchID" FROM 'hits_partitioned' WHERE "URL" LIKE '%google%' ORDER BY "EventTime" LIMIT 10); +---------------+------------------------------------------------------------+ | plan_type | plan | +---------------+------------------------------------------------------------+ | physical_plan | ┌───────────────────────────┐ | | | │ CoalesceBatchesExec │ | | | │ -------------------- │ | | | │ target_batch_size: │ | | | │ 8192 │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ HashJoinExec │ | | | │ -------------------- │ | | | │ join_type: RightSemi │ | | | │ ├──────────────┐ | | | │ on: │ │ | | | │ (WatchID = WatchID) │ │ | | | └─────────────┬─────────────┘ │ | | | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ | | | │ CoalesceBatchesExec ││ CoalesceBatchesExec │ | | | │ -------------------- ││ -------------------- │ | | | │ target_batch_size: ││ target_batch_size: │ | | | │ 8192 ││ 8192 │ | | | └─────────────┬─────────────┘└─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ | | | │ RepartitionExec ││ RepartitionExec │ | | | │ -------------------- ││ -------------------- │ | | | │ output_partition_count: ││ output_partition_count: │ | | | │ 16 ││ 16 │ | | | │ ││ │ | | | │ partitioning_scheme: ││ partitioning_scheme: │ | | | │ Hash([WatchID@0], 16) ││ Hash([WatchID@0], 16) │ | | | └─────────────┬─────────────┘└─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ | | | │ RepartitionExec ││ DataSourceExec │ | | | │ -------------------- ││ -------------------- │ | | | │ output_partition_count: ││ files: 115 │ | | | │ 1 ││ format: parquet │ | | | │ ││ │ | | | │ partitioning_scheme: ││ │ | | | │ RoundRobinBatch(16) ││ │ | | | └─────────────┬─────────────┘└───────────────────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ ProjectionExec │ | | | │ -------------------- │ | | | │ WatchID: WatchID │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ SortPreservingMergeExec │ | | | │ -------------------- │ | | | │ EventTime ASC NULLS │ | | | │ LASTlimit: │ | | | │ 10 │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ SortExec │ | | | │ -------------------- │ | | | │ EventTime@1 ASC NULLS LAST│ | | | │ │ | | | │ limit: 10 │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ CoalesceBatchesExec │ | | | │ -------------------- │ | | | │ target_batch_size: │ | | | │ 8192 │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ FilterExec │ | | | │ -------------------- │ | | | │ predicate: │ | | | │ CAST(URL AS Utf8View) LIKE│ | | | │ %google% │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ DataSourceExec │ | | | │ -------------------- │ | | | │ files: 115 │ | | | │ format: parquet │ | | | │ │ | | | │ predicate: │ | | | │ CAST(URL AS Utf8View) LIKE│ | | | │ %google% │ | | | └───────────────────────────┘ | | | | +---------------+------------------------------------------------------------+ 1 row(s) fetched. Elapsed 0.083 seconds. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org