alamb commented on issue #3249:
URL:
https://github.com/apache/arrow-datafusion/issues/3249#issuecomment-1236089257
I have a suggestion on how to implement this functionality, which is to
"inline" the view definition so it is visible to the existing query optimizers
rather than trying to do extra pushdown.
This might also be described as "replace view references with subqueries" --
in general I would expect the plan for a query that references to to be the
same as the equivalent query if it had the view definitions inlined as
subqueries.
For example
```sql
create table t as select * from (values (1), (2), (3)) as sq;
-- Create a view with a single predicate
create view v as select * from t where column1 != 1;
-- Run a query with a second predicate
explain select * from v where column1 != 2;
```
The actual plan is (note there are two Filter nodes 👎 ):
```sql
--
+---------------+---------------------------------------------------------------------+
-- | plan_type | plan
|
--
+---------------+---------------------------------------------------------------------+
-- | logical_plan | Projection: #v.column1
|
-- | | Filter: #v.column1 != Int64(2)
|
-- | | TableScan: v projection=[column1]
|
-- | physical_plan | ProjectionExec: expr=[column1@0 as column1]
|
-- | | CoalesceBatchesExec: target_batch_size=4096
|
-- | | FilterExec: column1@0 != 2
|
-- | | ProjectionExec: expr=[column1@0 as column1]
|
-- | | CoalesceBatchesExec: target_batch_size=4096
|
-- | | CoalesceBatchesExec: target_batch_size=4096
|
-- | | FilterExec: column1@0 != 1
|
-- | | CoalesceBatchesExec:
target_batch_size=4096 |
-- | | RepartitionExec:
partitioning=RoundRobinBatch(16) |
-- | | RepartitionExec:
partitioning=RoundRobinBatch(16) |
-- | | MemoryExec: partitions=1,
partition_sizes=[1] |
-- | |
|
--
+---------------+---------------------------------------------------------------------+
-- 2 rows in set. Query took 0.005 seconds.
```
I expect a plan with a single filter with a predicate of `column1 != 1 AND
column1 != 2`, which is exactly what happens when the view's definition is
inlined as a subquery:
```sql
-- ❯ explain select * from (select * from t where column1 != 1) as sq where
column1 != 2;
--
+---------------+---------------------------------------------------------------+
-- | plan_type | plan
|
--
+---------------+---------------------------------------------------------------+
-- | logical_plan | Projection: #sq.column1
|
-- | | Projection: #t.column1, alias=sq
|
-- | | Filter: #t.column1 != Int64(2) AND #t.column1 !=
Int64(1) |
-- | | TableScan: t projection=[column1]
|
-- | physical_plan | ProjectionExec: expr=[column1@0 as column1]
|
-- | | ProjectionExec: expr=[column1@0 as column1]
|
-- | | CoalesceBatchesExec: target_batch_size=4096
|
-- | | FilterExec: column1@0 != 2 AND column1@0 != 1
|
-- | | RepartitionExec:
partitioning=RoundRobinBatch(16) |
-- | | MemoryExec: partitions=1, partition_sizes=[1]
|
-- | |
|
--
+---------------+---------------------------------------------------------------+
-- 2 rows in set. Query took 0.006 seconds.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]