[
https://issues.apache.org/jira/browse/IMPALA-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218538#comment-17218538
]
Tim Armstrong commented on IMPALA-10252:
----------------------------------------
Yeah the runtime filter in this plan shouldn't be there.
{noformat}
[localhost.EXAMPLE.COM:21050] default> explain select id, int_col, year, month
> from functional_parquet.alltypessmall s
> where s.int_col = (select count(*) from
functional_parquet.alltypestiny t where s.
id = t.id)
> order by id;
Query: explain select id, int_col, year, month
from functional_parquet.alltypessmall s
where s.int_col = (select count(*) from functional_parquet.alltypestiny t
where s.id = t.id)
order by id
+------------------------------------------------------------------------------------+
| Explain String
|
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=71.96MB Threads=6
|
| Per-Host Resource Estimates: Memory=308MB
|
| WARNING: The following tables are missing relevant table and/or column
statistics. |
| functional_parquet.alltypessmall, functional_parquet.alltypestiny
|
|
|
| PLAN-ROOT SINK
|
| |
|
| 08:MERGING-EXCHANGE [UNPARTITIONED]
|
| | order by: id ASC
|
| |
|
| 04:SORT
|
| | order by: id ASC
|
| | row-size=16B cardinality=940
|
| |
|
| 03:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]
|
| | hash predicates: s.id = t.id
|
| | other predicates: s.int_col = zeroifnull(count(*))
|
| | runtime filters: RF000 <- zeroifnull(count(*))
|
| | row-size=28B cardinality=940
|
| |
|
| |--06:AGGREGATE [FINALIZE]
|
| | | output: count:merge(*)
|
| | | group by: t.id
|
| | | row-size=12B cardinality=758
|
| | |
|
| | 05:EXCHANGE [HASH(t.id)]
|
| | |
|
| | 02:AGGREGATE [STREAMING]
|
| | | output: count(*)
|
| | | group by: t.id
|
| | | row-size=12B cardinality=758
|
| | |
|
| | 01:SCAN HDFS [functional_parquet.alltypestiny t]
|
| | HDFS partitions=4/4 files=4 size=11.92KB
|
| | row-size=4B cardinality=758
|
| |
|
| 07:EXCHANGE [HASH(s.id)]
|
| |
|
| 00:SCAN HDFS [functional_parquet.alltypessmall s]
|
| HDFS partitions=4/4 files=4 size=14.78KB
|
| runtime filters: RF000 -> s.int_col
|
| row-size=16B cardinality=940
|
+------------------------------------------------------------------------------------+
{noformat}
The subquery gets rewritten to this:
{noformat}
SELECT id, int_col, `year`, `month`
FROM functional_parquet.alltypessmall s
LEFT OUTER JOIN (SELECT count(*) `$c$1`, t.id `$c$2` FROM
functional_parquet.alltypestiny t GROUP BY t.id) `$a$1` ON s.id = `$a$1`.`$c$2`
WHERE TRUE AND s.int_col = zeroifnull(`$a$1`.`$c$1`)
ORDER BY id ASC;
{noformat}
which has the same correctness issue
> Query returns less number of rows with run-time filtering on integer column
> in a subquery against functional_parquet schema
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-10252
> URL: https://issues.apache.org/jira/browse/IMPALA-10252
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Qifan Chen
> Assignee: Tim Armstrong
> Priority: Blocker
> Labels: correctness
>
> During the work to address IMPALA-6628 (Use unqualified table references in
> .test files run from test_queries.py), it is found that a query against the
> functional_parquet database returns 1 row while the same query returns 12
> rows when run-time filtering is turned off, or against the functional
> database.
>
>
> {code:java}
> Query: --SET RUNTIME_FILTER_MODE=OFF;
> select id, int_col, year, month
> from functional_parquet.alltypessmall s
> where s.int_col = (select count(*) from functional_parquet.alltypestiny t
> where s.id = t.id)
> order by id
> Query submitted at: 2020-10-18 12:41:15 (Coordinator:
> http://qifan-10229:25000)
> Query progress can be monitored at:
> http://qifan-10229:25000/query_plan?query_id=394a61d8f0002336:fd45e07300000000
> +----+---------+------+-------+
> | id | int_col | year | month |
> +----+---------+------+-------+
> | 1 | 1 | 2009 | 1 |
> +----+---------+------+-------+
> {code}
>
>
> {code:java}
> RUNTIME_FILTER_MODE set to OFF
> Query: select id, int_col, year, month
> from functional_parquet.alltypessmall s
> where s.int_col = (select count(*) from functional_parquet.alltypestiny t
> where s.id = t.id)
> order by id
> Query submitted at: 2020-10-18 12:40:58 (Coordinator:
> http://qifan-10229:25000)
> Query progress can be monitored at:
> http://qifan-10229:25000/query_plan?query_id=304c095f478607fc:7d2d03ff00000000
> +----+---------+------+-------+
> | id | int_col | year | month |
> +----+---------+------+-------+
> | 1 | 1 | 2009 | 1 |
> | 10 | 0 | 2009 | 1 |
> | 20 | 0 | 2009 | 1 |
> | 25 | 0 | 2009 | 2 |
> | 35 | 0 | 2009 | 2 |
> | 45 | 0 | 2009 | 2 |
> | 50 | 0 | 2009 | 3 |
> | 60 | 0 | 2009 | 3 |
> | 70 | 0 | 2009 | 3 |
> | 75 | 0 | 2009 | 4 |
> | 85 | 0 | 2009 | 4 |
> | 95 | 0 | 2009 | 4 |
> +----+---------+------+-------+{code}
>
> Query against functional database.
> {code:java}
> Query: select id, int_col, year, month
> from functional.alltypessmall s
> where s.int_col = (select count(*) from functional.alltypestiny t where s.id
> = t.id)
> order by id
> Query submitted at: 2020-10-18 12:35:24 (Coordinator:
> http://qifan-10229:25000)
> Query progress can be monitored at:
> http://qifan-10229:25000/query_plan?query_id=104bd5d7a6d5fe74:09a6c09000000000
> +----+---------+------+-------+
> | id | int_col | year | month |
> +----+---------+------+-------+
> | 1 | 1 | 2009 | 1 |
> | 10 | 0 | 2009 | 1 |
> | 20 | 0 | 2009 | 1 |
> | 25 | 0 | 2009 | 2 |
> | 35 | 0 | 2009 | 2 |
> | 45 | 0 | 2009 | 2 |
> | 50 | 0 | 2009 | 3 |
> | 60 | 0 | 2009 | 3 |
> | 70 | 0 | 2009 | 3 |
> | 75 | 0 | 2009 | 4 |
> | 85 | 0 | 2009 | 4 |
> | 95 | 0 | 2009 | 4 |
> +----+---------+------+-------+{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]