[ 
https://issues.apache.org/jira/browse/IMPALA-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10685:
---------------------------------------
    Description: 
We are rewriting "A join B" to "A join B1 join B2" for some queries that refer 
to collections in ACID tables. This is ok for inner join but may be incorrect 
for outer joins. Here is an example, the two queries produce different results:

Query works well for non-ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_parquet.complextypestbl.int_map using 
(key);
+-----+------+-------+
| key | key  | value |
+-----+------+-------+
| k1  | k1   | -1    |
| k1  | k1   | 1     |
| k2  | k2   | 100   |
| k1  | k1   | 2     |
| k2  | k2   | NULL  |
| k1  | k1   | NULL  |
| k3  | k3   | NULL  |
| k4  | NULL | NULL  |
+-----+------+-------+
Fetched 8 row(s) in 3.35s
{noformat}
LEFT OUTER JOIN converted to INNER JOIN for ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_orc_def.complextypestbl.int_map using 
(key);
+-----+-----+-------+
| key | key | value |
+-----+-----+-------+
| k1  | k1  | -1    |
| k1  | k1  | 1     |
| k2  | k2  | 100   |
| k1  | k1  | 2     |
| k2  | k2  | NULL  |
| k1  | k1  | NULL  |
| k3  | k3  | NULL  |
+-----+-----+-------+
Fetched 7 row(s) in 0.35s
{noformat}
 IMPALA-9494 can help to fix this. Until that we could use the techniques from 
IMPALA-9330.

Possible workaround is to rewrite the query to use an inline view:

{noformat}
with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
)
select * from v
left join
(select int_map.* from
 functional_orc_def.complextypestbl c, c.int_map) vv
using (key);
{noformat}


  was:
We are rewriting "A join B" to "A join B1 join B2" for some queries that refer 
to collections in ACID tables. This is ok for inner join but may be incorrect 
for outer joins. Here is an example, the two queries produce different results:

Query works well for non-ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_parquet.complextypestbl.int_map using 
(key);
+-----+------+-------+
| key | key  | value |
+-----+------+-------+
| k1  | k1   | -1    |
| k1  | k1   | 1     |
| k2  | k2   | 100   |
| k1  | k1   | 2     |
| k2  | k2   | NULL  |
| k1  | k1   | NULL  |
| k3  | k3   | NULL  |
| k4  | NULL | NULL  |
+-----+------+-------+
Fetched 8 row(s) in 3.35s
{noformat}
LEFT OUTER JOIN converted to INNER JOIN for ACID table:
{noformat}
impala> with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
) select * from v left join functional_orc_def.complextypestbl.int_map using 
(key);
+-----+-----+-------+
| key | key | value |
+-----+-----+-------+
| k1  | k1  | -1    |
| k1  | k1  | 1     |
| k2  | k2  | 100   |
| k1  | k1  | 2     |
| k2  | k2  | NULL  |
| k1  | k1  | NULL  |
| k3  | k3  | NULL  |
+-----+-----+-------+
Fetched 7 row(s) in 0.35s
{noformat}
 IMPALA-9494 can help to fix this. Until that we could use the techniques from 
IMPALA-9330.

Possible workaround is to rewrite the query to:

{noformat}
with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
)
select * from v
left join
(select int_map.* from
 functional_orc_def.complextypestbl c, c.int_map) vv
using (key);
{noformat}



> OUTER JOIN against ACID collections might be converted to INNER JOIN
> --------------------------------------------------------------------
>
>                 Key: IMPALA-10685
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10685
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.0
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>
> We are rewriting "A join B" to "A join B1 join B2" for some queries that 
> refer to collections in ACID tables. This is ok for inner join but may be 
> incorrect for outer joins. Here is an example, the two queries produce 
> different results:
> Query works well for non-ACID table:
> {noformat}
> impala> with v as (
>   select ('k4') as key
>   union all
>   values ('k1'), ('k2'), ('k3')
> ) select * from v left join functional_parquet.complextypestbl.int_map using 
> (key);
> +-----+------+-------+
> | key | key  | value |
> +-----+------+-------+
> | k1  | k1   | -1    |
> | k1  | k1   | 1     |
> | k2  | k2   | 100   |
> | k1  | k1   | 2     |
> | k2  | k2   | NULL  |
> | k1  | k1   | NULL  |
> | k3  | k3   | NULL  |
> | k4  | NULL | NULL  |
> +-----+------+-------+
> Fetched 8 row(s) in 3.35s
> {noformat}
> LEFT OUTER JOIN converted to INNER JOIN for ACID table:
> {noformat}
> impala> with v as (
>   select ('k4') as key
>   union all
>   values ('k1'), ('k2'), ('k3')
> ) select * from v left join functional_orc_def.complextypestbl.int_map using 
> (key);
> +-----+-----+-------+
> | key | key | value |
> +-----+-----+-------+
> | k1  | k1  | -1    |
> | k1  | k1  | 1     |
> | k2  | k2  | 100   |
> | k1  | k1  | 2     |
> | k2  | k2  | NULL  |
> | k1  | k1  | NULL  |
> | k3  | k3  | NULL  |
> +-----+-----+-------+
> Fetched 7 row(s) in 0.35s
> {noformat}
>  IMPALA-9494 can help to fix this. Until that we could use the techniques 
> from IMPALA-9330.
> Possible workaround is to rewrite the query to use an inline view:
> {noformat}
> with v as (
>   select ('k4') as key
>   union all
>   values ('k1'), ('k2'), ('k3')
> )
> select * from v
> left join
> (select int_map.* from
>  functional_orc_def.complextypestbl c, c.int_map) vv
> using (key);
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to