[ 
https://issues.apache.org/jira/browse/IMPALA-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143076#comment-17143076
 ] 

Tim Armstrong commented on IMPALA-9877:
---------------------------------------

Thanks [~koke], such clear bug reports are valuable.

> uuid() generates multiple values per row when joined from a CTE
> ---------------------------------------------------------------
>
>                 Key: IMPALA-9877
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9877
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 2.12.0
>         Environment: impalad version 2.12.0-cdh5.16.1 RELEASE (build 
> 4a3775ef6781301af81b23bca45a9faeca5e761d)
>            Reporter: Jorge Bernal
>            Priority: Major
>
> I'm not sure if I'm misunderstanding the execution model here, but I would 
> expect that when I use uuid() on a CTE, it returns a different value for each 
> row of the CTE, and not one value per row of the final join.  
> It's hard to explain, so here's an example:
>  
> {code:java}
> drop table if exists events;
> create table events (uid int, ts int);
> insert into events values (1,1), (2,2), (1,3);
> +-----+----+
> | uid | ts |
> +-----+----+
> | 1   | 1  |
> | 2   | 2  |
> | 1   | 3  |
> +-----+----+{code}
>  
> If you then use uuid() on a CTE, and join it with something else, the 
> resulting values might have a different uuid for the same input row:
> {code:java}
> with sessions as (
>     select
>         uid,
>         uuid() session_id,
>         min(ts) session_start
>     from events
>     group by uid
> )
> select
>     session_id,
>     events.*
> from sessions join events on sessions.uid = events.uid;
> drop table if exists koke_events;
> +--------------------------------------+-----+----+
> | session_id                           | uid | ts |
> +--------------------------------------+-----+----+
> | 71d3671a-fa97-4c00-8d90-d1584ad7cf18 | 2   | 2  |
> | af564c03-2db3-44cb-bf5b-7748e231af72 | 1   | 3  |
> | d4bc2779-1bd9-4a5e-a5e2-dc28b00d2f34 | 1   | 1  |
> +--------------------------------------+-----+----+
> {code}
> I would expect that rows with the same {{uid}} would have the same 
> {{session_id}} in this result.
> When I run the same query in Hive, that's the behavior:
> {code:java}
> +---------------------------------------+-------------+------------+--+
> |              session_id               | events.uid  | events.ts  |
> +---------------------------------------+-------------+------------+--+
> | e39555b2-c99b-4521-bfb4-9c4d0a5fd3cc  | 1           | 1          |
> | e39555b2-c99b-4521-bfb4-9c4d0a5fd3cc  | 1           | 3          |
> | 72f8a922-738e-4151-ba93-3b5e50eb9140  | 2           | 2          |
> +---------------------------------------+-------------+------------+--+
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to