[ 
https://issues.apache.org/jira/browse/HIVE-20362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-20362:
---------------------------
    Description: 
{code}
COALESCE(current_page, FIRST_VALUE(current_page,TRUE) OVER ( PARTITION BY 
user_id ORDER BY timestamp ROWS BETWEEN CURRENT ROW and UNBOUNDED FOLLOWING))
{code}

This is a spilling PTF because it needs an unseen row to generate the current 
row when executing this.

This can be inverted, so that the shuffle sorts it the other way, so the result 
can be generated from a previously seen row.

{code}
COALESCE(current_page, LAST_VALUE(current_page,TRUE) OVER ( PARTITION BY 
user_id ORDER BY timestamp desc ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT 
ROW))
{code}

This query pattern is bad in general, but seems to exist as a workaround for - 
HIVE-18145

  was:
{code}
COALESCE(current_page, FIRST_VALUE(current_page,TRUE) OVER ( PARTITION BY 
user_id ORDER BY timestamp ROWS BETWEEN CURRENT ROW and UNBOUNDED FOLLOWING))
{code}

This is a spilling PTF because it needs an unseen row to generate the current 
row when executing this.

This can be inverted, so that the shuffle sorts it the other way, so the result 
can be generated from a previously seen row.

{code}
COALESCE(current_page, LAST_VALUE(current_page,TRUE) OVER ( PARTITION BY 
user_id ORDER BY timestamp asc ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT 
ROW))
{code}

This query pattern is bad in general, but seems to exist as a workaround for - 
HIVE-18145


> PTF: Invert sorting for CURRENT_ROW to UNBOUNDED FOLLOWING 
> -----------------------------------------------------------
>
>                 Key: HIVE-20362
>                 URL: https://issues.apache.org/jira/browse/HIVE-20362
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, PTF-Windowing
>            Reporter: Gopal V
>            Priority: Major
>
> {code}
> COALESCE(current_page, FIRST_VALUE(current_page,TRUE) OVER ( PARTITION BY 
> user_id ORDER BY timestamp ROWS BETWEEN CURRENT ROW and UNBOUNDED FOLLOWING))
> {code}
> This is a spilling PTF because it needs an unseen row to generate the current 
> row when executing this.
> This can be inverted, so that the shuffle sorts it the other way, so the 
> result can be generated from a previously seen row.
> {code}
> COALESCE(current_page, LAST_VALUE(current_page,TRUE) OVER ( PARTITION BY 
> user_id ORDER BY timestamp desc ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT 
> ROW))
> {code}
> This query pattern is bad in general, but seems to exist as a workaround for 
> - HIVE-18145



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to