[ 
https://issues.apache.org/jira/browse/IMPALA-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Stadtler updated IMPALA-8265:
----------------------------------
    Description: 
Currently Impala doesn't honor an ORDER BY without a LIMIT or OFFSET in an
INSERT ... SELECT operation. While Impala currently throws a warning, it seems
like this query should be rejected with the same message. Especially now with
the UPSERT ability and Kudu, it's an obvious approach to take a table of
duplicate rows and use the following query.
{code:sql}
UPSERT INTO kudu_table
SELECT col1, col2, col3
FROM duplicate_row_table
ORDER BY timestamp_column ASC;{code}
Impala will happily accept this query and write incorrect data, because the
ORDER BY is ignored. The same query works fine as a SELECT-only query, and it's
easy to see how users would make the mistake of reusing it in an INSERT/UPSERT.

 

Rejecting the query with the warning message would make sure the user knows the
ORDER BY will not be honored and would prompt them to add a LIMIT, change their
query logic, or remove the ORDER BY.
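
One way of changing the query logic, sketched here only as an illustration and not as part of the proposed fix, is to pick the newest row per key explicitly instead of relying on write order. The example below uses Python's built-in sqlite3 module as a stand-in for Impala/Kudu (it needs SQLite 3.25 or later for window functions). It assumes col1 is the primary key of kudu_table, which the issue does not state, and the sample rows are made up. SQLite's INSERT OR REPLACE plays the role of UPSERT. The inner SELECT with ROW_NUMBER() should carry over to an Impala UPSERT, but that is untested here.

```python
import sqlite3

# SQLite stands in for Impala/Kudu. Table and column names follow the issue's
# example; col1 as the primary key and the sample rows are ASSUMPTIONS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE duplicate_row_table (
        col1 INTEGER, col2 TEXT, col3 TEXT, timestamp_column INTEGER);
    CREATE TABLE kudu_table (
        col1 INTEGER PRIMARY KEY, col2 TEXT, col3 TEXT);
    INSERT INTO duplicate_row_table VALUES
        (1, 'old',  'a', 100),
        (1, 'new',  'b', 200),
        (2, 'only', 'c', 150);
""")

# Rank the duplicates of each key by timestamp (newest first) and keep only
# rank 1, so the result no longer depends on the order rows are written in.
conn.execute("""
    INSERT OR REPLACE INTO kudu_table
    SELECT col1, col2, col3
    FROM (
        SELECT col1, col2, col3,
               ROW_NUMBER() OVER (PARTITION BY col1
                                  ORDER BY timestamp_column DESC) AS rn
        FROM duplicate_row_table
    ) AS ranked
    WHERE rn = 1
""")

result = conn.execute(
    "SELECT col1, col2, col3 FROM kudu_table ORDER BY col1").fetchall()
print(result)
```

Because the inner query returns exactly one row per key, the outcome is the same whatever order the rows are written in.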

 
{quote}*Sorting considerations:* Although you can specify an {{ORDER BY}} 
clause in an {{INSERT ... SELECT}} statement, any {{ORDER BY}} clause is 
ignored and the results are not necessarily sorted. An {{INSERT ... SELECT}} 
operation potentially creates many different data files, prepared on different 
data nodes, and therefore the notion of the data being stored in sorted order 
is impractical.
{quote}

  was:
Currently Impala doesn't honor a sort by without a limit or offset in a insert 
... select operation. While Impala currently throws a warning it seems like 
this query should be rejected with the same message. Especially now with the 
UPSERT ability and Kudu its obvious logic to take a table of duplicate rows and 
use the following query.
{code:java}
UPSERT INTO kudu_table SELECT col1, col2, col3 FROM duplicate_row_table ORDER 
BY timestamp_column ASC;{code}
Impala will happily take this query and write incorrect data. The same query 
works fine as a SELECT only query and it's easy to see where users would make 
the mistake of reusing it in an INSERT/UPSERT.

 

Rejecting the query with the warning message would make sure the user knew the 
ORDER BY would not be honored and make sure they added a limit, changed their 
query logic or removed the order by.

 
{quote}*Sorting considerations:* Although you can specify an {{ORDER BY}} 
clause in an {{INSERT ... SELECT}} statement, any {{ORDER BY}} clause is 
ignored and the results are not necessarily sorted. An {{INSERT ... SELECT}} 
operation potentially creates many different data files, prepared on different 
data nodes, and therefore the notion of the data being stored in sorted order 
is impractical.
{quote}


> Reject INSERT/UPSERT queries with ORDER BY and no OFFSET/LIMIT
> --------------------------------------------------------------
>
>                 Key: IMPALA-8265
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8265
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Andy Stadtler
>            Priority: Critical
>
> Currently Impala doesn't honor an ORDER BY without a LIMIT or OFFSET in an
> INSERT ... SELECT operation. While Impala currently throws a warning, it
> seems like this query should be rejected with the same message. Especially
> now with the UPSERT ability and Kudu, it's an obvious approach to take a
> table of duplicate rows and use the following query.
> {code:sql}
> UPSERT INTO kudu_table
> SELECT col1, col2, col3
> FROM duplicate_row_table
> ORDER BY timestamp_column ASC;{code}
> Impala will happily accept this query and write incorrect data, because the
> ORDER BY is ignored. The same query works fine as a SELECT-only query, and
> it's easy to see how users would make the mistake of reusing it in an
> INSERT/UPSERT.
>  
> Rejecting the query with the warning message would make sure the user knows
> the ORDER BY will not be honored and would prompt them to add a LIMIT,
> change their query logic, or remove the ORDER BY.
>  
> {quote}*Sorting considerations:* Although you can specify an {{ORDER BY}} 
> clause in an {{INSERT ... SELECT}} statement, any {{ORDER BY}} clause is 
> ignored and the results are not necessarily sorted. An {{INSERT ... SELECT}} 
> operation potentially creates many different data files, prepared on 
> different data nodes, and therefore the notion of the data being stored in 
> sorted order is impractical.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
