[ 
https://issues.apache.org/jira/browse/IMPALA-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781764#comment-16781764
 ] 

Andy Stadtler commented on IMPALA-8265:
---------------------------------------

[~tarmstrong] Yes, I meant ORDER BY; I fixed the example in the description. Do 
you think it would make more sense to hard-fail only on UPSERT and not on 
INSERT? Would you be more open to that? I think the INSERT uses of ORDER BY are 
probably not harmful to data integrity.

> Reject INSERT/UPSERT queries with ORDER BY and no OFFSET/LIMIT
> --------------------------------------------------------------
>
>                 Key: IMPALA-8265
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8265
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Andy Stadtler
>            Priority: Critical
>
> Currently Impala doesn't honor an ORDER BY without a LIMIT or OFFSET in an 
> INSERT ... SELECT operation. Impala currently issues a warning, but it seems 
> like this query should instead be rejected with the same message. This matters 
> especially now with UPSERT and Kudu, where it is an obvious approach to take a 
> table of duplicate rows and use the following query.
> {code:sql}
> UPSERT INTO kudu_table
> SELECT col1, col2, col3
> FROM duplicate_row_table
> ORDER BY timestamp_column ASC;
> {code}
> Impala will happily take this query and write incorrect data. The same query 
> works fine as a SELECT-only query, and it's easy to see how users would make 
> the mistake of reusing it in an INSERT/UPSERT.
>  
> Rejecting the query with the warning message would ensure the user knows 
> the ORDER BY will not be honored and either adds a LIMIT, changes the 
> query logic, or removes the ORDER BY.
>  
> {quote}*Sorting considerations:* Although you can specify an {{ORDER BY}} 
> clause in an {{INSERT ... SELECT}} statement, any {{ORDER BY}} clause is 
> ignored and the results are not necessarily sorted. An {{INSERT ... SELECT}} 
> operation potentially creates many different data files, prepared on 
> different data nodes, and therefore the notion of the data being stored in 
> sorted order is impractical.
> {quote}
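>  
> A possible rewrite for the duplicate-row case, as a sketch only. It assumes 
> {{col1}} is the primary key of {{kudu_table}} (the description does not say 
> which column is the key) and that the intent is to keep the row with the 
> latest {{timestamp_column}} for each key. Rather than relying on write order, 
> it selects one row per key with an analytic function before the UPSERT.
> {code:sql}
> -- Sketch: col1 as the primary key is an assumption, not stated in the issue.
> UPSERT INTO kudu_table
> SELECT col1, col2, col3
> FROM (
>   SELECT col1, col2, col3,
>          -- Rank rows within each key, newest timestamp first.
>          ROW_NUMBER() OVER (PARTITION BY col1
>                             ORDER BY timestamp_column DESC) AS rn
>   FROM duplicate_row_table
> ) ranked
> -- Keep only the newest row per key, so write order no longer matters.
> WHERE rn = 1;
> {code}
> Rows that tie on {{timestamp_column}} for the same key are still chosen 
> arbitrarily, so a tie-breaking column would need to be added to the inner 
> ORDER BY if that matters.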



