[ https://issues.apache.org/jira/browse/IMPALA-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Stadtler updated IMPALA-8265:
----------------------------------
    Description: 
Currently Impala doesn't honor an ORDER BY without a LIMIT or OFFSET in an INSERT ... SELECT operation. While Impala currently issues a warning, it seems like this query should instead be rejected with the same message. This matters especially now that Impala supports UPSERT into Kudu tables: the obvious way to deduplicate a table of duplicate rows is the following query, which relies on the rows being applied in ascending timestamp order so that the latest row for each key is written last.

{code:java}
UPSERT INTO kudu_table SELECT col1, col2, col3 FROM duplicate_row_table ORDER BY timestamp_column ASC;
{code}

Impala will happily accept this query and write incorrect data: because the ORDER BY is ignored, the row that survives for each key is not necessarily the latest one. The same query works fine as a SELECT-only query, and it's easy to see how users would make the mistake of reusing it in an INSERT/UPSERT.

Rejecting the query with the existing warning message would ensure that the user knows the ORDER BY will not be honored, and would prompt them to add a LIMIT or change their query logic.

{quote}*Sorting considerations:* Although you can specify an {{ORDER BY}} clause in an {{INSERT ... SELECT}} statement, any {{ORDER BY}} clause is ignored and the results are not necessarily sorted. An {{INSERT ... SELECT}} operation potentially creates many different data files, prepared on different data nodes, and therefore the notion of the data being stored in sorted order is impractical.
{quote}

  was:
Currently Impala doesn't honor an ORDER BY without a LIMIT or OFFSET in an INSERT ... SELECT operation. While Impala currently issues a warning, it seems like this query should instead be rejected with the same message. This matters especially now that Impala supports UPSERT into Kudu tables: the obvious way to deduplicate a table of duplicate rows is the following query, which relies on the rows being applied in ascending timestamp order so that the latest row for each key is written last.

{code:java}
UPSERT INTO kudu.table SELECT col1, col2, col3 FROM duplicate_row_table ORDER BY timestamp_column ASC;
{code}

Impala will happily accept this query and write incorrect data: because the ORDER BY is ignored, the row that survives for each key is not necessarily the latest one. The same query works fine as a SELECT-only query, and it's easy to see how users would make the mistake of reusing it in an INSERT/UPSERT.
Rejecting the query with the existing warning message would ensure that the user knows the ORDER BY will not be honored, and would prompt them to add a LIMIT or change their query logic.

{quote}*Sorting considerations:* Although you can specify an {{ORDER BY}} clause in an {{INSERT ... SELECT}} statement, any {{ORDER BY}} clause is ignored and the results are not necessarily sorted. An {{INSERT ... SELECT}} operation potentially creates many different data files, prepared on different data nodes, and therefore the notion of the data being stored in sorted order is impractical.
{quote}


> Reject INSERT/UPSERT queries with ORDER BY and no OFFSET/LIMIT
> ---------------------------------------------------------------
>
>                 Key: IMPALA-8265
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8265
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Andy Stadtler
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)