Andy Stadtler created IMPALA-8265:
-------------------------------------
Summary: Reject INSERT/UPSERT queries with ORDER BY and no OFFSET/LIMIT
Key: IMPALA-8265
URL: https://issues.apache.org/jira/browse/IMPALA-8265
Project: IMPALA
Issue Type: Improvement
Reporter: Andy Stadtler
Currently Impala does not honor an ORDER BY without a LIMIT or OFFSET in an
INSERT ... SELECT operation. We currently emit a warning, but it seems this
query should instead be rejected with the same message. This matters
especially now that UPSERT and Kudu are available: an obvious approach is to
take a table of duplicate rows and run UPSERT INTO kudu.table SELECT col1,
col2, col3 FROM duplicate_row_table ORDER BY timestamp_column ASC. Impala will
happily accept this query and write incorrect data. The same query works fine
as a SELECT-only query, and it is easy to see how users would make the mistake
of reusing it in an INSERT/UPSERT.
Rejecting the query with the warning message would make sure the user knows
the ORDER BY will not be honored, and would prompt them to add a LIMIT or
change their query logic.
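To make the proposal concrete, here is a minimal sketch of the rule in Python.
It is not Impala's frontend code: the InsertSelect class, the AnalysisError
exception, the check_order_by function and the error text are all hypothetical
names chosen for illustration. The only thing taken from this issue is the
condition itself (ORDER BY present, no LIMIT and no OFFSET).

```python
from dataclasses import dataclass
from typing import Optional


class AnalysisError(Exception):
    """Hypothetical error raised when a statement is rejected at analysis time."""


@dataclass
class InsertSelect:
    """Hypothetical, simplified view of an INSERT/UPSERT ... SELECT statement."""
    is_upsert: bool
    has_order_by: bool
    limit: Optional[int] = None
    offset: Optional[int] = None


def check_order_by(stmt: InsertSelect) -> None:
    """Reject ORDER BY without LIMIT/OFFSET instead of only warning about it."""
    if stmt.has_order_by and stmt.limit is None and stmt.offset is None:
        verb = "UPSERT" if stmt.is_upsert else "INSERT"
        # Illustrative message only; the real wording would reuse the warning.
        raise AnalysisError(
            f"ORDER BY in {verb} ... SELECT is not honored without LIMIT or OFFSET"
        )
```

With this rule, the UPSERT example above would fail analysis, while the same
statement with a LIMIT or OFFSET, or without the ORDER BY, would still pass.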
For reference, the Impala documentation for INSERT says:
*Sorting considerations:* Although you can specify an {{ORDER BY}} clause in an
{{INSERT ... SELECT}} statement, any {{ORDER BY}} clause is ignored and the
results are not necessarily sorted. An {{INSERT ... SELECT}} operation
potentially creates many different data files, prepared on different data
nodes, and therefore the notion of the data being stored in sorted order is
impractical.
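The toy Python model below illustrates why the dropped ordering matters for
the UPSERT case. It assumes that an upsert into a keyed table behaves as
"last write wins" and that duplicate rows may arrive in any order; both are
simplifying assumptions for illustration, and apply_upserts is a hypothetical
helper, not Impala or Kudu code.

```python
def apply_upserts(rows):
    """Apply (key, timestamp, value) rows in arrival order; last write wins."""
    table = {}
    for key, ts, value in rows:
        table[key] = (ts, value)
    return table


# Two duplicate rows for the same key, differing only in timestamp and value.
duplicates = [(1, 100, "old"), (1, 200, "new")]

# Arrival order the user intended with ORDER BY timestamp_column ASC.
in_order = apply_upserts(sorted(duplicates, key=lambda r: r[1]))

# One of the other arrival orders that is possible once the sort is ignored.
out_of_order = apply_upserts(sorted(duplicates, key=lambda r: r[1], reverse=True))
```

In this model, in_order keeps the row with the latest timestamp, while
out_of_order keeps the stale one, even though the input rows are identical.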