[
https://issues.apache.org/jira/browse/IMPALA-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andy Stadtler updated IMPALA-8265:
----------------------------------
Description:
Currently Impala does not honor an ORDER BY without a LIMIT or OFFSET in an
INSERT ... SELECT operation. Today it only issues a warning; the query should
instead be rejected with the same message. This matters especially now that
UPSERT is available for Kudu tables, where the obvious way to deduplicate a
table of duplicate rows is the following query.
{code:sql}
UPSERT INTO kudu_table
SELECT col1, col2, col3
FROM duplicate_row_table
ORDER BY timestamp_column ASC;
{code}
Impala accepts this query but ignores the ordering, so the row that survives
for each key is not guaranteed to be the latest one and the written data can be
incorrect. The same SELECT works fine on its own, so it is easy to see how users
would make the mistake of reusing it in an INSERT/UPSERT.
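To illustrate why the ignored ordering matters, here is a minimal sketch that models an UPSERT as "last write wins" per key. The model and the name {{apply_upserts}} are assumptions made for illustration only; they are not Impala or Kudu code.

```python
def apply_upserts(rows):
    """Hypothetical model of UPSERT: later rows overwrite earlier ones per key.

    Each row is (key, value, timestamp). The timestamp is not consulted here,
    which mirrors a write path that ignores the requested ordering.
    """
    table = {}
    for key, value, _timestamp in rows:
        table[key] = value
    return table


# Two versions of the same key, as in a table of duplicate rows.
rows = [(1, "old", 100), (1, "new", 200)]

# If the rows arrive in ascending timestamp order, the latest value survives.
in_order = apply_upserts(sorted(rows, key=lambda r: r[2]))

# If they arrive in any other order, a stale value can survive instead.
out_of_order = apply_upserts(sorted(rows, key=lambda r: r[2], reverse=True))

print(in_order)      # {1: 'new'}
print(out_of_order)  # {1: 'old'}
```

The query in the description only produces the intended result in the first case, which is exactly the ordering that is not guaranteed.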
Rejecting the query with the existing warning message would ensure that users
know the ORDER BY will not be honored, and that they either add a LIMIT or
change their query logic.
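As one possible way of changing the query logic, the sketch below selects the latest row per key before writing, so the result does not depend on write order. It runs against SQLite through Python's {{sqlite3}} module purely so the example is self-contained; {{INSERT OR REPLACE}} is SQLite's stand-in for UPSERT, and treating {{col1}} as the primary key is an assumption not stated in the issue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE duplicate_row_table (
    col1 INTEGER, col2 TEXT, col3 TEXT, timestamp_column INTEGER);
-- Assumption: col1 is the primary key of the target table.
CREATE TABLE kudu_table (col1 INTEGER PRIMARY KEY, col2 TEXT, col3 TEXT);
INSERT INTO duplicate_row_table VALUES
    (1, 'old',  'a', 100),
    (1, 'new',  'b', 200),
    (2, 'only', 'c', 150);
""")

# Keep only the row with the greatest timestamp for each key.
# Rows that tie on the maximum timestamp would still be duplicated.
dedup_select = """
SELECT d.col1, d.col2, d.col3
FROM duplicate_row_table d
JOIN (
    SELECT col1, MAX(timestamp_column) AS max_ts
    FROM duplicate_row_table
    GROUP BY col1
) latest
  ON d.col1 = latest.col1 AND d.timestamp_column = latest.max_ts
"""

conn.execute("INSERT OR REPLACE INTO kudu_table " + dedup_select)
rows = conn.execute(
    "SELECT col1, col2, col3 FROM kudu_table ORDER BY col1").fetchall()
print(rows)  # [(1, 'new', 'b'), (2, 'only', 'c')]
```

Because each key appears at most once in the SELECT output (barring timestamp ties), no ORDER BY is needed on the write path.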
{quote}*Sorting considerations:* Although you can specify an {{ORDER BY}}
clause in an {{INSERT ... SELECT}} statement, any {{ORDER BY}} clause is
ignored and the results are not necessarily sorted. An {{INSERT ... SELECT}}
operation potentially creates many different data files, prepared on different
data nodes, and therefore the notion of the data being stored in sorted order
is impractical.
{quote}
was:
Currently Impala does not honor an ORDER BY without a LIMIT or OFFSET in an
INSERT ... SELECT operation. Today it only issues a warning; the query should
instead be rejected with the same message. This matters especially now that
UPSERT is available for Kudu tables, where the obvious way to deduplicate a
table of duplicate rows is the following query.
{code:sql}
UPSERT INTO kudu.table
SELECT col1, col2, col3
FROM duplicate_row_table
ORDER BY timestamp_column ASC;
{code}
Impala accepts this query but ignores the ordering, so the row that survives
for each key is not guaranteed to be the latest one and the written data can be
incorrect. The same SELECT works fine on its own, so it is easy to see how users
would make the mistake of reusing it in an INSERT/UPSERT.
Rejecting the query with the existing warning message would ensure that users
know the ORDER BY will not be honored, and that they either add a LIMIT or
change their query logic.
{quote}*Sorting considerations:* Although you can specify an {{ORDER BY}}
clause in an {{INSERT ... SELECT}} statement, any {{ORDER BY}} clause is
ignored and the results are not necessarily sorted. An {{INSERT ... SELECT}}
operation potentially creates many different data files, prepared on different
data nodes, and therefore the notion of the data being stored in sorted order
is impractical.
{quote}
> Reject INSERT/UPSERT queries with ORDER BY and no OFFSET/LIMIT
> ---------------------------------------------------------------
>
> Key: IMPALA-8265
> URL: https://issues.apache.org/jira/browse/IMPALA-8265
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Andy Stadtler
> Priority: Critical