[
https://issues.apache.org/jira/browse/IMPALA-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474673#comment-16474673
]
Thomas Tauber-Marshall commented on IMPALA-7015:
------------------------------------------------
I agree with the point from IMPALA-3710 that we don't want these types of
errors to prematurely terminate the query, which would happen if we just
returned them immediately. I think it would be reasonable to save the errors
and return an error status once all of the rows have been sent to Kudu,
e.g. in FlushFinal().
We could then return an error message with the counts of rows that hit each
type of error. That wouldn't be as good as the more structured approach
suggested in IMPALA-4416 and IMPALA-1789, since clients would have to parse the
message, but it would at least be something, rather than returning no
information about what happened.
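As a rough illustration of the deferred-error idea (not Impala's actual sink
code, which is C++), the sketch below records per-row errors while rows are
sent and only reports a failure from a final flush step. Apart from the
FlushFinal() name mentioned above, everything here is hypothetical:
DeferredErrorSink, send(), the write callback, and the message format are
assumptions made for the example.

```python
from collections import Counter


class DeferredErrorSink:
    """Toy stand-in for a table sink; NOT Impala's real Kudu sink."""

    def __init__(self):
        self.rows_sent = 0
        # Maps an error message to the number of rows that hit it.
        self.error_counts = Counter()

    def send(self, row, write_fn):
        """Write one row; record a per-row error instead of failing the query."""
        self.rows_sent += 1
        # Hypothetical contract: write_fn returns None on success,
        # or an error message string on failure.
        error = write_fn(row)
        if error is not None:
            self.error_counts[error] += 1

    def flush_final(self):
        """Return (ok, message) after all rows have been sent."""
        if not self.error_counts:
            return True, "OK"
        failed = sum(self.error_counts.values())
        parts = [f"{msg}: {n} row(s)"
                 for msg, n in sorted(self.error_counts.items())]
        return False, f"{failed} of {self.rows_sent} rows hit errors: " + "; ".join(parts)


# Usage: simulate duplicate-key errors against an in-memory "table".
existing_keys = set()


def fake_write(key):
    """Hypothetical writer that rejects keys already present."""
    if key in existing_keys:
        return "Key already present"
    existing_keys.add(key)
    return None


sink = DeferredErrorSink()
for key in [1, 2, 2, 3, 3, 3]:
    sink.send(key, fake_write)

ok, msg = sink.flush_final()
```

All six rows are attempted before any error is surfaced, so the query is not
cut short, and the final status carries the per-error counts that a client
would still have to parse out of the message text.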
My main hesitation is that this would be a somewhat breaking change, and if
we're going to do that, it might be better to wait until everything is in place
to do it the right way, rather than making breaking changes around this twice.
> Insert into Kudu table returns with Status OK even if there are Kudu errors
> ---------------------------------------------------------------------------
>
> Key: IMPALA-7015
> URL: https://issues.apache.org/jira/browse/IMPALA-7015
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.12.0
> Reporter: Mostafa Mokhtar
> Priority: Major
> Attachments: Insert into kudu profile with errors.txt
>
>
> DML statements against Kudu tables return status OK even if there are Kudu
> errors.
> This behavior is misleading.
> {code}
> Summary:
> Session ID: 18430b000e5dd8dc:e3e5dadb4a15d4b4
> Session Type: BEESWAX
> Start Time: 2018-05-11 10:10:07.314218000
> End Time: 2018-05-11 10:10:07.434017000
> Query Type: DML
> Query State: FINISHED
> Query Status: OK
> Impala Version: impalad version 2.12.0-cdh5.15.0 RELEASE (build 2f9498d5c2f980aa7ff9505c56654c8e59e026ca)
> User: mmokhtar
> Connected User: mmokhtar
> Delegated User:
> Network Address: ::ffff:10.17.234.27:60760
> Default Db: tpcds_1000_kudu
> Sql Statement: insert into store_2 select * from store
> Coordinator: vd1317.foo:22000
> Query Options (set by configuration):
> Query Options (set by configuration and planner): MT_DOP=0
> Plan:
> {code}
> {code}
> Operator         #Hosts  Avg Time   Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
> -------------------------------------------------------------------------------------------------------------------------------------------------
> 02:PARTIAL SORT       5  909.030us   1.025ms  1.00K       1.00K   6.14 MB        4.00 MB
> 01:EXCHANGE           5    6.262ms   7.232ms  1.00K       1.00K  75.50 KB              0  KUDU(KuduPartition(tpcds_1000_kudu.store.s_store_sk))
> 00:SCAN KUDU          5    3.694ms   4.137ms  1.00K       1.00K   4.34 MB              0  tpcds_1000_kudu.store
> Errors: Key already present in Kudu table 'impala::tpcds_1000_kudu.store_2'. (1 of 1002 similar)
> {code}