[ 
https://issues.apache.org/jira/browse/IMPALA-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456364#comment-17456364
 ] 

Antoni Ivanov commented on IMPALA-11014:
----------------------------------------

Hi [~csringhofer],

I was looking at the Impala code, and the issue seems to come from somewhere here: 

* ClientRequestState::WaitInternal : 
https://github.com/cloudera/Impala/blob/cdh6.3.0/be/src/service/client-request-state.cc#L808
 
** Coordinator::Wait : 
https://github.com/cloudera/Impala/blob/cdh6.3.0/be/src/runtime/coordinator.cc#L621
** FinalizeHdfsInsert (if this fails, we may get partial inserts) : 
https://github.com/cloudera/Impala/blob/cdh6.3.0/be/src/runtime/coordinator.cc#L577
* ClientRequestState::UpdateCatalog (if this fails, we may get a non-atomic 
write) : 
https://github.com/cloudera/Impala/blob/cdh6.3.0/be/src/service/client-request-state.cc#L1077
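
To make the suspected failure window concrete, here is a toy model of the sequence listed above. It is not Impala code: the function names only mirror the linked functions, the bodies are placeholders, and the assumption that the catalog update runs after the files have been moved (with no rollback) is the hypothesis being asked about, not a confirmed fact.

```python
# Toy model only. Names mirror the linked Impala functions; bodies are placeholders.


class CatalogError(Exception):
    """Stands in for the MetaException seen during the catalog update."""


def finalize_hdfs_insert(staging, table_dir):
    # Stands in for FinalizeHdfsInsert: move staged files into the table directory.
    while staging:
        table_dir.append(staging.pop(0))


def update_catalog(fail):
    # Stands in for ClientRequestState::UpdateCatalog.
    if fail:
        raise CatalogError(
            'Object with id "" is managed by a different persistence manager'
        )


def run_insert(staging, table_dir, catalog_fails):
    """Return the status the client would see for the INSERT."""
    try:
        finalize_hdfs_insert(staging, table_dir)  # files become visible here
        update_catalog(catalog_fails)  # assumed: a failure here does not undo the move
    except CatalogError as err:
        return f"ERROR: {err}"
    return "OK"


table_dir = []
status = run_insert(["part-0", "part-1"], table_dir, catalog_fails=True)
print(status)     # the client sees an error
print(table_dir)  # ['part-0', 'part-1'], the files are already in place
```

If the real code behaves like this model, a retry after the error would write the same rows a second time, which matches the duplication described in the issue.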


Does that seem correct? 
Also, can we use the query profile to detect the issue? I can see that "DML Stats" is 
set after FinalizeHdfsInsert. Would that mean that if a query fails but its 
profile has "DML Stats", it has written data? 
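
A minimal sketch of the check being asked about, assuming the text profile contains the literal label "DML Stats" whenever that section was set. The helper name, the sample profile layout, and the rule itself are hypothetical; whether the presence of "DML Stats" really implies that data was written is exactly the open question.

```python
def may_have_written_data(profile_text: str, query_failed: bool) -> bool:
    """Heuristic only: a failed query whose profile has "DML Stats" is
    treated as having possibly written data (unconfirmed assumption)."""
    return query_failed and "DML Stats" in profile_text


# Hypothetical, heavily shortened profile text, for illustration only.
SAMPLE_PROFILE = """\
Query Status: MetaException: Object with id "" is managed by a different persistence manager
DML Stats:
  Partition: Default
"""

if may_have_written_data(SAMPLE_PROFILE, query_failed=True):
    print("do not retry blindly: data may already be in the table")
```

With such a check, the error handler could skip the automatic retry and verify the table contents first, instead of refreshing metadata and re-running the INSERT.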


> Data is being inserted even though an INSERT INTO query fails
> -------------------------------------------------------------
>
>                 Key: IMPALA-11014
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11014
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Tsvetomir Palashki
>            Priority: Major
>         Attachments: profile.txt
>
>
> We are executing an INSERT INTO query against Impala. In rare cases this 
> query fails with the following error:
> {code:java}
> MetaException: Object with id "" is managed by a different persistence 
> manager {code}
> Even though there is an error, the data is inserted into the table. This is 
> particularly problematic due to our error handling logic, which refreshes the 
> table metadata and retries the query, which causes data duplication.
> I am aware that this bug might be fixed in one of the newer Impala versions, 
> but at this point, we are unable to upgrade.
> Can you suggest a workaround for this? Is it safe to assume that the data is 
> always inserted when this particular error happens? Can we rely on the 
> rows_inserted and rows_produced fields of the query in order to make 
> assumptions about what data is inserted?
> The exact version of our Impala is:
> {code:java}
> impalad version 3.2.0-cdh6.3.2 RELEASE (build 
> 1bb9836227301b839a32c6bc230e35439d5984ac) Built on Fri Nov 8 07:22:06 PST 
> 2019 {code}
>  



