[https://issues.apache.org/jira/browse/IMPALA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189685#comment-17189685]
Zoltán Borók-Nagy commented on IMPALA-10135:
--------------------------------------------
Thanks for taking care of this, Vihang! Please note that the problem with
INSERT OVERWRITEs is not that we don't provide the files, but that we don't
send events at all: 'partsPostInsert' is always empty for INSERT OVERWRITEs,
so 'insertEventInfos' also remains empty:
[https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4543-L4553]
> Insert events doesn't contain the inserted data files
> -----------------------------------------------------
>
> Key: IMPALA-10135
> URL: https://issues.apache.org/jira/browse/IMPALA-10135
> Project: IMPALA
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Assignee: Vihang Karajgaonkar
> Priority: Major
>
> When Impala generates INSERT events, it doesn't include the newly inserted
> data files.
> The problem is that Impala misuses Sets.difference(set1, set2). From the API
> doc at
> [https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-]
> "The returned set contains all elements that are contained by {{set1}} and
> not contained by {{set2}}. {{set2}} may also contain elements not present in
> {{set1}}; these are simply ignored."
> So the name "difference" is a bit misleading; it's really a subtraction:
> set1 minus set2.
> Unfortunately, Impala passes the parameters in the wrong order,
> Sets.difference(beforeInsert, afterInsert):
> [https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581]
> Since every file present before an INSERT is still present after it, the
> result is always empty.
> There's another problem with INSERT OVERWRITEs: Impala doesn't send any
> INSERT events for them at all.
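To make the argument-order problem concrete, here is a minimal sketch. It deliberately avoids depending on Guava: the hypothetical helper `difference` only mimics the semantics quoted above from the Guava docs for Sets.difference(set1, set2) (elements of set1 not in set2), and it returns a copy rather than Guava's live SetView. The file names and the single-partition listing are hypothetical stand-ins for what CatalogOpExecutor actually compares.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class SetDifferenceDemo {

    // Hypothetical stand-in for Guava's Sets.difference(set1, set2):
    // returns every element of set1 that is not in set2; elements that
    // appear only in set2 are ignored. Unlike Guava, this returns a copy.
    static <T> Set<T> difference(Set<T> set1, Set<T> set2) {
        Set<T> result = new LinkedHashSet<>(set1);
        result.removeAll(set2);
        return Collections.unmodifiableSet(result);
    }

    public static void main(String[] args) {
        // Hypothetical file listing of one partition before a plain INSERT.
        Set<String> beforeInsert =
            new LinkedHashSet<>(Arrays.asList("f1.parq", "f2.parq"));
        // After the INSERT, the old files remain and one new file appears.
        Set<String> afterInsert = new LinkedHashSet<>(beforeInsert);
        afterInsert.add("f3.parq");

        // Argument order described in the issue: always empty, because
        // afterInsert already contains every element of beforeInsert.
        Set<String> wrongOrder = difference(beforeInsert, afterInsert);

        // Swapped order: exactly the newly inserted files.
        Set<String> swappedOrder = difference(afterInsert, beforeInsert);

        System.out.println("difference(before, after) = " + wrongOrder);
        System.out.println("difference(after, before) = " + swappedOrder);
    }
}
```

Running `main` prints `difference(before, after) = []` followed by `difference(after, before) = [f3.parq]`. The sketch only covers plain INSERTs; it says nothing about the separate INSERT OVERWRITE problem, where no events are sent at all.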
--
This message was sent by Atlassian Jira
(v8.3.4#803005)