[ 
https://issues.apache.org/jira/browse/IMPALA-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10135:
---------------------------------------
    Description: 
When Impala generates INSERT events it doesn't add the newly inserted data files.

The problem is that Impala misuses Sets.difference(set1, set2). From the API 
doc at 
[https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-]

"The returned set contains all elements that are contained by {{set1}} and not 
contained by {{set2}}. {{set2}} may also contain elements not present in 
{{set1}}; these are simply ignored."

So the name "difference" is a bit misleading: it isn't a symmetric difference, 
it's rather a subtraction of set2 from set1.

Unfortunately Impala passes the parameters in the wrong order, 
Sets.difference(beforeInsert, afterInsert):

[https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581]

So the result will always be empty: a plain INSERT only adds files, so every 
file in beforeInsert is also in afterInsert.
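
To illustrate the argument-order problem, here is a minimal sketch that uses only java.util. The class name, the {{difference}} helper and the file names are hypothetical and are not taken from CatalogOpExecutor; the helper only mimics the contract quoted above (elements of set1 that are not in set2) and returns a copy, whereas Guava's Sets.difference returns a view.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; not Impala code.
final class InsertEventFilesSketch {

  // Elements contained in set1 and not contained in set2.
  // Elements that are only in set2 are ignored.
  static <T> Set<T> difference(Set<T> set1, Set<T> set2) {
    Set<T> result = new HashSet<>(set1);
    result.removeAll(set2);
    return result;
  }

  public static void main(String[] args) {
    // Made-up file names: a plain INSERT adds one file to the two existing ones.
    Set<String> beforeInsert =
        new HashSet<>(Arrays.asList("part-0.parq", "part-1.parq"));
    Set<String> afterInsert =
        new HashSet<>(Arrays.asList("part-0.parq", "part-1.parq", "part-2.parq"));

    // Order used at the linked line: nothing in beforeInsert is missing
    // from afterInsert, so no new file is reported.
    System.out.println(difference(beforeInsert, afterInsert)); // prints []

    // Swapped order: only the newly written file remains.
    System.out.println(difference(afterInsert, beforeInsert)); // prints [part-2.parq]
  }
}
```

With the arguments swapped, the result is exactly the set of files the INSERT added, which is presumably what the event should carry.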

There's another problem with INSERT OVERWRITE: it doesn't send any INSERT 
events at all.

  was:
When Impala generates INSERT EVENTs it doesn't add the newly inserted datafiles.

The problem is that Impala misuses Sets.difference(set1, set2). From the API 
doc at 
[https://guava.dev/releases/28.2-jre/api/docs/com/google/common/collect/Sets.html#difference-java.util.Set-java.util.Set-]

"The returned set contains all elements that are contained by {{set1}} and not 
contained by {{set2}}. {{set2}} may also contain elements not present in 
{{set1}}; these are simply ignored."

So the name "difference" is a bit misleading, it's rather a subtraction between 
set1 and set2.

Unfortunately Impala passes the parameters in wrong order: 
Sets.difference(beforeInsert, afterInsert):

[https://github.com/apache/impala/blob/4cb3c3556e77ee24003383155ca5e1b70be4db6e/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4581]

So the result will be always empty.

There's another problem with INSERT OVERWRITEs, in that case we never fill the 
data files of the insert event.


> Insert events doesn't contain the inserted data files
> -----------------------------------------------------
>
>                 Key: IMPALA-10135
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10135
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major


