[ 
https://issues.apache.org/jira/browse/IMPALA-13759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928685#comment-17928685
 ] 

ASF subversion and git services commented on IMPALA-13759:
----------------------------------------------------------

Commit c8c64fff3a1bbeb05a6869b91a945b2730f62083 in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c8c64fff3 ]

IMPALA-13759: Fix Hive ACID INSERT OVERWRITE base detection

Base directories created by INSERT OVERWRITE (IOW) / TRUNCATE should
be treated differently from bases created by compaction, because
IOW/TRUNCATE bases must be accepted even if there is an earlier
open writeId. This scenario can easily occur if there is
a pending write to a single partition: it doesn't block
an IOW/TRUNCATE to another partition, yet the global
minOpenWriteId affects whether the base is accepted.

This change updates Impala's logic to consider these bases
valid, as Hive does.
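
The acceptance rule described above can be illustrated with a small
sketch. This is not Impala's or Hive's actual code: the WriteIdSnapshot
class, its fields, and is_base_accepted are hypothetical names, and the
caller is assumed to already know whether a base came from compaction.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class WriteIdSnapshot:
    """Hypothetical, simplified view of a reader's valid writeId list."""
    high_watermark: int                       # highest writeId visible to the reader
    min_open_write_id: Optional[int] = None   # None means no open writes
    invalid_write_ids: FrozenSet[int] = frozenset()  # open or aborted writeIds


def is_base_accepted(base_write_id: int,
                     created_by_compaction: bool,
                     snapshot: WriteIdSnapshot) -> bool:
    """Decide whether a base directory may be used for reading (sketch)."""
    if base_write_id > snapshot.high_watermark:
        return False  # base is newer than anything this reader may see
    if created_by_compaction:
        # A compacted base stands in for all earlier writes, so no earlier
        # writeId may still be open.
        return (snapshot.min_open_write_id is None
                or base_write_id < snapshot.min_open_write_id)
    # IOW/TRUNCATE base: only the base's own writeId has to be valid; an
    # earlier open writeId (e.g. on another partition) does not matter.
    return base_write_id not in snapshot.invalid_write_ids


# writeId 3 is still open on one partition; an IOW with writeId 5 ran on another.
snapshot = WriteIdSnapshot(high_watermark=5, min_open_write_id=3,
                           invalid_write_ids=frozenset({3}))
iow_accepted = is_base_accepted(5, False, snapshot)
compaction_accepted = is_base_accepted(5, True, snapshot)
```

In this scenario the IOW base is accepted, while a compaction base with
the same writeId would be rejected because of the open writeId 3.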

Note that Impala differentiates IOW/TRUNCATE from compaction
differently than Hive does, as Impala does not consider metadata
files (IMPALA-13769). This can only cause problems when
interacting with earlier Hive versions that did not use
visibilityTxnId in the base path. I don't consider this
to be a significant regression that should block the current
critical fix.
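
As a rough illustration of a path-based distinction, the sketch below
treats a base directory name that carries a visibilityTxnId suffix
(base_<writeId>_v<txnId>) as a compaction result and a plain
base_<writeId> as IOW/TRUNCATE. This heuristic is inferred from the
note above and is an assumption, not a copy of Impala's code; the
function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# base_<writeId> optionally followed by _v<visibilityTxnId>
_BASE_DIR = re.compile(r"^base_(\d+)(?:_v(\d+))?$")


def parse_base_dir(name: str) -> Tuple[int, Optional[int]]:
    """Return (writeId, visibilityTxnId or None) for a base directory name."""
    match = _BASE_DIR.match(name)
    if match is None:
        raise ValueError(f"not a base directory name: {name!r}")
    write_id = int(match.group(1))
    visibility_txn_id = int(match.group(2)) if match.group(2) is not None else None
    return write_id, visibility_txn_id


def looks_like_compaction_base(name: str) -> bool:
    """Assumed heuristic: a visibilityTxnId in the name implies compaction."""
    return parse_base_dir(name)[1] is not None


iow_like = parse_base_dir("base_0000005")
compacted_like = parse_base_dir("base_0000005_v0000123")
```

A base written by a Hive version that omits visibilityTxnId would be
classified as IOW/TRUNCATE by this heuristic even if it came from
compaction, which is the limitation the note above refers to.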

Testing:
- added regression EE/FE tests

Change-Id: I838eaf4f41bae148e558f64288a1370c0908efa4
Reviewed-on: http://gerrit.cloudera.org:8080/22499
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Hive ACID table base folder identification procedure is inconsistent with Hive
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-13759
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13759
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Peter Rozsa
>            Assignee: Csaba Ringhofer
>            Priority: Critical
>              Labels: ACID
>
> Impala's base folder identification uses a different approach than Hive's
> to decide whether a base folder can be read when there are open writeIds.
> This can cause read inconsistencies with Hive, as Hive reads the base
> folder even if there is an open writeId lower than a newer base's writeId.
> Impala's validation: 
> [https://github.com/apache/impala/blob/b8f4034754b691a4790e502af214935486aa3ced/fe/src/main/java/org/apache/impala/util/AcidUtils.java#L261]
> Hive's validation: 
> [https://github.com/apache/hive/blob/0759352ddddc793c0e717c460f0e08eb3f14c1e9/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1774-L1797]
> PR that changed the behavior: 
> [https://github.com/apache/hive/commit/8ee3497f87f81fa84ee1023e891dc54087c2cd5e]
>  
> It is also worth clarifying whether Hive considers the described
> situation valid in the first place.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
