[ 
https://issues.apache.org/jira/browse/IMPALA-13759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929200#comment-17929200
 ] 

ASF subversion and git services commented on IMPALA-13759:
----------------------------------------------------------

Commit 72044cbaa71109a1a261b7e95ede8a70afcc0a7a in impala's branch 
refs/heads/branch-4.5.0 from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=72044cbaa ]

IMPALA-13759: Fix Hive ACID INSERT OVERWRITE base detection

Base directories created by INSERT OVERWRITE / TRUNCATE should be
treated differently from bases created by compaction, because
IOW/TRUNCATE bases must be accepted even if there is an earlier
open writeId. This scenario can easily occur if there is
a pending write to a single partition, as this doesn't block
an IOW/TRUNCATE to another partition, while the global
minOpenWrite affects whether the base is accepted.

This change updates Impala's logic to consider these bases
valid, as Hive does.

Note that Impala differentiates IOW/TRUNCATE from compaction
differently than Hive does, as Impala does not consider
metadata files (IMPALA-13769). This can only cause problems when
interacting with earlier Hive versions that did not use
visibilityTxnId in the base path. I don't consider this
to be a significant regression that should block the current
critical fix.

Testing:
- added regression EE/FE tests

Change-Id: I838eaf4f41bae148e558f64288a1370c0908efa4
Reviewed-on: http://gerrit.cloudera.org:8080/22499
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
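
The rule described in the commit message above can be sketched roughly as
follows. This is an illustration only, not the code from the commit: the
class and method names are hypothetical, the write-id state is passed in as
plain values rather than through Hive's ValidWriteIdList, and a base is
assumed to come from compaction exactly when its directory name carries a
"_v<visibilityTxnId>" suffix (metadata files are ignored, as the commit
message notes for Impala).

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; not Impala's or Hive's actual AcidUtils code. */
class BaseDirSketch {
  // Matches "base_<writeId>" with an optional "_v<visibilityTxnId>" suffix.
  private static final Pattern BASE_DIR =
      Pattern.compile("base_(\\d+)(?:_v(\\d+))?");

  private static Matcher parse(String dirName) {
    Matcher m = BASE_DIR.matcher(dirName);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a base directory: " + dirName);
    }
    return m;
  }

  /** Assumption: only a positive visibilityTxnId marks a compacted base. */
  static boolean isCompactedBase(String dirName) {
    String visibilityTxnId = parse(dirName).group(2);
    return visibilityTxnId != null && Long.parseLong(visibilityTxnId) > 0;
  }

  /**
   * Decides whether a reader should accept the given base directory.
   *
   * @param highWatermark   highest writeId visible to the reader
   * @param minOpenWriteId  smallest open writeId on the table
   *                        (Long.MAX_VALUE if nothing is open)
   * @param invalidWriteIds open or aborted writeIds at or below highWatermark
   */
  static boolean isBaseAccepted(String dirName, long highWatermark,
      long minOpenWriteId, Set<Long> invalidWriteIds) {
    long writeId = Long.parseLong(parse(dirName).group(1));
    if (isCompactedBase(dirName)) {
      // A compacted base merges every write up to writeId, so it is only
      // usable if no earlier write is still open.
      return writeId < minOpenWriteId && writeId <= highWatermark;
    }
    // An IOW/TRUNCATE base replaces the partition contents, so an earlier
    // open writeId (for example on another partition) does not matter.
    // Only the base's own writeId has to be valid.
    return writeId <= highWatermark && !invalidWriteIds.contains(writeId);
  }
}
```

With an open writeId 3 pending on another partition, this sketch accepts an
IOW base such as base_5 but rejects a compacted base such as base_5_v100,
which is the distinction the fix is about.
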


> Hive ACID table base folder identification procedure is inconsistent with Hive
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-13759
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13759
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Peter Rozsa
>            Assignee: Csaba Ringhofer
>            Priority: Critical
>              Labels: ACID
>             Fix For: Impala 4.6.0
>
>
> Impala's base folder identification uses a different approach than Hive's to 
> decide whether a base folder can be read when there are open writeIds. This 
> can cause read inconsistencies with Hive, as Hive reads the base folder even 
> if there's an open writeId before a newer base writeId.
> Impala's validation: 
> [https://github.com/apache/impala/blob/b8f4034754b691a4790e502af214935486aa3ced/fe/src/main/java/org/apache/impala/util/AcidUtils.java#L261]
> Hive's validation: 
> [https://github.com/apache/hive/blob/0759352ddddc793c0e717c460f0e08eb3f14c1e9/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1774-L1797]
> PR that changed the behavior: 
> [https://github.com/apache/hive/commit/8ee3497f87f81fa84ee1023e891dc54087c2cd5e]
>  
> It is also worth clarifying whether Hive considers the described situation 
> valid in the first place.


