[ https://issues.apache.org/jira/browse/IMPALA-13759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930233#comment-17930233 ]
ASF subversion and git services commented on IMPALA-13759:
----------------------------------------------------------
Commit e5b785cd310116a973746e26b1181830f58fc93c in impala's branch
refs/heads/branch-4.5.0 from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e5b785cd3 ]
IMPALA-13759: Fix Hive ACID INSERT OVERWRITE base detection
Base directories created by INSERT OVERWRITE (IOW) / TRUNCATE should be
treated differently from bases created by compaction: IOW/TRUNCATE
bases must be accepted even if there is an earlier open writeId.
This scenario can easily occur when there is a pending write to a
single partition. The pending write does not block an IOW/TRUNCATE
on another partition, but the global minOpenWrite still affects
whether the base is accepted.
This change updates Impala's logic to consider these bases valid,
as Hive does.
Note that Impala differentiates IOW/TRUNCATE bases from compaction
bases differently than Hive does, because Impala does not consider
metadata files (IMPALA-13769). This can only cause problems when
interacting with earlier Hive versions that did not use
visibilityTxnId in the base path. I don't consider this
to be a significant regression that should block the current
critical fix.
Testing:
- added regression EE/FE tests
Change-Id: I838eaf4f41bae148e558f64288a1370c0908efa4
Reviewed-on: http://gerrit.cloudera.org:8080/22499
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
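
The acceptance rule described in the commit message above can be sketched roughly as follows. This is an illustration only, not the code from Impala's or Hive's AcidUtils: the class AcidBaseSketch, the helper type ParsedBase, and the methods parse and isBaseAccepted are hypothetical names. It assumes base directories are named base_<writeId> or base_<writeId>_v<visibilityTxnId>, that a visibilityTxnId suffix marks a compaction base (as the commit message implies), and it ignores aborted writeIds and metadata files for brevity.

```java
import java.util.SortedSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; not the actual Impala or Hive implementation. */
public class AcidBaseSketch {
    // Assumed naming scheme: base_<writeId> or base_<writeId>_v<visibilityTxnId>.
    private static final Pattern BASE_DIR =
        Pattern.compile("base_(\\d+)(_v(\\d+))?");

    /** Parsed form of a base directory name (hypothetical helper type). */
    public static final class ParsedBase {
        public final long writeId;
        public final boolean hasVisibilityTxnId;

        private ParsedBase(long writeId, boolean hasVisibilityTxnId) {
            this.writeId = writeId;
            this.hasVisibilityTxnId = hasVisibilityTxnId;
        }
    }

    /** Parses a base directory name; rejects anything that is not a base. */
    public static ParsedBase parse(String dirName) {
        Matcher m = BASE_DIR.matcher(dirName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a base directory: " + dirName);
        }
        return new ParsedBase(Long.parseLong(m.group(1)), m.group(3) != null);
    }

    /**
     * Decides whether a reader should accept the base.
     *
     * @param highWatermark highest writeId visible to the reader
     * @param openWriteIds  writeIds that are still open (aborted ids are ignored here)
     */
    public static boolean isBaseAccepted(
            ParsedBase base, long highWatermark, SortedSet<Long> openWriteIds) {
        if (base.writeId > highWatermark) {
            return false; // Written after the reader's snapshot.
        }
        if (base.hasVisibilityTxnId) {
            // Assumed compaction base: it summarizes every write up to its
            // writeId, so no earlier (or equal) writeId may still be open.
            return openWriteIds.isEmpty() || openWriteIds.first() > base.writeId;
        }
        // Assumed IOW/TRUNCATE base: only its own writeId has to be resolved;
        // an earlier open writeId (for example on another partition) is fine.
        return !openWriteIds.contains(base.writeId);
    }
}
```

With writeId 5 still open and a high watermark of 10, this sketch accepts base_7 (treated as IOW/TRUNCATE) but rejects base_7_v20 (treated as a compaction base), which is the distinction the fix is about.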
> Hive ACID table base folder identification procedure is inconsistent with Hive
> ------------------------------------------------------------------------------
>
> Key: IMPALA-13759
> URL: https://issues.apache.org/jira/browse/IMPALA-13759
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Peter Rozsa
> Assignee: Csaba Ringhofer
> Priority: Critical
> Labels: ACID
> Fix For: Impala 4.6.0
>
>
> Impala's base folder identification uses a different approach than Hive's to
> decide whether a base folder can be read when there are open writeIds. This
> could cause read inconsistencies with Hive, as Hive reads the base folder even
> if there's an open writeId before a newer base writeId.
> Impala's validation:
> [https://github.com/apache/impala/blob/b8f4034754b691a4790e502af214935486aa3ced/fe/src/main/java/org/apache/impala/util/AcidUtils.java#L261]
> Hive's validation:
> [https://github.com/apache/hive/blob/0759352ddddc793c0e717c460f0e08eb3f14c1e9/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1774-L1797]
> PR that changed the behavior:
> [https://github.com/apache/hive/commit/8ee3497f87f81fa84ee1023e891dc54087c2cd5e]
>
> Also, it is worth clarifying whether Hive considers the described situation
> valid in the first place.