Peter Rozsa created IMPALA-13759:
------------------------------------
Summary: Hive ACID table base folder identification procedure is
inconsistent with Hive
Key: IMPALA-13759
URL: https://issues.apache.org/jira/browse/IMPALA-13759
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Peter Rozsa
Impala decides whether a base folder is eligible for reading, with respect to
open writeIds, differently than Hive does. This can cause read inconsistencies
between the two engines: Hive reads a newer base folder even if there is an
open writeId lower than that base's writeId.
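The difference can be illustrated with a minimal sketch. This is a simplified model, not the code from either project: the class and method names are hypothetical, and it assumes the stricter check rejects a base whenever any open writeId is lower than the base's writeId, while the more lenient check only requires that the base's own writeId is at or below the high watermark and is not itself open.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; neither method is copied from Impala or Hive.
class BaseFolderCheck {

    // Stricter rule (assumed): the base is readable only if its writeId is
    // below every open writeId and does not exceed the high watermark.
    static boolean strictIsValidBase(long baseWriteId, long highWatermark,
                                     SortedSet<Long> openWriteIds) {
        long minOpen = openWriteIds.isEmpty() ? Long.MAX_VALUE : openWriteIds.first();
        return baseWriteId < minOpen && baseWriteId <= highWatermark;
    }

    // More lenient rule (assumed): the base is readable if its own writeId is
    // at or below the high watermark and not open, regardless of lower open writeIds.
    static boolean lenientIsValidBase(long baseWriteId, long highWatermark,
                                      SortedSet<Long> openWriteIds) {
        return baseWriteId <= highWatermark && !openWriteIds.contains(baseWriteId);
    }

    public static void main(String[] args) {
        // writeId 3 is still open; a newer base was written with writeId 5.
        SortedSet<Long> open = new TreeSet<>(Arrays.asList(3L));
        long highWatermark = 10L;

        System.out.println("strict:  " + strictIsValidBase(5L, highWatermark, open));
        System.out.println("lenient: " + lenientIsValidBase(5L, highWatermark, open));
    }
}
```

With an open writeId of 3 and a base at writeId 5, the two rules disagree in this model, which is the kind of mismatch described above.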
Impala's validation:
[https://github.com/apache/impala/blob/b8f4034754b691a4790e502af214935486aa3ced/fe/src/main/java/org/apache/impala/util/AcidUtils.java#L261]
Hive's validation:
[https://github.com/apache/hive/blob/0759352ddddc793c0e717c460f0e08eb3f14c1e9/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1774-L1797]
Hive commit that changed the behavior:
[https://github.com/apache/hive/commit/8ee3497f87f81fa84ee1023e891dc54087c2cd5e]
Also, it is worth clarifying whether Hive considers the described situation
valid in the first place.