[
https://issues.apache.org/jira/browse/HIVE-27328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor updated HIVE-27328:
--------------------------------
Description:
The cache introduced in HIVE-22825 is not invalidated in TezAMs, which can
eventually lead to query failures if the same table is used in a scenario like
the one below:
1. CREATE TABLE
2. INSERT OVERWRITE
3. SELECT
4. DROP TABLE
...
In this case, if step 2 wrote a file like
year=2011/base_0000001/bucket_00000_{*}1{*} (task attempt = 1), and in the next
iteration it wrote year=2011/base_0000001/bucket_00000_{*}0{*} (task attempt =
0), then the acid dirCache contains an invalid value within the configured time
range *hive.txn.acid.dir.cache.duration*.
See the debug screenshot, where I found the actual base bucket file in the cached
DirInfo: !Screenshot 2026-02-09 at 13.50.43.png|width=587,height=269!
This cache is stored in memory. The HS2 side is taken care of by
HIVE-26060, but the TezAMs need a further improvement to achieve the same.
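The failure mode can be reproduced with a toy model. The sketch below is not Hive code: DirCache, run_scenario, the dict standing in for the file system, and the 120-second TTL are all hypothetical names and values chosen for illustration. It only assumes what is described above, namely a time-based in-memory cache of directory contents that nothing clears when the table is dropped.

```python
import time


class DirCache:
    """Toy time-based cache of directory listings (hypothetical, not Hive code)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # path -> (cached_at, listing)

    def get(self, path, list_files):
        """Return the cached listing if it is still fresh, else re-list and cache."""
        now = self.clock()
        hit = self.entries.get(path)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        listing = list(list_files(path))
        self.entries[path] = (now, listing)
        return listing

    def invalidate(self, path_prefix):
        """Drop every cached entry under the given path prefix."""
        for path in [p for p in self.entries if p.startswith(path_prefix)]:
            del self.entries[path]


def run_scenario(invalidate_on_drop):
    """Run CREATE / INSERT OVERWRITE / SELECT / DROP twice on the same table."""
    fs = {}  # fake file system: directory -> file names
    # Frozen clock, so both iterations fall inside the cache duration.
    cache = DirCache(ttl_seconds=120, clock=lambda: 0.0)
    part = "tbl/year=2011/base_0000001"
    seen = []
    for attempt in (1, 0):
        fs[part] = [f"bucket_00000_{attempt}"]         # CREATE TABLE + INSERT OVERWRITE
        seen.append(cache.get(part, lambda p: fs[p]))  # SELECT reads via the cache
        del fs[part]                                   # DROP TABLE removes the files
        if invalidate_on_drop:
            cache.invalidate("tbl/")
    return seen
```

Without invalidation, the second SELECT is handed bucket_00000_1, a file that no longer exists, which is where the query would fail. With invalidation on drop, it sees bucket_00000_0 as expected.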
was:
The cache introduced in HIVE-22825 is not invalidated in TezAMs which can
eventually lead to query failures if the same table is used in a scenario like
below:
1. CREATE TABLE
2. INSERT OVERWRITE
3. SELECT
4. DROP TABLE
...
in this case if 2) wrote a file like year=2011/base_0000001/bucket_00000_*1*
(task attempt = 1), and in the next iteration it wrote
year=2011/base_0000001/bucket_00000_*0* (task attempt = 0), then acid dirCache
contains an invalid value within the configured time range
*hive.txn.acid.dir.cache.duration*
see !Screenshot 2026-02-09 at 13.50.43.png!
This cache is stored in memory, and the HS2-side is taken care of by
HIVE-26060, but for the TezAMs, we need further improvement to achieve the same.
> Acid dirCache is not invalidated in TezAMs while dropping table
> ---------------------------------------------------------------
>
> Key: HIVE-27328
> URL: https://issues.apache.org/jira/browse/HIVE-27328
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Attachments: Screenshot 2026-02-09 at 13.50.43.png