[ 
https://issues.apache.org/jira/browse/HIVE-27328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27328:
--------------------------------
    Description: 
The cache introduced in HIVE-22825 is not invalidated in TezAMs, which can 
eventually lead to query failures if the same table is reused in a scenario 
like the one below:
1. CREATE TABLE
2. INSERT OVERWRITE
3. SELECT
4. DROP TABLE
...

In this case, if step 2 wrote a file like 
year=2011/base_0000001/bucket_00000_{*}1{*} (task attempt = 1), and in the next 
iteration it wrote year=2011/base_0000001/bucket_00000_{*}0{*} (task attempt = 
0), then the acid dirCache contains an invalid value for as long as the entry 
is within the time range configured by *hive.txn.acid.dir.cache.duration*.

See the debug screenshot, where I found the actual base bucket file in the 
cached DirInfo:  !Screenshot 2026-02-09 at 13.50.43.png|width=587,height=269!

This cache is stored in memory. The HS2 side is taken care of by HIVE-26060, 
but for the TezAMs we need a further improvement to achieve the same.
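
To illustrate the failure mode described above, here is a minimal, 
self-contained sketch of a time-bounded directory cache. It is not Hive's 
actual code: DirCache, run_scenario, the invalidate hook, the dictionary 
standing in for the file system, and the 120-second duration are all 
hypothetical names and values chosen for the example. It only shows why a 
cached listing for a dropped and recreated path can still be served within the 
cache duration, and how an explicit invalidation on drop would avoid that.

```python
import time


class DirCache:
    """Toy time-bounded cache: directory path -> file listing (hypothetical)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # path -> (stored_at, listing)

    def get(self, path, list_dir):
        hit = self.entries.get(path)
        if hit is not None and self.clock() - hit[0] < self.ttl:
            return hit[1]  # served from the cache, possibly stale
        listing = list_dir(path)
        self.entries[path] = (self.clock(), listing)
        return listing

    def invalidate(self, path):
        self.entries.pop(path, None)


def run_scenario(invalidate_on_drop):
    fs = {}  # stand-in for the file system: directory -> file names
    # Frozen clock, so the entry never expires during the scenario.
    cache = DirCache(ttl_seconds=120, clock=lambda: 0.0)
    base = "year=2011/base_0000001"

    # Iteration 1: INSERT OVERWRITE wrote task attempt 1; SELECT fills the cache.
    fs[base] = ["bucket_00000_1"]
    cache.get(base, fs.get)

    # DROP TABLE removes the files; the cache is cleared only if asked to.
    del fs[base]
    if invalidate_on_drop:
        cache.invalidate(base)

    # Iteration 2: same path, but this time task attempt 0 wrote the file.
    fs[base] = ["bucket_00000_0"]
    return cache.get(base, fs.get)
```

Without invalidation, the second iteration still sees the bucket file from the 
first one, which no longer exists; with invalidation on drop, the listing is 
re-read and matches what is actually on disk.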

  was:
The cache introduced in  HIVE-22825 is not invalidated in TezAMs which can 
eventually lead to query failures if the same table is used in a scenario like 
below:
1. CREATE TABLE
2. INSERT OVERWRITE
3. SELECT
4. DROP TABLE
...

in this case if 2) wrote a file like year=2011/base_0000001/bucket_00000_*1* 
(task attempt = 1), and in the next iteration it wrote 
year=2011/base_0000001/bucket_00000_*0* (task attempt = 0), then acid dirCache 
contains an invalid value within the configured time range 
*hive.txn.acid.dir.cache.duration*
 
see  !Screenshot 2026-02-09 at 13.50.43.png! 

This cache is stored in memory, and the HS2-side is taken care of by 
HIVE-26060, but for the TezAMs, we need further improvement to achieve the same.


> Acid dirCache is not invalidated in TezAMs while dropping table
> ---------------------------------------------------------------
>
>                 Key: HIVE-27328
>                 URL: https://issues.apache.org/jira/browse/HIVE-27328
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Screenshot 2026-02-09 at 13.50.43.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
