[ 
https://issues.apache.org/jira/browse/HIVE-29587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-29587.
----------------------------------
    Fix Version/s: 4.3.0
       Resolution: Fixed

Fix has been merged to master. Closing the jira. Thank you [~VenuReddy] for the 
patch.

> Managed table directory not deleted after DROP DATABASE when 
> hive.acid.lockless.reads.enabled=true
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-29587
>                 URL: https://issues.apache.org/jira/browse/HIVE-29587
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Metastore, Standalone Metastore
>            Reporter: Venugopal Reddy K
>            Assignee: Venugopal Reddy K
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.3.0
>
>
> *Description]*
> When {{hive.acid.lockless.reads.enabled}} is set to *true,* drop acid table 
> becomes a non-blocking operation in Hive Metastore (HMS).
> In this mode, Deletion of ACID table data is handled *asynchronously* by a 
> background cleaner thread. Table data and corresponding table directories are 
> eventually removed as expected after cleaner thread scheduled run.
> However, when drop acid table is followed by drop database cascade operation, 
> the *acid table directory itself is never deleted. Hence* {*}managed database 
> directory is also never deleted.{*}{*}{*}
> This results in 
>  # {*}orphaned table and database directories remaining indefinitely in the 
> filesystem{*}, leading to potential storage bloat and inconsistencies between 
> HMS metadata and filesystem state.
>  # Fails to recreate the database again with error -> Execution Error, return 
> code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask. 
> MetaException(message:{*}Unable to create database managed directory{*} 
> [file:/tmp/warehouse/managed/mydb.db|file:///tmp/warehouse/managed/mydb.db], 
> failed to create database mydb);
> *[Steps to Reproduce]* 
> 1. Run the following commands:
> {code:java}
> 0: jdbc:hive2://localhost:10000> set hive.acid.lockless.reads.enabled=true;
> 0: jdbc:hive2://localhost:10000> create database mydb;
> 0: jdbc:hive2://localhost:10000> use mydb;
> 0: jdbc:hive2://localhost:10000> create table t1(i int);
> 0: jdbc:hive2://localhost:10000> create table t2(i int);
> 0: jdbc:hive2://localhost:10000> insert into t1 values(1),(2),(3);
> 0: jdbc:hive2://localhost:10000> insert into t2 values(1),(2),(3);
> # Make sure cleaner thead is not scheduled to run between below 2 operations.
> 0: jdbc:hive2://localhost:10000> drop table t2;
> 0: jdbc:hive2://localhost:10000> show compactions;
> +---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
> | compactionid  | dbname  | tabname  | partname  |  type  |        state      
>   | workerhost  | workerid  |  enqueuetime   | starttime  | duration  | 
> hadoopjobid  | errormessage  | initiatorhost  | initiatorid  | poolname  | 
> txnid  | nexttxnid  |   committime   | hightestwriteid  |
> +---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
> | 9             | mydb    | t2       |  ---      | MAJOR  | ready for 
> cleaning  |  ---        |  ---      | 1777458738637  |  ---       |  ---      
> | None         |  ---          |  ---           |  ---         | default   | 
> 32     | 32         | 1777458738667  |  ---             |
> +---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
> 0: jdbc:hive2://localhost:10000> drop database mydb cascade;
> 0: jdbc:hive2://localhost:10000> show compactions;
> +---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
> | compactionid  | dbname  | tabname  | partname  |  type  |        state      
>   | workerhost  | workerid  |  enqueuetime   | starttime  | duration  | 
> hadoopjobid  | errormessage  | initiatorhost  | initiatorid  | poolname  | 
> txnid  | nexttxnid  |   committime   | hightestwriteid  |
> +---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
> | 10            | mydb    | t1       |  ---      | MAJOR  | ready for 
> cleaning  |  ---        |  ---      | 1777458753447  |  ---       |  ---      
> | None         |  ---          |  ---           |  ---         | default   | 
> 33     | 33         | 1777458754019  |  ---             |
> +---------------+---------+----------+-----------+--------+---------------------+-------------+-----------+----------------+------------+-----------+--------------+---------------+----------------+--------------+-----------+--------+------------+----------------+------------------+
> {code}
> 2. Wait for cleaner thread to finish the next scheduled run and check the 
> filesystem for table directories. t1 directory is removed since it is part of 
> drop database cascade. But t2 directory remains forever. *Issue is happening 
> because drop database cascade has removed all the compaction queue entries 
> for the database that do not belong to current transaction. And cleaner 
> thread cannot detect the t2.*
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/jdbc/functions/CleanupRecordsFunction.java#L64-L69]
> *HMS query to remove compaction queue entries:*
> {code:java}
> DELETE FROM "COMPACTION_QUEUE" WHERE "CQ_DATABASE" = 'mydb' AND ("CQ_TABLE" = 
> null OR null IS NULL) AND ("CQ_PARTITION" = null OR null IS NULL) AND 
> ("CQ_TXN_ID" != 33 OR 33 IS NULL){code}
>  
> *Filesystem directory:*
> {code:java}
> user@Q044GWJL71 apache-hive-4.3.0-SNAPSHOT-bin %  ls -lt 
> /tmp/warehouse/managed/mydb.db
> total 0
> drwxr-xr-x  3 user  wheel  96 29 Apr 15:41 t2.v0000022
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to