[ 
https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-27360.
---------------------------------
    Fix Version/s: 4.0.0-beta-1
       Resolution: Fixed

> Iceberg: Don't create the redundant MANAGED location when creating table 
> without EXTERNAL keyword
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27360
>                 URL: https://issues.apache.org/jira/browse/HIVE-27360
>             Project: Hive
>          Issue Type: Improvement
>          Components: Iceberg integration
>            Reporter: zhangbutao
>            Assignee: zhangbutao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-beta-1
>
>
> If you create a managed iceberg table without specifying a location, and the 
> database has both a location and a managed_location, the iceberg table ends 
> up under the database location instead of the managed_location. However, an 
> iceberg table subdirectory is also created under the database 
> managed_location, and it remains there even after the table is dropped.
> We should ensure that a managed iceberg table is always placed under the 
> database managed_location when one exists. The direct and simple way is to 
> use the already created HMS table location before committing the iceberg 
> table, which avoids creating a new iceberg location.
>  
> Steps to reproduce:
> 1. set location and managed location properties:
>  
> {code:java}
> set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
> set hive.metastore.warehouse.external.dir=/user/hive/warehouse/external/hiveicetest;
> -- disable the metastore transformer; this conf can only be set on the metastore server side
> set metastore.metadata.transformer.class=' ';
> {code}
> 2. create a database with default location and managed_location:
>  
> {code:java}
> create database testdb;{code}
>  
> {code:java}
> desc database testdb;{code}
>  
> {code:java}
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | db_name  | comment  |                           location                           |                   managedlocation                   | owner_name  | owner_type  | connector_name  | remote_dbname  |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | testdb   |          | hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db | hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive        | USER        |                 |                |
> +----------+----------+--------------------------------------------------------------+-----------------------------------------------------+-------------+-------------+-----------------+----------------+
> {code}
>  
>  
> 3. create a managed iceberg table without specifying the table location:
>  
> {code:java}
> -- the table location will be: hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
> create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
> However, you will find that two locations were created:
>  
> {code:java}
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01   // the actual location used by the managed iceberg table
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01            // an empty managed location that is unused
> {code}
>  
> 4. drop the iceberg table
> you will find that the unused managed location is still there:
> {code:java}
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
>  
>  
> We should use the created managed location to avoid creating a new iceberg 
> location.
>  
>  
>  
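
The proposal in the issue (reuse the location already created for the HMS table instead of deriving a second one at Iceberg commit time) can be illustrated with a minimal sketch. This is not the actual Hive patch: the class name TableLocationSketch and both method names are hypothetical, and the path rules are simplified assumptions based only on the repro steps quoted above.

```java
// Hypothetical illustration only; none of these names come from Hive or Iceberg.
final class TableLocationSketch {

    // Assumed default rule: a managed (non-EXTERNAL) table goes under the
    // database managed_location when one is set, otherwise under location.
    static String defaultTableLocation(String dbLocation, String dbManagedLocation,
                                       String tableName, boolean external) {
        boolean useManaged = !external
                && dbManagedLocation != null && !dbManagedLocation.isEmpty();
        String base = useManaged ? dbManagedLocation : dbLocation;
        return trimTrailingSlash(base) + "/" + tableName;
    }

    // Location handed to the Iceberg commit: prefer the path that was already
    // created for the HMS table, so no second directory is derived.
    static String icebergCommitLocation(String hmsTableLocation, String derivedLocation) {
        return (hmsTableLocation != null && !hmsTableLocation.isEmpty())
                ? hmsTableLocation
                : derivedLocation;
    }

    static String trimTrailingSlash(String path) {
        return path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    }

    public static void main(String[] args) {
        // Paths taken from the "desc database testdb" output in the issue.
        String dbLocation = "hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db";
        String dbManaged = "hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db";

        // Path created for the HMS table (managed, no EXTERNAL keyword).
        String hmsLocation = defaultTableLocation(dbLocation, dbManaged, "ice01", false);
        // Path that would be derived if managed_location were ignored.
        String derived = defaultTableLocation(dbLocation, null, "ice01", false);

        System.out.println(icebergCommitLocation(hmsLocation, derived));
    }
}
```

With the repro paths, the sketch resolves to a single directory under the managed location, so nothing is left behind under a second path after the table is dropped.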



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
