[
https://issues.apache.org/jira/browse/HIVE-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ayush Saxena resolved HIVE-27360.
---------------------------------
Fix Version/s: 4.0.0-beta-1
Resolution: Fixed
> Iceberg: Don't create the redundant MANAGED location when creating table
> without EXTERNAL keyword
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-27360
> URL: https://issues.apache.org/jira/browse/HIVE-27360
> Project: Hive
> Issue Type: Improvement
> Components: Iceberg integration
> Reporter: zhangbutao
> Assignee: zhangbutao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-beta-1
>
>
> If you create a managed iceberg table without specifying the location and the
> database has both location and managed_location, the final iceberg table
> location will be on database location instead of managed_location. But you
> can see a the database managed_location also has a iceberg table subdirectory
> which is always here even if the table was dropped.
> We should ensure the managed iceberg table always on database
> managed_location in case of database managed_location existing. The direct
> and simple way is we can use the created hms table location before
> committing iceberg table to avoid creating a new iceberg location.
>
> Step to repro:
> 1. set location and managed location properties:
>
> {code:java}
> set hive.metastore.warehouse.dir=/user/hive/warehouse/hiveicetest;
> set hive.metastore.warehouse.external.dir=
> /user/hive/warehouse/external/hiveicetest;
> set metastore.metadata.transformer.class=' '; //disable metastore
> transformer, this conf only can be set in metasetore server side{code}
> 2. create a database with default location and managed_location:
>
> {code:java}
> create database testdb;{code}
>
> {code:java}
> desc database testdb;{code}
>
> {code:java}
> +----------+----------+----------------------------------------------------+----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | db_name | comment | location |
> managedlocation | owner_name | owner_type
> | connector_name | remote_dbname |
> +----------+----------+----------------------------------------------------+----------------------------------------------------+-------------+-------------+-----------------+----------------+
> | testdb | |
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db |
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db | hive | USER
> | |
> +----------+----------+----------------------------------------------------+----------------------------------------------------+-------------+-------------+-----------------+----------------+
> {code}
>
>
> 3. create a managed iceberg table without specifing the table location:
>
> {code:java}
> // the table location will on:
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01
> create table ice01 (id int) Stored by Iceberg stored as ORC;{code}
> but here you will find the two created location:
>
> {code:java}
> hdfs://ns/user/hive/warehouse/external/hiveicetest/testdb.db/ice01 //the
> actual location which is used by the managed iceberg table
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db/ice01 // a
> empty managed location which is unused
> {code}
>
> 4. drop the icebeg table
> you will find this unused managed location is still there:
> {code:java}
> hdfs://ns/user/hive/warehouse/hiveicetest/testdb.db{code}
>
>
> We should use the created managed location to avoid creating a new iceberg
> location.
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)