[ 
https://issues.apache.org/jira/browse/HIVE-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-20198:
------------------------------------------

    Assignee: Vihang Karajgaonkar

> Constant time table drops/renames
> ---------------------------------
>
>                 Key: HIVE-20198
>                 URL: https://issues.apache.org/jira/browse/HIVE-20198
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 4.0.0
>            Reporter: Alexander Kolbasov
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>
> Currently table drops and table renames have O(P) performance (where P is the 
> number of partitions). When a managed table is deleted, the implementation 
> deletes table metadata and then deletes all partitions in HDFS. HDFS 
> operations are optimized and only do sequential deletes for partitions 
> outside of the table prefix. This operation is O(P), where P is the number of 
> partitions. 
> Table rename goes through the list of partitions and modifies the table name 
> (and potentially the db name) in each partition. It also modifies each 
> partition location to match the new db/table name and renames directories 
> (which is a non-atomic and slow operation on S3). This is an O(P) operation, 
> where P is the number of partitions.
> The basic idea is to do the following:
> # Assign a unique ID to each table
> # Create the directory name based on the unique ID rather than the name
> # Table rename then becomes a metadata-only operation - there is no need to 
> change any location information.
> # Table drop can become an asynchronous operation where the table is marked 
> as "deleted". Subsequent public metadata APIs should skip such tables. A 
> background cleaner thread may then go and clean up directories.
> Since the table location is unique for each table, new tables will not reuse 
> existing locations. This change isn't compatible with the current behavior, 
> where there is an assumption that the table location is based on the table 
> name. We can get around this by providing an "opt-in" mechanism - a special 
> table property that marks the table as using the new behavior, so the 
> improvement will initially work only for new tables created with this feature 
> enabled. We may later provide a tool to convert existing tables to the new 
> scheme.
> One complication arises when impersonation is enabled - the FS operations 
> should be performed using the client UGI rather than the server's, so the 
> cleaner thread should be able to use client UGIs.
> Initially we can punt on this and do standard table drops when impersonation 
> is enabled.
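
To make the proposal above concrete, here is a minimal in-memory sketch of the
idea in Java. It is not Hive Metastore code: the class TableCatalogSketch, its
methods, and the use of a random UUID as the table ID are all assumptions made
for illustration. It only shows why rename touches no location and why drop
can be a cheap marker that a cleaner acts on later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory model of the proposal; not the Hive Metastore API. */
public class TableCatalogSketch {

    /** Table metadata. The location is derived from the ID, never from the name. */
    public static final class Table {
        public final String id;
        public final String location;
        public String dbName;
        public String tableName;
        public boolean deleted; // soft-delete marker set by drop()

        Table(String id, String dbName, String tableName, String location) {
            this.id = id;
            this.dbName = dbName;
            this.tableName = tableName;
            this.location = location;
        }
    }

    private final String warehouseRoot;
    private final Map<String, Table> byId = new HashMap<>();

    public TableCatalogSketch(String warehouseRoot) {
        this.warehouseRoot = warehouseRoot;
    }

    /** Assigns a unique ID (a random UUID here, as an assumption) and an ID-based directory. */
    public Table create(String dbName, String tableName) {
        String id = UUID.randomUUID().toString();
        Table t = new Table(id, dbName, tableName, warehouseRoot + "/" + id);
        byId.put(id, t);
        return t;
    }

    /** Public lookup skips tables marked as deleted. Linear scan keeps the sketch short. */
    public Table lookup(String dbName, String tableName) {
        for (Table t : byId.values()) {
            if (!t.deleted && t.dbName.equals(dbName) && t.tableName.equals(tableName)) {
                return t;
            }
        }
        return null;
    }

    /** Metadata-only rename: the location and the partitions under it are untouched. */
    public void rename(Table t, String newDbName, String newTableName) {
        t.dbName = newDbName;
        t.tableName = newTableName;
    }

    /** Constant-time drop: only marks the table; no file system work happens here. */
    public void drop(Table t) {
        t.deleted = true;
    }

    /**
     * What a background cleaner might call: forgets the marked tables and returns
     * the directories that would then be removed from the file system.
     */
    public List<String> purgeDeleted() {
        List<String> locations = new ArrayList<>();
        Iterator<Table> it = byId.values().iterator();
        while (it.hasNext()) {
            Table t = it.next();
            if (t.deleted) {
                locations.add(t.location);
                it.remove();
            }
        }
        return locations;
    }
}
```

In this sketch a table created again under an old name gets a fresh ID and
therefore a fresh directory, which matches the statement that new tables do
not reuse existing locations. Impersonation is left out entirely; a real
cleaner would need to decide under which user the deletions run.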



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)