[
https://issues.apache.org/jira/browse/HIVE-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548628#comment-16548628
]
Eugene Koifman commented on HIVE-20198:
---------------------------------------
Could TBLS.TBL_ID be used as this ID?
Not strictly related, but it would be nice if the Table object contained this
TBL_ID as well.
> Constant time table drops/renames
> ---------------------------------
>
> Key: HIVE-20198
> URL: https://issues.apache.org/jira/browse/HIVE-20198
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 4.0.0
> Reporter: Alexander Kolbasov
> Priority: Major
>
> Currently table drops and table renames have O(P) performance (where P is the
> number of partitions). When a managed table is deleted, the implementation
> deletes table metadata and then deletes all partitions in HDFS. HDFS
> operations are optimized and only do sequential deletes for partitions
> outside of the table prefix. This operation is O(P), where P is the number of
> partitions.
> Table rename goes through the list of partitions and modifies table name (and
> potentially db name) in each partition. It also modifies each partition
> location to match the new db/table name and renames directories (which is a
> non-atomic and slow operation on S3). This is an O(P) operation, where P is the
> number of partitions.
> The basic idea is to do the following:
> # Assign a unique ID to each table
> # Create the directory name based on the unique ID rather than the name
> # Table rename then becomes a metadata-only operation - there is no need to
> change any location information.
> # Table drop can become an asynchronous operation where the table is marked
> as "deleted". Subsequent public metadata APIs should skip such tables. A
> background cleaner thread may then go and clean up directories.
> Since the table location is unique for each table, new tables will not reuse
> existing locations. This change isn't compatible with the current behavior
> where there is an assumption that table location is based on table name. We
> can get around this by providing an "opt-in" mechanism - a special table
> property that enables the new behavior for a table, so the improvement will
> initially work only for new tables created with this feature enabled. We may
> later provide some tool to convert existing tables to the new scheme.
> One complication arises when impersonation is enabled - the FS operations
> should be performed using the client UGI rather than the server's, so the
> cleaner thread should be able to use client UGIs.
> Initially we can punt on this and do standard table drops when impersonation
> is enabled.
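
The basic idea in the quoted description could be sketched roughly as below. This is only a toy in-memory model, not Hive metastore code: the class TableCatalogSketch, its methods, and the "/warehouse/tbl_<id>" path layout are all hypothetical names chosen for illustration. It assumes IDs come from a simple counter and that the cleaner just reports which directories it would remove.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of ID-based table locations (not Hive code).
class TableCatalogSketch {
    static final class Table {
        final long id;      // unique per table, never reused
        String name;        // the only thing a rename changes
        boolean deleted;    // soft-delete marker set by drop()

        Table(long id, String name) {
            this.id = id;
            this.name = name;
        }

        // Location depends only on the ID, so it is unaffected by renames.
        // The path layout is an assumption made for this sketch.
        String location() {
            return "/warehouse/tbl_" + id;
        }
    }

    private final Map<String, Table> byName = new HashMap<>();
    private final List<Table> pendingCleanup = new ArrayList<>();
    private long nextId = 1;

    Table create(String name) {
        if (byName.containsKey(name)) {
            throw new IllegalArgumentException("table exists: " + name);
        }
        Table t = new Table(nextId++, name);
        byName.put(name, t);
        return t;
    }

    // Metadata-only rename: no partition or directory is touched.
    void rename(String oldName, String newName) {
        if (byName.containsKey(newName)) {
            throw new IllegalArgumentException("table exists: " + newName);
        }
        Table t = byName.remove(oldName);
        if (t == null) {
            throw new IllegalArgumentException("no such table: " + oldName);
        }
        t.name = newName;
        byName.put(newName, t);
    }

    // Constant-time drop: mark the table deleted and queue it for cleanup.
    void drop(String name) {
        Table t = byName.remove(name);
        if (t == null) {
            throw new IllegalArgumentException("no such table: " + name);
        }
        t.deleted = true;
        pendingCleanup.add(t);
    }

    // Public lookups skip dropped tables; returns null if not visible.
    Table get(String name) {
        return byName.get(name);
    }

    // Stand-in for the background cleaner: returns the directories it
    // would delete and clears the queue.
    List<String> runCleaner() {
        List<String> removed = new ArrayList<>();
        for (Table t : pendingCleanup) {
            removed.add(t.location());
        }
        pendingCleanup.clear();
        return removed;
    }
}
```

In this model a table created under a previously dropped name gets a fresh ID and therefore a fresh directory, which is why the cleaner can run later without racing against new tables.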