[ 
https://issues.apache.org/jira/browse/HIVE-21718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-21718:
---------------------------------
    Status: Patch Available  (was: Open)

> Improvement performance of UpdateInputAccessTimeHook
> ----------------------------------------------------
>
>                 Key: HIVE-21718
>                 URL: https://issues.apache.org/jira/browse/HIVE-21718
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 2.1.1
>            Reporter: Naveen Gangam
>            Assignee: Naveen Gangam
>            Priority: Major
>         Attachments: HIVE-21718.2.patch, HIVE-21718.patch
>
>
> Currently, Hive does not update the lastAccessTime property for any entities 
> when a query accesses them. Thus it is not possible to know when a table was 
> last accessed.
> Hive does provide a configurable hook for HS2 that is executed as a pre-query 
> hook before the query runs. However, this hook is inefficient because, for 
> each table or partition whose access time it tries to update, it executes 
> an "alter table ... " command internally. This is bad for two reasons:
> 1) For a query touching 1000's of partitions, this hook takes forever to 
> update them.
> 2) Meanwhile, it is holding up the original query from executing.
> So even though we do not recommend using the hook, because the reward is too 
> little (having lastAccessTime updated), we realize there is no other means to 
> achieve this.
> Also, we can improve the performance of the hook significantly by adding a 
> new Thrift API on HMS that updates lastAccessTime on the database rows 
> directly, instead of going through the HMS front end one entity at a time 
> (leading to 1000's of HMS calls that lead to multiple 1000's of calls to the 
> database).
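
To make the cost difference described above concrete, here is a minimal sketch that only counts metastore round trips. It does not use any real Hive or HMS API: `FakeMetastore`, `alter_one`, `update_last_access_batch`, and both hook functions are hypothetical names invented for illustration, and the batched call is only an assumption about what the proposed Thrift API might look like.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMetastore:
    """In-memory stand-in for HMS that counts client round trips (hypothetical)."""
    last_access: dict = field(default_factory=dict)
    round_trips: int = 0

    def alter_one(self, entity: str, ts: int) -> None:
        # Models the current hook: one metastore call per table or partition.
        self.round_trips += 1
        self.last_access[entity] = ts

    def update_last_access_batch(self, entities: list, ts: int) -> None:
        # Models the proposed approach: one call carrying every entity.
        self.round_trips += 1
        for entity in entities:
            self.last_access[entity] = ts


def per_entity_hook(hms: FakeMetastore, entities: list, ts: int) -> None:
    """Update access time one entity at a time, as the existing hook does."""
    for entity in entities:
        hms.alter_one(entity, ts)


def batched_hook(hms: FakeMetastore, entities: list, ts: int) -> None:
    """Update access time for all entities in a single call."""
    hms.update_last_access_batch(list(entities), ts)


# A query touching 1000 partitions of a hypothetical table.
partitions = [f"sales/ds={day}" for day in range(1000)]

slow, fast = FakeMetastore(), FakeMetastore()
per_entity_hook(slow, partitions, ts=1557000000)
batched_hook(fast, partitions, ts=1557000000)

print(slow.round_trips, fast.round_trips)
```

Both variants leave the same lastAccessTime values behind; under this model the only difference is the number of calls the original query has to wait for before it can start.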



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
