[ 
https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030047#comment-14030047
 ] 

Chris Drome commented on HIVE-7195:
-----------------------------------

We ([~mithun], [~thiruvel], [~selinazh]) have done some work in this area for 
hive-0.12.

Some of the improvements include:

1) Disabling the datanucleus cache to reduce the memory usage in the metastore.
2) Actively closing datanucleus query-related resources to allow the memory to 
be reclaimed.
3) Optimizations to answer metadata-only queries directly from the metastore 
without launching MR jobs.
4) Optimizations to direct SQL statements.
5) Schema changes to speed up DROP TABLE statements.
6) Adding client- and server-side parameters to restrict the maximum number of 
partitions that can be retrieved.
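
As a rough illustration of item 6, the limit check might look like the 
Python sketch below. This is not the Hive code: the function names, the 
convention that a negative value means "no limit", and the choice to reject 
an oversized request (rather than truncate it) are all assumptions made for 
the example.

```python
class PartitionLimitExceeded(Exception):
    """Raised when a request would return more partitions than allowed."""


def effective_limit(client_max, server_max):
    """Combine the two limits; a negative value means 'no limit' (assumed)."""
    limits = [m for m in (client_max, server_max) if m >= 0]
    return min(limits) if limits else -1


def fetch_partitions(partition_names, client_max=-1, server_max=-1):
    """Return the partition names, or fail fast if the limit is exceeded."""
    limit = effective_limit(client_max, server_max)
    if limit >= 0 and len(partition_names) > limit:
        raise PartitionLimitExceeded(
            f"request matches {len(partition_names)} partitions, limit is {limit}"
        )
    return list(partition_names)
```

The stricter of the two limits wins here, so a client cannot raise the limit 
above what the server allows.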

We are currently looking into:

1) Reducing the client time required to retrieve HDFS file information.
2) Using light-weight partition objects where possible to reduce the time and 
memory on client/server.

If I've forgotten anything, Mithun, Thiruvel, or Selina can add more information.

> Improve Metastore performance
> -----------------------------
>
>                 Key: HIVE-7195
>                 URL: https://issues.apache.org/jira/browse/HIVE-7195
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Priority: Critical
>
> Even with direct SQL, which significantly improves MS performance, some 
> operations take a considerable amount of time when there are many partitions 
> on a table. Specifically, I believe the issues are:
> * When a client gets all partitions we do not send them an iterator, we 
> create a collection of all data and then pass the object over the network in 
> total
> * Operations which require looking up data on the NN can still be slow since 
> there is no cache of information and it's done in a serial fashion
> * Perhaps a tangent, but our client timeout is quite dumb. The client will 
> timeout and the server has no idea the client is gone. We should use 
> deadlines, i.e. pass the timeout to the server so it can calculate that the 
> client has expired.
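
To make the first and last points of the description concrete, here is a 
minimal Python sketch of a server-side loop that hands out partitions in 
batches and gives up once the client's timeout has elapsed. It is only an 
illustration, not the metastore implementation: `iter_partition_batches`, 
`DeadlineExceeded`, and the `clock` parameter are hypothetical names, and the 
sketch assumes the client sends a relative timeout in seconds.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when the caller's time budget has run out."""


def iter_partition_batches(partitions, batch_size, timeout_seconds,
                           clock=time.monotonic):
    """Yield partitions in batches until the caller's timeout has elapsed."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # The deadline is computed on the server when iteration starts.
    deadline = clock() + timeout_seconds
    for start in range(0, len(partitions), batch_size):
        # Check before each batch so no work is done for a client that is gone.
        if clock() >= deadline:
            raise DeadlineExceeded("client timeout elapsed; abandoning request")
        yield partitions[start:start + batch_size]
```

Sending a relative timeout and turning it into a deadline on the server 
avoids depending on the client and server clocks agreeing.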



--
This message was sent by Atlassian JIRA
(v6.2#6252)
