[ 
https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028562#comment-14028562
 ] 

Sergey Shelukhin commented on HIVE-7195:
----------------------------------------

Yeah, we were discussing this at Hadoop Summit with Chris and Selena (I hope I 
remembered the names right), and Alan. We can get rid of the individual Thrift 
partition objects and store them more efficiently.
Another thing we can do, together with that approach, is make sure the APIs only 
populate what is necessary; most places don't need the full partition 
object in all its glory. The problem is that every part of the partition 
object is necessary somewhere, so the API will need to be augmented so that 
callers explicitly say what is needed and what is not.
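
To make the "say what is needed" idea concrete, here is a minimal sketch of what such a call could look like. Everything in it is hypothetical: PartitionField, StoredPartition, PartitionView and getPartitions are names made up for illustration, not the metastore's actual Thrift API.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types or methods are Hive's actual API.
class PartitionProjectionSketch {

    /** Parts of a partition that a caller can request (illustrative set). */
    enum PartitionField { VALUES, LOCATION, PARAMETERS }

    /** Fully populated partition, as the metastore would hold it. */
    static final class StoredPartition {
        final List<String> values;
        final String location;
        final Map<String, String> parameters;

        StoredPartition(List<String> values, String location, Map<String, String> parameters) {
            this.values = values;
            this.location = location;
            this.parameters = parameters;
        }
    }

    /** What goes back to the caller; fields that were not requested stay null. */
    static final class PartitionView {
        List<String> values;
        String location;
        Map<String, String> parameters;
    }

    /** Copies only the requested fields into the response objects. */
    static List<PartitionView> getPartitions(List<StoredPartition> stored,
                                             Set<PartitionField> wanted) {
        List<PartitionView> result = new ArrayList<>(stored.size());
        for (StoredPartition p : stored) {
            PartitionView view = new PartitionView();
            if (wanted.contains(PartitionField.VALUES)) {
                view.values = p.values;
            }
            if (wanted.contains(PartitionField.LOCATION)) {
                view.location = p.location;
            }
            if (wanted.contains(PartitionField.PARAMETERS)) {
                view.parameters = p.parameters;
            }
            result.add(view);
        }
        return result;
    }

    public static void main(String[] args) {
        List<StoredPartition> stored = List.of(new StoredPartition(
                List.of("2014-06-11"), "/warehouse/t/ds=2014-06-11", Map.of("numFiles", "3")));
        // A caller that only needs locations asks for nothing else.
        List<PartitionView> views = getPartitions(stored, EnumSet.of(PartitionField.LOCATION));
        PartitionView v = views.get(0);
        System.out.println(v.location + " values=" + v.values + " parameters=" + v.parameters);
    }
}
```

In a real implementation the saving would come from not fetching or serializing the unrequested parts at all; the sketch only shows the shape of the request.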

> Improve Metastore performance
> -----------------------------
>
>                 Key: HIVE-7195
>                 URL: https://issues.apache.org/jira/browse/HIVE-7195
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Priority: Critical
>
> Even with direct SQL, which significantly improves metastore (MS) performance, 
> some operations take a considerable amount of time when there are many 
> partitions on a table. Specifically, I believe the issues are:
> * When a client gets all partitions, we do not send it an iterator; we 
> create a collection of all the data and then pass the whole object over the 
> network
> * Operations which require looking up data on the NameNode (NN) can still be 
> slow since there is no cache of information and the lookups are done serially
> * Perhaps a tangent, but our client timeout is quite dumb. The client will 
> time out and the server has no idea the client is gone. We should use 
> deadlines, i.e. pass the timeout to the server so it can calculate when the 
> client has expired.
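
The deadline idea in the last bullet of the description could be sketched roughly as follows. This is an illustration under assumptions, not how the metastore works: Deadline, DeadlineExceededException and listPartitionNames are hypothetical names, and the injected clock exists only to make the sketch easy to exercise. The client sends its timeout as a relative duration; the server turns it into a deadline on its own clock when the request arrives, so no clock synchronization is assumed, and it stops working once the deadline has passed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch: these types are illustrative, not part of Hive.
class DeadlineSketch {

    /** Thrown when the server notices the client has already given up. */
    static final class DeadlineExceededException extends RuntimeException {
        DeadlineExceededException(String message) {
            super(message);
        }
    }

    /** Server-side deadline derived from the timeout the client sent. */
    static final class Deadline {
        private final LongSupplier nanoClock;
        private final long expiresAtNanos;

        Deadline(long timeoutMillis, LongSupplier nanoClock) {
            this.nanoClock = nanoClock;
            // Start counting on the server's clock when the request is received.
            this.expiresAtNanos = nanoClock.getAsLong() + timeoutMillis * 1_000_000L;
        }

        /** Aborts the current operation if the client's timeout has elapsed. */
        void check() {
            if (nanoClock.getAsLong() - expiresAtNanos >= 0) {
                throw new DeadlineExceededException("client timeout elapsed, aborting request");
            }
        }
    }

    /** Stand-in for a long-running call that checks the deadline as it goes. */
    static List<String> listPartitionNames(int count, Deadline deadline) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            deadline.check(); // cheap check between units of work
            names.add("part=" + i);
        }
        return names;
    }

    public static void main(String[] args) {
        // The 5000 ms value stands for the timeout carried in the request.
        Deadline deadline = new Deadline(5_000, System::nanoTime);
        System.out.println(listPartitionNames(3, deadline));
    }
}
```

Checking between units of work keeps the cost low while still letting the server drop requests nobody is waiting for.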



--
This message was sent by Atlassian JIRA
(v6.2#6252)
