[ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646986#comment-14646986
 ] 

Nezih Yigitbasi commented on HIVE-10319:
----------------------------------------

[~jdere] I updated the patch. Thrift 0.9.2 generated a lot of unrelated diff 
noise; with 0.9.0 the patch got much smaller. BTW, I noticed an inconsistency 
between the metastore interface definition 
([here|https://github.com/apache/hive/blob/master/metastore/if/hive_metastore.thrift#L666-670])
 and the generated classes (see the corresponding generated class 
[here|https://github.com/apache/hive/blob/master/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AddDynamicPartitions.java#L629-636]).
 The field partitionnames is required in the interface definition but optional 
in the generated class, probably because the generated classes weren't 
committed along with that change. My latest patch fixes this inconsistency, 
but I'm not sure whether it has any side effects.

> Hive CLI startup takes a long time with a large number of databases
> -------------------------------------------------------------------
>
>                 Key: HIVE-10319
>                 URL: https://issues.apache.org/jira/browse/HIVE-10319
>             Project: Hive
>          Issue Type: Improvement
>          Components: CLI
>    Affects Versions: 1.0.0
>            Reporter: Nezih Yigitbasi
>            Assignee: Nezih Yigitbasi
>         Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
> HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.patch
>
>
> The Hive CLI takes a long time to start when there is a large number of 
> databases in the data warehouse. I think the root cause is the way permanent 
> UDFs are loaded from the metastore. Looking at the logs and the source code, 
> I see that at startup Hive first gets all the databases from the metastore 
> and then, for each database, makes a metastore call to get the permanent 
> functions for that database [see Hive.java | 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
>  So the number of metastore calls is on the order of the number of 
> databases. In production we have several hundred databases, so Hive makes 
> several hundred RPC calls during startup, taking 30+ seconds.
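
To make the call pattern described above concrete, here is a minimal sketch that counts simulated RPCs against an in-memory stand-in for the metastore. It is not Hive code: FakeMetastore, loadPerDatabase, loadBatched, and the batched getAllFunctions call are all hypothetical names introduced for illustration, and the sketch only assumes that a single bulk call could return the same functions as the per-database loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetastoreCallSketch {

    /** Hypothetical in-memory stand-in for a metastore client; it only counts simulated RPCs. */
    static class FakeMetastore {
        private final Map<String, List<String>> functionsByDb = new LinkedHashMap<>();
        int rpcCalls = 0;

        void addDatabase(String db, String... functions) {
            functionsByDb.put(db, Arrays.asList(functions));
        }

        // One simulated RPC: list all database names.
        List<String> getAllDatabases() {
            rpcCalls++;
            return new ArrayList<>(functionsByDb.keySet());
        }

        // One simulated RPC per database: list that database's permanent functions.
        List<String> getFunctions(String db) {
            rpcCalls++;
            return functionsByDb.get(db);
        }

        // Hypothetical bulk call (an assumption, not an existing API): all functions in one RPC.
        List<String> getAllFunctions() {
            rpcCalls++;
            List<String> all = new ArrayList<>();
            for (List<String> fns : functionsByDb.values()) {
                all.addAll(fns);
            }
            return all;
        }
    }

    /** The pattern from the description: 1 call for the databases, then 1 call per database. */
    static List<String> loadPerDatabase(FakeMetastore ms) {
        List<String> all = new ArrayList<>();
        for (String db : ms.getAllDatabases()) {
            all.addAll(ms.getFunctions(db));
        }
        return all;
    }

    /** One possible alternative: a single bulk call, independent of the database count. */
    static List<String> loadBatched(FakeMetastore ms) {
        return ms.getAllFunctions();
    }

    public static void main(String[] args) {
        FakeMetastore ms = new FakeMetastore();
        for (int i = 0; i < 300; i++) {
            ms.addDatabase("db" + i, "db" + i + ".my_udf");
        }

        loadPerDatabase(ms);
        System.out.println("per-database calls: " + ms.rpcCalls); // prints 301

        ms.rpcCalls = 0;
        loadBatched(ms);
        System.out.println("batched calls: " + ms.rpcCalls); // prints 1
    }
}
```

With 300 databases the per-database loop issues 301 simulated calls while the bulk variant issues one, which is why startup time grows with the number of databases when each call is a network round trip.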



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
