[
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646986#comment-14646986
]
Nezih Yigitbasi commented on HIVE-10319:
----------------------------------------
[~jdere] I updated the patch. Thrift 0.9.2 generated a lot of noise;
with 0.9.0 the patch got much smaller. BTW, I noticed an inconsistency between
the metastore interface definition
([here|https://github.com/apache/hive/blob/master/metastore/if/hive_metastore.thrift#L666-670])
and the generated classes (see the corresponding generated class
[here|https://github.com/apache/hive/blob/master/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AddDynamicPartitions.java#L629-636]).
The field partitionnames is required in the interface definition but
optional in the generated class, probably because the regenerated classes were
never committed with that change. My latest patch fixes this inconsistency, but
I am not sure whether that has any side effects.
> Hive CLI startup takes a long time with a large number of databases
> -------------------------------------------------------------------
>
> Key: HIVE-10319
> URL: https://issues.apache.org/jira/browse/HIVE-10319
> Project: Hive
> Issue Type: Improvement
> Components: CLI
> Affects Versions: 1.0.0
> Reporter: Nezih Yigitbasi
> Assignee: Nezih Yigitbasi
> Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch,
> HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.patch
>
>
> The Hive CLI takes a long time to start when there is a large number of
> databases in the DW. I think the root cause is the way permanent UDFs are
> loaded from the metastore. Looking at the logs and the source code, I
> see that at startup Hive first gets all the databases from the metastore and
> then for each database it makes a metastore call to get the permanent
> functions for that database [see Hive.java |
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
> So the number of metastore calls is on the order of the number of
> databases. In production we have several hundred databases, so Hive makes
> several hundred RPC calls during startup, taking 30+ seconds.
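
To make the call pattern in the description concrete, here is a minimal sketch, assuming a simplified in-memory stand-in for the metastore client. The names `FunctionStore`, `CountingStore`, `StartupSketch`, and the batched `getAllFunctions` call are hypothetical and chosen only for illustration; they are not Hive's actual API. The sketch contrasts one call per database with a single batched call and counts the simulated RPCs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified view of a metastore client (not Hive's real interface).
interface FunctionStore {
    List<String> getAllDatabases();
    List<String> getFunctions(String database);
    // Assumed batched call that returns the functions of every database at once.
    Map<String, List<String>> getAllFunctions();
}

// In-memory stand-in that counts each method call as one simulated RPC.
class CountingStore implements FunctionStore {
    private final Map<String, List<String>> functionsByDatabase;
    int calls = 0;

    CountingStore(Map<String, List<String>> functionsByDatabase) {
        this.functionsByDatabase = new LinkedHashMap<String, List<String>>(functionsByDatabase);
    }

    @Override
    public List<String> getAllDatabases() {
        calls++;
        return new ArrayList<String>(functionsByDatabase.keySet());
    }

    @Override
    public List<String> getFunctions(String database) {
        calls++;
        List<String> fns = functionsByDatabase.get(database);
        return fns == null ? new ArrayList<String>() : new ArrayList<String>(fns);
    }

    @Override
    public Map<String, List<String>> getAllFunctions() {
        calls++;
        return new LinkedHashMap<String, List<String>>(functionsByDatabase);
    }
}

class StartupSketch {
    // Pattern from the description: one call for the database list,
    // then one call per database, so 1 + N simulated RPCs.
    static List<String> loadPerDatabase(FunctionStore store) {
        List<String> loaded = new ArrayList<String>();
        for (String db : store.getAllDatabases()) {
            for (String fn : store.getFunctions(db)) {
                loaded.add(db + "." + fn);
            }
        }
        return loaded;
    }

    // Alternative under the batched-call assumption: a single simulated RPC.
    static List<String> loadBatched(FunctionStore store) {
        List<String> loaded = new ArrayList<String>();
        for (Map.Entry<String, List<String>> e : store.getAllFunctions().entrySet()) {
            for (String fn : e.getValue()) {
                loaded.add(e.getKey() + "." + fn);
            }
        }
        return loaded;
    }
}
```

With N databases, `loadPerDatabase` makes 1 + N calls against the store while `loadBatched` makes one, and both return the same qualified function names. Whether a batched call is the right fix for the real metastore depends on details not covered in this message.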