[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

Jason Dere (JIRA) Wed, 29 Jul 2015 18:07:15 -0700

    [ https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647008#comment-14647008 ]


Jason Dere commented on HIVE-10319:
-----------------------------------

I think I see the reason for the change in the diff: THRIFT-2172 seems to have 
moved the "optionals" array from an instance (per-object) field to a 
class-level (static) field. Because you re-generated the Java files for 
megastruct.thrift, which this test uses, the object inspector created from the 
Thrift object no longer includes the "optionals" field. Simply re-generating 
with Thrift 0.9.0 is fine. Using Thrift 0.9.2 would also work, provided you 
also update convert_enum_to_string.q.out.
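
To illustrate the effect, here is a minimal sketch. It assumes the object 
inspector discovers struct fields via reflection and skips static ones. The 
class names, the someValue field and the Object[] type are stand-ins, not the 
actual generated Thrift code.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class OptionalsFieldDemo {

    // Assumed shape of a struct generated before THRIFT-2172:
    // "optionals" is a per-object (instance) field.
    public static class OldStyleStruct {
        public int someValue;
        private Object[] optionals = new Object[0];
    }

    // Assumed shape after THRIFT-2172: "optionals" is a class-level field.
    public static class NewStyleStruct {
        public int someValue;
        private static final Object[] optionals = new Object[0];
    }

    // Collects the names of declared, non-static fields, which is what a
    // reflection-based inspector would expose as struct fields (assumption).
    public static List<String> nonStaticFieldNames(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic()) {
                continue; // ignore compiler-generated fields
            }
            if (!Modifier.isStatic(f.getModifiers())) {
                names.add(f.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The old-style struct reports "optionals"; the new-style one does not.
        System.out.println(nonStaticFieldNames(OldStyleStruct.class));
        System.out.println(nonStaticFieldNames(NewStyleStruct.class));
    }
}
```

Under that assumption, the only difference between the two listings is the 
"optionals" entry, which would match the change seen in the .q.out diff.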

The "optional" vs "required" inconsistency in the generated files looks like 
it's just a comment, so I think this difference is harmless.

> Hive CLI startup takes a long time with a large number of databases
> -------------------------------------------------------------------
>
>                 Key: HIVE-10319
>                 URL: https://issues.apache.org/jira/browse/HIVE-10319
>             Project: Hive
>          Issue Type: Improvement
>          Components: CLI
>    Affects Versions: 1.0.0
>            Reporter: Nezih Yigitbasi
>            Assignee: Nezih Yigitbasi
>         Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
> HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.patch
>
>
> The Hive CLI takes a long time to start when there is a large number of 
> databases in the DW. I think the root cause is the way permanent UDFs are 
> loaded from the metastore. When I looked at the logs and the source code, I 
> saw that at startup Hive first gets all the databases from the metastore and 
> then, for each database, makes a metastore call to get the permanent 
> functions for that database [see Hive.java | 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
>  So the number of metastore calls made is on the order of the number of 
> databases. In production we have several hundred databases, so Hive makes 
> several hundred RPC calls during startup, taking 30+ seconds.
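
The call pattern described in the issue can be sketched as follows. 
FakeMetastore is a hypothetical in-memory stand-in that only counts round 
trips; it is not the real metastore client API. The batched getAllFunctions() 
call is one possible remedy, shown as an assumption rather than as what the 
attached patches do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FunctionLoadDemo {

    /** Hypothetical stand-in for a metastore client; counts round trips. */
    public static class FakeMetastore {
        private final Map<String, List<String>> functionsByDb = new LinkedHashMap<>();
        public int rpcCalls = 0;

        public FakeMetastore(int databaseCount) {
            for (int i = 0; i < databaseCount; i++) {
                List<String> fns = new ArrayList<>();
                fns.add("udf_" + i);
                functionsByDb.put("db_" + i, fns);
            }
        }

        public List<String> getAllDatabases() {
            rpcCalls++;
            return new ArrayList<>(functionsByDb.keySet());
        }

        public List<String> getFunctions(String db) {
            rpcCalls++;
            return functionsByDb.get(db);
        }

        // Hypothetical batched call: every function in a single round trip.
        public List<String> getAllFunctions() {
            rpcCalls++;
            List<String> all = new ArrayList<>();
            for (List<String> fns : functionsByDb.values()) {
                all.addAll(fns);
            }
            return all;
        }
    }

    // Startup pattern from the issue: one call to list the databases,
    // then one call per database to fetch its functions.
    public static int loadPerDatabase(FakeMetastore ms) {
        int loaded = 0;
        for (String db : ms.getAllDatabases()) {
            loaded += ms.getFunctions(db).size();
        }
        return loaded;
    }

    // Alternative pattern: a single call regardless of database count.
    public static int loadBatched(FakeMetastore ms) {
        return ms.getAllFunctions().size();
    }

    public static void main(String[] args) {
        FakeMetastore perDb = new FakeMetastore(300);
        loadPerDatabase(perDb);
        FakeMetastore batched = new FakeMetastore(300);
        loadBatched(batched);
        System.out.println("per-database RPCs: " + perDb.rpcCalls);
        System.out.println("batched RPCs: " + batched.rpcCalls);
    }
}
```

In this sketch the per-database loader costs one call plus one per database, 
so its round-trip count grows linearly with the number of databases, while 
the batched loader stays at a single call.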



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
