[ https://issues.apache.org/jira/browse/IMPALA-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629301#comment-16629301 ]
Vuk Ercegovac commented on IMPALA-7627:
---------------------------------------

Please include the version (or git hash) of Impala from which these measurements were obtained. It would also be useful to have units on those measurements, as well as the number of partitions/files.

> Parallel the fetching permission process
> ----------------------------------------
>
>                 Key: IMPALA-7627
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7627
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Peikai Zheng
>            Assignee: Peikai Zheng
>            Priority: Major
>
> There are three phases when the Catalogd loads the metadata of a table.
> First, the Catalogd fetches the metadata from the Hive metastore;
> then, the Catalogd fetches the permissions of each partition from the HDFS NameNode;
> finally, the Catalogd loads the file descriptors from the HDFS NameNode.
> According to my test results:
> ||Average Time(GetFileInfoThread=10)||phase 1||phase 2||phase 3||
> |idm.sauron_message|9.9917115|459.2106944|95.0179163|
> |default.revenue_enriched|12.3377474|111.2969046|40.827472|
> |default.upp_raw_prod|1.5143162|50.0251426|12.6805323|
> |default.hit_to_beacon_playback_prod|1.4294509|49.7670539|18.3557858|
> |default.sitetracking_enriched|13.0003804|112.8746656|42.1824032|
> |default.player_custom_event|9.2618705|493.4865302|116.4986184|
> |default.revenue_day_est|57.9116561|106.5028664|24.005822|
> The majority of the time is spent in the second phase.
> So, I suggest parallelizing the second phase.
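
For reference, the proposal in the issue (running the per-partition permission lookups of phase 2 concurrently instead of one after another) could look roughly like the sketch below. This is only an illustration under stated assumptions, not Impala's actual code: the class name ParallelPermissionFetch, the method fetchAll, and the fetch function are all hypothetical, and fetch stands in for whatever call the Catalogd uses to obtain a partition's permissions from the NameNode.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of Impala.
final class ParallelPermissionFetch {
    // Runs `fetch` once per partition path on a fixed-size thread pool and
    // returns the results keyed by path, in the order the paths were given.
    // `fetch` is an assumption: a placeholder for the per-partition
    // permission lookup against the NameNode.
    static <P> Map<String, P> fetchAll(List<String> partitionPaths,
                                       Function<String, P> fetch,
                                       int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            // Submit every lookup first so they can run concurrently.
            List<Future<P>> futures = new ArrayList<>();
            for (String path : partitionPaths) {
                Callable<P> task = () -> fetch.apply(path);
                futures.add(pool.submit(task));
            }
            // Collect results; get() blocks until each lookup has finished
            // and rethrows a lookup failure as ExecutionException.
            Map<String, P> result = new LinkedHashMap<>();
            for (int i = 0; i < partitionPaths.size(); i++) {
                result.put(partitionPaths.get(i), futures.get(i).get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this helps in practice depends on how many partitions a table has and how many concurrent requests the NameNode can absorb, which is why the partition/file counts and units requested above matter for interpreting the numbers.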