[
https://issues.apache.org/jira/browse/IMPALA-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631713#comment-16631713
]
Peikai Zheng commented on IMPALA-7627:
--------------------------------------
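[~tlipcon] [~bharathv] The IMPALA-7320 patch greatly speeds up fetching the
permissions of each partition. Compared with the measurements in the issue
description, the average phase 2 time for the same tables drops from between
about 50 and 490 to between about 0.25 and 5.4.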
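The results table names the technique "batch pre-fetching of file status". The sketch below illustrates that general idea. It assumes partition directories are direct children of the table directory. Instead of issuing one status RPC per partition, it issues one listing of the table directory and serves every partition's permission from that result.

This is a minimal sketch, not the IMPALA-7320 code. `FakeNameNode`, `PermissionLoader`, `BatchPrefetchDemo` and their methods are hypothetical stand-ins that only count RPCs. With real HDFS, the two calls would correspond roughly to Hadoop's `FileSystem.getFileStatus` and `FileSystem.listStatus`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the NameNode (not a Hadoop API): it records how
// many RPCs were issued and returns a fixed fake permission for every path.
class FakeNameNode {
    private final Map<String, List<String>> children = new HashMap<>();
    int rpcCount = 0;

    void addChild(String dir, String child) {
        children.computeIfAbsent(dir, k -> new ArrayList<>()).add(child);
    }

    // Models a single-path status call: one RPC per path.
    String getPermission(String path) {
        rpcCount++;
        return "rwxr-xr-x";
    }

    // Models a directory listing: one RPC returns the status of every child.
    Map<String, String> listPermissions(String dir) {
        rpcCount++;
        Map<String, String> result = new HashMap<>();
        for (String child : children.getOrDefault(dir, Collections.emptyList())) {
            result.put(child, "rwxr-xr-x");
        }
        return result;
    }
}

class PermissionLoader {
    // Baseline: one status RPC per partition directory.
    static Map<String, String> perPartition(FakeNameNode nn, List<String> partitions) {
        Map<String, String> perms = new HashMap<>();
        for (String p : partitions) {
            perms.put(p, nn.getPermission(p));
        }
        return perms;
    }

    // Batch pre-fetching: list the table directory once, then answer each
    // partition from the pre-fetched map; fall back to a single-path RPC for
    // partitions that live outside the table directory.
    static Map<String, String> batched(FakeNameNode nn, String tableDir, List<String> partitions) {
        Map<String, String> prefetched = nn.listPermissions(tableDir);
        Map<String, String> perms = new HashMap<>();
        for (String p : partitions) {
            String perm = prefetched.get(p);
            perms.put(p, perm != null ? perm : nn.getPermission(p));
        }
        return perms;
    }
}

class BatchPrefetchDemo {
    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode();
        List<String> partitions = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            String p = "/warehouse/t/day=" + i;
            nn.addChild("/warehouse/t", p);
            partitions.add(p);
        }
        PermissionLoader.perPartition(nn, partitions);
        System.out.println("per-partition RPCs: " + nn.rpcCount); // 1000
        nn.rpcCount = 0;
        PermissionLoader.batched(nn, "/warehouse/t", partitions);
        System.out.println("batched RPCs: " + nn.rpcCount); // 1
    }
}
```

The RPC count falls from one per partition to one per table directory. This is why a table with thousands of partitions benefits most.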
Here are the experiment results, based on commit
ba27b038148f0694662c14710f18ee6e94cf82b7:
||Table (average time, batch pre-fetching of file status)||Phase 1||Phase 2||Phase 3||
|idm.sauron_message|10.317641|1.2704156|101.7967435|
|default.revenue_enriched|11.5723206|0.6539499|100.0636804|
|default.upp_raw_prod|1.1659253|0.2458183|44.3217171|
|default.hit_to_beacon_playback_prod|1.0809608|0.2778011|55.9334541|
|default.sitetracking_enriched|12.9730505|0.3944841|127.8284113|
|default.player_custom_event|9.5859469|1.1089169|174.2167089|
|default.revenue_day_est|59.3265484|5.3537826|25.0874086|
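The following calculation puts numbers on the claim. It uses the averages in the table above and in the GetFileInfoThread=10 table quoted from the issue description. It computes two things: the share of baseline load time spent in phase 2, and the ratio of phase 2 times between the two runs.

The two tables come from different commits and test runs, and their units are not stated, so treat the ratios as indicative only. `PhaseComparison` is a hypothetical helper.

```java
// Average times copied from the two tables in this message. Units are not
// stated in the source, so only ratios are computed. Hypothetical helper.
class PhaseComparison {
    static final String[] TABLES = {
        "idm.sauron_message", "default.revenue_enriched", "default.upp_raw_prod",
        "default.hit_to_beacon_playback_prod", "default.sitetracking_enriched",
        "default.player_custom_event", "default.revenue_day_est"
    };
    // {phase 1, phase 2, phase 3} at commit 11554a17... (GetFileInfoThread=10).
    static final double[][] BASELINE = {
        {9.9917115, 459.2106944, 95.0179163},
        {12.3377474, 111.2969046, 40.827472},
        {1.5143162, 50.0251426, 12.6805323},
        {1.4294509, 49.7670539, 18.3557858},
        {13.0003804, 112.8746656, 42.1824032},
        {9.2618705, 493.4865302, 116.4986184},
        {57.9116561, 106.5028664, 24.005822}
    };
    // Phase 2 only, at commit ba27b038... (batch pre-fetching of file status).
    static final double[] PREFETCH_PHASE2 = {
        1.2704156, 0.6539499, 0.2458183, 0.2778011, 0.3944841, 1.1089169, 5.3537826
    };

    // Fraction of the baseline load time spent in phase 2.
    static double baselinePhase2Share(int i) {
        double[] p = BASELINE[i];
        return p[1] / (p[0] + p[1] + p[2]);
    }

    // Baseline phase 2 time divided by the batch pre-fetching phase 2 time.
    static double phase2Speedup(int i) {
        return BASELINE[i][1] / PREFETCH_PHASE2[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < TABLES.length; i++) {
            System.out.printf("%-36s phase 2 share %5.1f%%  speedup %6.1fx%n",
                    TABLES[i], 100 * baselinePhase2Share(i), phase2Speedup(i));
        }
    }
}
```

With these inputs, phase 2 accounts for roughly 57% to 81% of the baseline load time. Phase 2 becomes about 20x faster for default.revenue_day_est and about 445x faster for default.player_custom_event.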
> Parallel the fetching permission process
> ----------------------------------------
>
> Key: IMPALA-7627
> URL: https://issues.apache.org/jira/browse/IMPALA-7627
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Peikai Zheng
> Assignee: Peikai Zheng
> Priority: Major
>
> Catalogd loads the metadata of a table in three phases:
> first, it fetches the table metadata from the Hive Metastore;
> second, it fetches the permissions of each partition from the HDFS NameNode;
> finally, it loads the file descriptors from the HDFS NameNode.
> According to my test results (based on commit
> *11554a17c75b242767d5a50d66bc2874aa545c77*):
> ||Table (average time, GetFileInfoThread=10)||Phase 1||Phase 2||Phase 3||
> |idm.sauron_message|9.9917115|459.2106944|95.0179163|
> |default.revenue_enriched|12.3377474|111.2969046|40.827472|
> |default.upp_raw_prod|1.5143162|50.0251426|12.6805323|
> |default.hit_to_beacon_playback_prod|1.4294509|49.7670539|18.3557858|
> |default.sitetracking_enriched|13.0003804|112.8746656|42.1824032|
> |default.player_custom_event|9.2618705|493.4865302|116.4986184|
> |default.revenue_day_est|57.9116561|106.5028664|24.005822|
> Table details:
> ||Table||#Partitions||#Files||Size without replicas (TB)||Size with replicas (TB)||
> |idm.sauron_message|12923|69537|44.4|90.3|
> |default.revenue_enriched|1809|1832001|145.5|308.6|
> |default.upp_raw_prod|801|480000|186.3|424|
> |default.hit_to_beacon_playback_prod|777|793900|46.6|139.9|
> |default.sitetracking_enriched|1809|1842049|21.7|65|
> |default.player_custom_event|8816|2197096|47.2|141.5|
> |default.revenue_day_est|1731|109815|25.9|77.6|
> Since the second phase takes the majority of the loading time, I suggest
> parallelizing it.
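The proposal to parallelize the second phase can be sketched as follows. The per-partition permission calls are fanned out to a fixed-size thread pool instead of being issued one after another.

This is a minimal sketch under stated assumptions. `ParallelPermissionLoader` and its `fetchPermission` parameter are hypothetical. `fetchPermission` stands in for whatever per-partition NameNode call Catalogd makes. The pool size is an arbitrary tuning knob, not an existing Impala flag.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: fetch the permission of every partition directory on
// a bounded thread pool. `fetchPermission` stands in for a NameNode RPC.
class ParallelPermissionLoader {
    static Map<String, String> load(List<String> partitionDirs,
                                    Function<String, String> fetchPermission,
                                    int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String dir : partitionDirs) {
                tasks.add(() -> fetchPermission.apply(dir));
            }
            // invokeAll blocks until every task has finished and returns the
            // futures in the same order as the task list.
            List<Future<String>> futures = pool.invokeAll(tasks);
            Map<String, String> perms = new LinkedHashMap<>();
            for (int i = 0; i < partitionDirs.size(); i++) {
                // get() rethrows a task failure as ExecutionException.
                perms.put(partitionDirs.get(i), futures.get(i).get());
            }
            return perms;
        } finally {
            pool.shutdown();
        }
    }
}
```

A bounded pool caps how many concurrent RPCs reach the NameNode at once. Because `invokeAll` preserves task order, the result map keeps the partitions in their original order.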
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)