[
https://issues.apache.org/jira/browse/HIVE-25941?focusedWorklogId=755214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755214
]
ASF GitHub Bot logged work on HIVE-25941:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 11/Apr/22 13:06
Start Date: 11/Apr/22 13:06
Worklog Time Spent: 10m
Work Description: kasakrisz commented on code in PR #3014:
URL: https://github.com/apache/hive/pull/3014#discussion_r847305506
##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewsCache.java:
##########
@@ -205,4 +212,52 @@ HiveRelOptMaterialization get(String dbName, String viewName) {
public boolean isEmpty() {
return materializedViews.isEmpty();
}
+
+
+ private static class ASTKey {
+ private final ASTNode root;
+
+ public ASTKey(ASTNode root) {
+ this.root = root;
+ }
+
+ @Override
+ public boolean equals(Object o) {
+ if (this == o) return true;
+ if (o == null || getClass() != o.getClass()) return false;
+ ASTKey that = (ASTKey) o;
+ return equals(root, that.root);
+ }
+
+ private boolean equals(ASTNode astNode1, ASTNode astNode2) {
+ if (!(astNode1.getType() == astNode2.getType() &&
+ astNode1.getText().equals(astNode2.getText()) &&
+ astNode1.getChildCount() == astNode2.getChildCount())) {
+ return false;
+ }
+
+ for (int i = 0; i < astNode1.getChildCount(); ++i) {
+ if (!equals((ASTNode) astNode1.getChild(i), (ASTNode) astNode2.getChild(i))) {
+ return false;
+ }
+ }
+
+ return true;
+ }
+
+ @Override
+ public int hashCode() {
+ return hashcode(root);
Review Comment:
* The hashcode of each AST stored in the `MaterializedViewsCache` is calculated
only once, when the MVs are loaded at hs2 startup or when a new MV is created,
because the Java `HashMap` implementation caches the key's hashcode.
* When we look up a Materialization, the hashcode of the key is calculated
every time the get method is called. This happens only once for the entire
tree per query.
* To find sub-query rewrites, the look-up is done by sub-ASTs and the
hashcode is also calculated for the subtrees, but when I ran some performance
tests locally I didn't find this to be a bottleneck.
This solution is still much faster than generating the expanded query text
of every possible sub-query using `UnparseTranslator` and `TokenRewriteStream`.
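For readers following along, here is a minimal, self-contained sketch of the structural-key idea discussed above. It is not the code from the PR: `Node` is a hypothetical stand-in for `ASTNode`, `TreeKey` is a hypothetical stand-in for `ASTKey`, and the hash function is an assumption, since the `hashcode(root)` helper is not shown in the quoted diff.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hive's ASTNode: a token type, its text, ordered children.
final class Node {
  final int type;
  final String text;
  final List<Node> children = new ArrayList<>();

  Node(int type, String text, Node... kids) {
    this.type = type;
    this.text = text;
    for (Node kid : kids) {
      children.add(kid);
    }
  }
}

// Map key that compares and hashes trees by structure rather than by identity.
final class TreeKey {
  private final Node root;

  TreeKey(Node root) {
    this.root = root;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    return sameTree(root, ((TreeKey) o).root);
  }

  private static boolean sameTree(Node a, Node b) {
    if (a.type != b.type || !a.text.equals(b.text)
        || a.children.size() != b.children.size()) {
      return false;
    }
    for (int i = 0; i < a.children.size(); i++) {
      if (!sameTree(a.children.get(i), b.children.get(i))) return false;
    }
    return true;
  }

  @Override
  public int hashCode() {
    return hashTree(root);
  }

  // Assumed hash: walks the whole tree on every call, so the cost is linear
  // in the number of nodes. Structurally equal trees always hash the same.
  private static int hashTree(Node n) {
    int h = 31 * n.type + n.text.hashCode();
    for (Node child : n.children) {
      h = 31 * h + hashTree(child);
    }
    return h;
  }
}
```

With a key like this, a `HashMap<TreeKey, V>` hashes each stored tree once at insertion, while every `get` with a freshly built key walks the probe tree once to compute its hash and then runs the structural `equals` only against entries in the matching bucket.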
Issue Time Tracking
-------------------
Worklog Id: (was: 755214)
Time Spent: 1h 20m (was: 1h 10m)
> Long compilation time of complex query due to analysis for materialized view
> rewrite
> ------------------------------------------------------------------------------------
>
> Key: HIVE-25941
> URL: https://issues.apache.org/jira/browse/HIVE-25941
> Project: Hive
> Issue Type: Bug
> Components: Materialized views
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Attachments: sample.png
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> When compiling a query, the optimizer tries to rewrite the query plan or
> subtrees of the plan to use materialized view scans.
> If
> {code}
> set hive.materializedview.rewriting.sql.subquery=false;
> {code}
> the compilation succeeds in less than 10 sec; otherwise it takes several
> minutes (~ 5 min) depending on the hardware.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)