okumin opened a new pull request, #5478: URL: https://github.com/apache/hive/pull/5478
### What changes were proposed in this pull request? Let each operator know how much it can use at maximum. ### Why are the changes needed? We observed OOM happens when SharedWorkOptimizer merges heavy operators such as GroupByOperators with a map-side hash aggregation. I guess it is hard for GroupByOperator to control its memory correctly in that case. I confirmed Tez has a similar feature to reallocate memory when a task is connected to multiple edges. https://github.com/apache/tez/blob/rel/release-0.10.4/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/resources/WeightedScalingMemoryDistributor.java https://issues.apache.org/jira/browse/HIVE-28548 I also found it is still problematic when # of merged operators is huge, e.g. 100, because the assignment per operator gets tiny. I will handle such case in HIVE-28549. ### Does this PR introduce _any_ user-facing change? No. ### Is the change a dependency upgrade? No. ### How was this patch tested? Checked local logs. ``` --! qt:dataset:src set hive.auto.convert.join=true; SELECT * FROM (SELECT key, count(*) AS num FROM src WHERE key LIKE '%1%' GROUP BY key) t1 LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%2%' GROUP BY key) t2 ON t1.key = t2.key LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%3%' GROUP BY key) t3 ON t1.key = t3.key LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%4%' GROUP BY key) t4 ON t1.key = t4.key LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%5%' GROUP BY key) t5 ON t1.key = t5.key; set hive.vectorized.execution.enabled=false; SELECT * FROM (SELECT key, count(*) AS num FROM src WHERE key LIKE '%1%' GROUP BY key) t1 LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%2%' GROUP BY key) t2 ON t1.key = t2.key LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%3%' GROUP BY key) t3 ON t1.key = t3.key LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%4%' GROUP BY key) t4 ON t1.key = t4.key LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%5%' GROUP BY key) t5 ON t1.key = t5.key; ``` The total memory reduced from 358MB to 71MB. ``` % cat org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver-output.txt | grep OperatorUtils | grep Assigning 2024-09-30T23:54:38,486 INFO [TezTR-672468_1_3_0_0_0] exec.OperatorUtils: Assigning 75161926 bytes to 5 operators 2024-09-30T23:54:38,831 INFO [TezTR-672468_1_3_0_0_1] exec.OperatorUtils: Assigning 75161926 bytes to 5 operators 2024-09-30T23:54:42,015 INFO [TezTR-672468_1_4_0_0_0] exec.OperatorUtils: Assigning 75161926 bytes to 5 operators 2024-09-30T23:54:42,310 INFO [TezTR-672468_1_4_0_0_1] exec.OperatorUtils: Assigning 75161926 bytes to 5 operators ``` ``` % cat org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver-output.txt | grep GroupByOperator | grep 'Max hash table' 2024-09-30T23:54:36,280 INFO [TezTR-672468_1_2_0_0_0] exec.GroupByOperator: Max hash table memory: 179.20MB (358.40MB * 0.5) 2024-09-30T23:54:42,015 INFO [TezTR-672468_1_4_0_0_0] exec.GroupByOperator: Max hash table memory: 35.84MB (71.68MB * 0.5) 2024-09-30T23:54:42,017 INFO [TezTR-672468_1_4_0_0_0] exec.GroupByOperator: Max hash table memory: 35.84MB (71.68MB * 0.5) 2024-09-30T23:54:42,018 INFO [TezTR-672468_1_4_0_0_0] exec.GroupByOperator: Max hash table memory: 35.84MB (71.68MB * 0.5) 2024-09-30T23:54:42,018 INFO [TezTR-672468_1_4_0_0_0] exec.GroupByOperator: Max hash table memory: 35.84MB (71.68MB * 0.5) ... ``` ``` % cat org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver-output.txt | grep VectorGroupByOperator | grep 'GBY memory limits' 2024-09-30T23:54:38,491 INFO [TezTR-672468_1_3_0_0_0] vector.VectorGroupByOperator: GBY memory limits - isTez: true isLlap: true maxHashTblMemory: 64.51MB (71.68MB * 0.9) fixSize:660 (key:472 agg:144) 2024-09-30T23:54:38,499 INFO [TezTR-672468_1_3_0_0_0] vector.VectorGroupByOperator: GBY memory limits - isTez: true isLlap: true maxHashTblMemory: 64.51MB (71.68MB * 0.9) fixSize:660 (key:472 agg:144) 2024-09-30T23:54:38,500 INFO [TezTR-672468_1_3_0_0_0] vector.VectorGroupByOperator: GBY memory limits - isTez: true isLlap: true maxHashTblMemory: 64.51MB (71.68MB * 0.9) fixSize:660 (key:472 agg:144) 2024-09-30T23:54:38,500 INFO [TezTR-672468_1_3_0_0_0] vector.VectorGroupByOperator: GBY memory limits - isTez: true isLlap: true maxHashTblMemory: 64.51MB (71.68MB * 0.9) fixSize:660 (key:472 agg:144) ... ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org