[PR] HIVE-28548: Subdivide memory size allocated to parallel operators [hive]

via GitHub Sun, 09 Feb 2025 05:20:23 -0800


okumin opened a new pull request, #5478:
URL: https://github.com/apache/hive/pull/5478


   ### What changes were proposed in this pull request?
   
   Let each operator know the maximum amount of memory it can use.
   
   ### Why are the changes needed?
   
   We observed OOM errors when SharedWorkOptimizer merges heavy operators, such as GroupByOperators with map-side hash aggregation. I suspect it is hard for a GroupByOperator to control its memory usage correctly in that case.
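   
   A minimal sketch of the problem and of the idea behind the fix, not the code in this patch. It assumes every merged operator applies the same fraction (0.5, as in the GroupByOperator log lines below) to whatever memory it believes it owns. `OversubscriptionSketch`, `sharePerOperator`, and `combinedLimit` are hypothetical names, and the task budget is an assumed value chosen so that one fifth of it equals the logged 75161926 bytes.
   
   ```java
   final class OversubscriptionSketch {
       // Hypothetical helper, not a Hive API: equal share of the task budget.
       static long sharePerOperator(long taskBytes, int numOperators) {
           if (numOperators <= 0) {
               throw new IllegalArgumentException("numOperators must be positive");
           }
           return taskBytes / numOperators;
       }

       // Sum of the hash table limits when each of numOperators operators
       // applies `fraction` to the memory it can see (visibleBytes).
       static double combinedLimit(long visibleBytes, int numOperators, double fraction) {
           return numOperators * (visibleBytes * fraction);
       }

       public static void main(String[] args) {
           long taskBytes = 375_809_630L; // assumed: 5 * 75161926
           int numOperators = 5;
           double fraction = 0.5;         // fraction shown in the GroupByOperator logs

           // Without subdivision every operator sees the whole task budget.
           double before = combinedLimit(taskBytes, numOperators, fraction);
           // With subdivision every operator sees only its own share.
           double after = combinedLimit(
               sharePerOperator(taskBytes, numOperators), numOperators, fraction);

           System.out.println("before / task budget = " + before / taskBytes); // 2.5
           System.out.println("after  / task budget = " + after / taskBytes);  // 0.5
       }
   }
   ```
   
   Under these assumptions, five merged operators could together claim 2.5 times the task budget for hash tables before the change, and half of it afterwards.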
   
   I confirmed that Tez has a similar feature that reallocates memory when a task is connected to multiple edges.
   
https://github.com/apache/tez/blob/rel/release-0.10.4/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/resources/WeightedScalingMemoryDistributor.java
   
   https://issues.apache.org/jira/browse/HIVE-28548
   
   I also found that this is still problematic when the number of merged operators is huge, e.g. 100, because the allocation per operator becomes tiny. I will handle such cases in HIVE-28549.
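   
   To illustrate how quickly the allocation shrinks, here is a rough sketch assuming an even split and the same assumed task budget of 375809630 bytes (5 * 75161926). `ShareScalingSketch` and `sharePerOperator` are hypothetical names, and "MB" is taken to mean 1024 * 1024 bytes.
   
   ```java
   import java.util.Locale;

   final class ShareScalingSketch {
       // Hypothetical helper, not a Hive API: equal share of the task budget.
       static long sharePerOperator(long taskBytes, int numOperators) {
           return taskBytes / numOperators;
       }

       public static void main(String[] args) {
           long taskBytes = 375_809_630L; // assumed task budget
           for (int n : new int[] {1, 5, 100}) {
               long share = sharePerOperator(taskBytes, n);
               System.out.printf(Locale.ROOT, "%d operators -> %.2fMB each%n",
                   n, share / (1024.0 * 1024.0));
           }
       }
   }
   ```
   
   With 100 merged operators each one would be left with roughly 3.58MB under these assumptions.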
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### Is the change a dependency upgrade?
   
   No.
   
   ### How was this patch tested?
   I ran the following queries with TestMiniLlapLocalCliDriver and checked the local logs.
   
   ```
   --! qt:dataset:src
   
   set hive.auto.convert.join=true;
   
   SELECT *
   FROM (SELECT key, count(*) AS num FROM src WHERE key LIKE '%1%' GROUP BY key) t1
   LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%2%' GROUP BY key) t2 ON t1.key = t2.key
   LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%3%' GROUP BY key) t3 ON t1.key = t3.key
   LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%4%' GROUP BY key) t4 ON t1.key = t4.key
   LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%5%' GROUP BY key) t5 ON t1.key = t5.key;
   
   set hive.vectorized.execution.enabled=false;
   
   SELECT *
   FROM (SELECT key, count(*) AS num FROM src WHERE key LIKE '%1%' GROUP BY key) t1
   LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%2%' GROUP BY key) t2 ON t1.key = t2.key
   LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%3%' GROUP BY key) t3 ON t1.key = t3.key
   LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%4%' GROUP BY key) t4 ON t1.key = t4.key
   LEFT OUTER JOIN (SELECT key, count(*) AS num FROM src WHERE key LIKE '%5%' GROUP BY key) t5 ON t1.key = t5.key;
   ```
   
   The memory assigned to each GroupByOperator was reduced from 358MB to 71MB.
   
   ```
   % cat org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver-output.txt | grep OperatorUtils | grep Assigning
   2024-09-30T23:54:38,486  INFO [TezTR-672468_1_3_0_0_0] exec.OperatorUtils: Assigning 75161926 bytes to 5 operators
   2024-09-30T23:54:38,831  INFO [TezTR-672468_1_3_0_0_1] exec.OperatorUtils: Assigning 75161926 bytes to 5 operators
   2024-09-30T23:54:42,015  INFO [TezTR-672468_1_4_0_0_0] exec.OperatorUtils: Assigning 75161926 bytes to 5 operators
   2024-09-30T23:54:42,310  INFO [TezTR-672468_1_4_0_0_1] exec.OperatorUtils: Assigning 75161926 bytes to 5 operators
   ```
   
   ```
   % cat org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver-output.txt | grep GroupByOperator | grep 'Max hash table'
   2024-09-30T23:54:36,280  INFO [TezTR-672468_1_2_0_0_0] exec.GroupByOperator: Max hash table memory: 179.20MB (358.40MB * 0.5)
   2024-09-30T23:54:42,015  INFO [TezTR-672468_1_4_0_0_0] exec.GroupByOperator: Max hash table memory: 35.84MB (71.68MB * 0.5)
   2024-09-30T23:54:42,017  INFO [TezTR-672468_1_4_0_0_0] exec.GroupByOperator: Max hash table memory: 35.84MB (71.68MB * 0.5)
   2024-09-30T23:54:42,018  INFO [TezTR-672468_1_4_0_0_0] exec.GroupByOperator: Max hash table memory: 35.84MB (71.68MB * 0.5)
   2024-09-30T23:54:42,018  INFO [TezTR-672468_1_4_0_0_0] exec.GroupByOperator: Max hash table memory: 35.84MB (71.68MB * 0.5)
   ...
   ```
   
   ```
   % cat org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver-output.txt | grep VectorGroupByOperator | grep 'GBY memory limits'
   2024-09-30T23:54:38,491  INFO [TezTR-672468_1_3_0_0_0] vector.VectorGroupByOperator: GBY memory limits - isTez: true isLlap: true maxHashTblMemory: 64.51MB (71.68MB * 0.9) fixSize:660 (key:472 agg:144)
   2024-09-30T23:54:38,499  INFO [TezTR-672468_1_3_0_0_0] vector.VectorGroupByOperator: GBY memory limits - isTez: true isLlap: true maxHashTblMemory: 64.51MB (71.68MB * 0.9) fixSize:660 (key:472 agg:144)
   2024-09-30T23:54:38,500  INFO [TezTR-672468_1_3_0_0_0] vector.VectorGroupByOperator: GBY memory limits - isTez: true isLlap: true maxHashTblMemory: 64.51MB (71.68MB * 0.9) fixSize:660 (key:472 agg:144)
   2024-09-30T23:54:38,500  INFO [TezTR-672468_1_3_0_0_0] vector.VectorGroupByOperator: GBY memory limits - isTez: true isLlap: true maxHashTblMemory: 64.51MB (71.68MB * 0.9) fixSize:660 (key:472 agg:144)
   ...
   ```
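   
   The figures in the three log excerpts above are consistent with each other. The following sketch re-derives them, assuming "MB" in the log messages means 1024 * 1024 bytes; `LogArithmeticSketch`, `toMb`, and `hashTableLimit` are hypothetical names used only for this check, not Hive code.
   
   ```java
   import java.util.Locale;

   final class LogArithmeticSketch {
       // Assumption: the log messages use MB = 1024 * 1024 bytes.
       static final double BYTES_PER_MB = 1024.0 * 1024.0;

       // Format a byte count the way the log lines print it, e.g. "71.68MB".
       static String toMb(double bytes) {
           return String.format(Locale.ROOT, "%.2fMB", bytes / BYTES_PER_MB);
       }

       // Hash table limit = memory visible to the operator * configured fraction.
       static double hashTableLimit(long operatorBytes, double fraction) {
           return operatorBytes * fraction;
       }

       public static void main(String[] args) {
           long perOperator = 75_161_926L; // from "Assigning 75161926 bytes to 5 operators"

           System.out.println(toMb(perOperator));                      // 71.68MB
           System.out.println(toMb(hashTableLimit(perOperator, 0.5))); // 35.84MB (GroupByOperator)
           System.out.println(toMb(hashTableLimit(perOperator, 0.9))); // 64.51MB (VectorGroupByOperator)
       }
   }
   ```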


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

