[ 
https://issues.apache.org/jira/browse/HIVE-24680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084045#comment-18084045
 ] 

Thomas Rebele edited comment on HIVE-24680 at 5/28/26 9:14 AM:
---------------------------------------------------------------

I investigated unionDistinct_1.q:
 * When I run the test locally, the TezChild processes are created with 
-Xmx204m. If I'm not mistaken, I saw -Xmx409m for the TezChild processes 
when debugging the test and the DAGAppMaster process, but I was not able to 
reproduce this in later executions.
 * The TezChild processes have a heap usage of 150MB to 200MB. A large part 
of that holds the "central directory" of jar files in 
java.util.zip.ZipFile$Source. E.g., 
~/.m2/repository/software/amazon/awssdk/bundle/2.29.52/bundle-2.29.52.jar has a 
size of 612MB, and its "central directory" is about 48MB, so the dependency on 
this library alone ties up 48MB of heap. The other dependency that loads a big 
jar is aws-java-sdk-bundle.
 * I've found that increasing {{hive.tez.container.size}} in 
{{data/conf/llap/hive-site.xml}} increases the {{-Xmx}} value for the TezChild 
processes. With a value of 512, I've seen TezChild processes with -Xmx409m. We 
could increase the value to check whether the OutOfMemoryError becomes less 
likely.
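
To check how much a given jar contributes, the central directory size can be read from the jar's "end of central directory" (EOCD) record. The following is only a minimal sketch, not the method used for the numbers above: the class and method names are hypothetical, and it assumes a plain (non-ZIP64) EOCD record. For ZIP64 archives the 32-bit field may hold the sentinel 0xFFFFFFFF, in which case the real size lives in the ZIP64 record, which this sketch does not parse.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Hive or Tez.
public class CentralDirectorySize {
    private static final int EOCD_SIGNATURE = 0x06054b50; // "PK\5\6", little endian
    private static final int EOCD_MIN_SIZE = 22;          // EOCD record without a comment
    private static final int MAX_COMMENT = 0xFFFF;        // longest possible archive comment

    /**
     * Returns the central directory size in bytes as recorded in the EOCD
     * record, or -1 if no EOCD record is found. A result of 0xFFFFFFFF
     * means the real value is stored in the ZIP64 record (not handled here).
     */
    static long centralDirectorySize(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            // The EOCD record sits at the very end of the file, so only the tail is read.
            int tailSize = (int) Math.min(length, EOCD_MIN_SIZE + MAX_COMMENT);
            byte[] tail = new byte[tailSize];
            file.seek(length - tailSize);
            file.readFully(tail);
            ByteBuffer buf = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
            // Scan backwards for the EOCD signature.
            for (int pos = tailSize - EOCD_MIN_SIZE; pos >= 0; pos--) {
                if (buf.getInt(pos) == EOCD_SIGNATURE) {
                    // The 4-byte central directory size is at offset 12 of the record.
                    return buf.getInt(pos + 12) & 0xFFFFFFFFL;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String jar : args) {
            System.out.printf("%s: central directory = %d bytes%n",
                    jar, centralDirectorySize(jar));
        }
    }
}
```

Running it with the paths of the largest jars on the test classpath as arguments would show which dependencies dominate. Note that this is the on-disk size of the central directory, which is only a proxy for what ZipFile$Source keeps on the heap.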

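The two observed {{-Xmx}} values are consistent with the heap being about 80% of the container size, rounded down: 512 gives 409, and 204 would then correspond to a container size of 256. The sketch below only illustrates that arithmetic. The 0.8 factor is an assumption inferred from these two data points (it would match Tez's {{tez.container.max.java.heap.fraction}}, which is believed to default to 0.8, but that link was not verified here), and the class and method names are hypothetical.

```java
// Hypothetical helper, not part of Hive or Tez.
public class TezHeapEstimate {
    // Assumption: the heap gets 80% of the container size. Inferred from the
    // observed pair 512 -> -Xmx409m, not taken from the Hive or Tez source.
    static final int ASSUMED_HEAP_PERCENT = 80;

    /** Estimated -Xmx value in MB for a given hive.tez.container.size in MB. */
    static int estimatedXmxMb(int containerSizeMb) {
        // Integer division rounds down, matching 512 -> 409.
        return containerSizeMb * ASSUMED_HEAP_PERCENT / 100;
    }

    public static void main(String[] args) {
        for (int containerSizeMb : new int[] {256, 512, 1024}) {
            System.out.printf("hive.tez.container.size=%d -> estimated -Xmx%dm%n",
                    containerSizeMb, estimatedXmxMb(containerSizeMb));
        }
    }
}
```

Under this assumption, with a heap usage of 150MB to 200MB per TezChild, a container size of 512 would leave roughly 200MB of headroom, compared with almost none at -Xmx204m.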


> TestMiniLlapCliDriver unionDistinct_1.q is flaky
> ------------------------------------------------
>
>                 Key: HIVE-24680
>                 URL: https://issues.apache.org/jira/browse/HIVE-24680
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Mustafa İman
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> [http://ci.hive.apache.org/job/hive-flaky-check/166/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapCliDriver/testCliDriver_unionDistinct_1_/]
> {code:java}
> Client Execution succeeded but contained differences (error code = 1) after 
> executing unionDistinct_1.q 
> 176,178d175
> < 
> < 
> < 
> 206d202
> <   (
> 233d228
> < ]
> 4185d4179
> < ) a
> 4198d4191
> < ) aa
> 4490,4491d4482
> < #### A masked pattern was here ####
> < #### A masked pattern was here ####
> 4901d4891
> < DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:9
> 5305d5294
> < FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 7, 
> vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 7] killed/failed 
> due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, 
> vertex=vertex_#ID# [Map 7], java.lang.OutOfMemoryError: GC overhead limit 
> exceeded
> 8892d8880
> < PREHOOK: Input: default@src
> 9131d9118
> < PREHOOK: Output: default@union_subq_union30
> 9277d9263
> < PREHOOK: query: insert overwrite table union_subq_union30 
> 9446d9431
> < PREHOOK: type: QUERY
> 10089,10090d10073
> < select * from (
> < select * from (
> 10214,10216d10196
> <     select key, value, count(1) from src group by key, value
> <     select key, value, count(1) from src group by key, value
> <   select key, value from 
> 10265,10266d10244
> <   select key, value from src 
> < select key, value from src
> 11928,11929d11905
> < Status: Failed
> <   ) subq
> 12336d12311
> <     UNION DISTINCT
> 12409d12383
> <   UNION DISTINCT 
> 12458d12431
> < UNION DISTINCT
> 12519,12529d12491
> < Vertex failed, vertexName=Map 7, vertexId=vertex_#ID#, diagnostics=[Vertex 
> vertex_#ID# [Map 7] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex 
> Input: src initializer failed, vertex=vertex_#ID# [Map 7], 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> < Vertex killed, vertexName=Map 11, vertexId=vertex_#ID#, diagnostics=[Vertex 
> received Kill in INITED state., Vertex vertex_#ID# [Map 11] killed/failed due 
> to:OTHER_VERTEX_FAILURE]
> < Vertex killed, vertexName=Map 13, vertexId=vertex_#ID#, diagnostics=[Vertex 
> received Kill in INITED state., Vertex vertex_#ID# [Map 13] killed/failed due 
> to:OTHER_VERTEX_FAILURE]
> < Vertex killed, vertexName=Map 1, vertexId=vertex_#ID#, diagnostics=[Vertex 
> received Kil
> Output was too long and had to be truncated... {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
