Laszlo Gaal created IMPALA-10669:
------------------------------------

             Summary: Loading nested ORC data is flaky during Docker-based tests
                 Key: IMPALA-10669
                 URL: https://issues.apache.org/jira/browse/IMPALA-10669
             Project: IMPALA
          Issue Type: Bug
          Components: Infrastructure
    Affects Versions: Impala 4.0
            Reporter: Laszlo Gaal
            Assignee: Laszlo Gaal


Docker-based tests (run with {{docker/test-with-docker.py}}) often fail in the 
dataload phase when loading ORC tables with complex types. The failure occurs 
in at least roughly half of the runs, and when it does, the pattern is 
consistent: a Tez container is always killed for overrunning its allotted 
physical memory.
The signature is:
{code}
2021-04-18 13:32:19.551921 [2021-04-18 13:31:51.355]Container killed on 
request. Exit code is 143
2021-04-18 13:32:19.551966 [2021-04-18 13:31:51.356]Container exited with a 
non-zero exit code 143. 
2021-04-18 13:32:19.552181 ]], TaskAttempt 1 failed, info=[Container 
container_1618776748992_0039_01_000003 finished with diagnostics set to 
[Container failed, exitCode=-104. [2021-04-18 13:32:00.379]Container 
[pid=11530,containerID=container_1618776748992_0039_01_000003] is running 
2785280B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB 
physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
2021-04-18 13:32:19.552224 Dump of the process-tree for 
container_1618776748992_0039_01_000003 :
2021-04-18 13:32:19.552298      |- PID PPID PGRPID SESSID CMD_NAME 
USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) 
RSSMEM_USAGE(PAGES) FULL_CMD_LINE
2021-04-18 13:32:19.552753      |- 11540 11530 11530 11530 (java) 2048 85 
2761297920 262152 
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-2.el8_3.x86_64/bin/java -Xmx819m 
-server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN 
-Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
-Dlog4j.configuration=tez-container-log4j.properties 
-Dyarn.app.container.log.dir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003
 -Dtez.root.logger=INFO,CLA 
-Djava.io.tmpdir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/impdev/nm-local-dir/usercache/impdev/appcache/application_1618776748992_0039/container_1618776748992_0039_01_000003/tmp
 org.apache.tez.runtime.task.TezChild localhost 38999 
container_1618776748992_0039_01_000003 application_1618776748992_0039 1 
2021-04-18 13:32:19.553375      |- 11530 11528 11530 11530 (bash) 0 0 10010624 
672 /bin/bash -c 
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-2.el8_3.x86_64/bin/java  -Xmx819m 
-server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  
-Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
-Dlog4j.configuration=tez-container-log4j.properties 
-Dyarn.app.container.log.dir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003
 -Dtez.root.logger=INFO,CLA  
-Djava.io.tmpdir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/impdev/nm-local-dir/usercache/impdev/appcache/application_1618776748992_0039/container_1618776748992_0039_01_000003/tmp
 org.apache.tez.runtime.task.TezChild localhost 38999 
container_1618776748992_0039_01_000003 application_1618776748992_0039 1 
1>/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003/stdout
 
2>/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003/stderr
  
{code}
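
The overrun figure in the diagnostics can be cross-checked against the 
process-tree dump. The sketch below is only an illustration, not part of the 
test tooling: it assumes 4 KiB memory pages and that the "1 GB" limit means 
1 GiB, and the function and variable names are made up for this note. The RSS 
page counts and the -Xmx value are copied from the log above.

```python
# Cross-check of the YARN "beyond the 'PHYSICAL' memory limit" figure.
# Assumptions (not stated in the log): 4 KiB pages, and "1 GB" meaning 1 GiB.
PAGE_SIZE_BYTES = 4096
CONTAINER_LIMIT_BYTES = 1 * 1024**3

# RSSMEM_USAGE(PAGES) values from the process-tree dump in the log.
RSS_PAGES = {
    "java (TezChild, pid 11540)": 262152,
    "bash (launcher, pid 11530)": 672,
}

# Maximum Java heap of the Tez child, from "-Xmx819m" on its command line.
JVM_MAX_HEAP_BYTES = 819 * 1024**2


def overrun_bytes(rss_pages, limit_bytes=CONTAINER_LIMIT_BYTES,
                  page_size=PAGE_SIZE_BYTES):
    """Return how far the summed RSS of the process tree exceeds the limit."""
    total_rss = sum(rss_pages.values()) * page_size
    return total_rss - limit_bytes


def jvm_non_heap_floor_bytes(jvm_rss_pages, max_heap_bytes=JVM_MAX_HEAP_BYTES,
                             page_size=PAGE_SIZE_BYTES):
    """Lower bound on JVM resident memory that cannot be Java heap."""
    return jvm_rss_pages * page_size - max_heap_bytes


overrun = overrun_bytes(RSS_PAGES)
non_heap = jvm_non_heap_floor_bytes(RSS_PAGES["java (TezChild, pid 11540)"])

print("overrun:", overrun, "bytes")
print("JVM memory outside the heap: at least", non_heap // 1024**2, "MiB")
```

Under these assumptions the summed RSS of the two processes exceeds the limit 
by 2785280 bytes, which matches the number YARN reports. The JVM alone is 
already slightly above 1 GiB resident, and at least about 205 MiB of that is 
outside the 819 MiB heap, so the overrun may come from non-heap memory rather 
than from the heap setting itself.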

So far the failure has been seen only on AWS m5.12xl instances, which have 
192GB of RAM, all of which is available to the initial container that runs the 
compile/link and dataload phases of a test run.
The same code runs without problems on m5.4xl (64GB RAM) and r5.4xl (128GB RAM) 
instances during other build jobs.



