[
https://issues.apache.org/jira/browse/IMPALA-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556513#comment-17556513
]
ASF subversion and git services commented on IMPALA-10316:
----------------------------------------------------------
Commit 70568c80b3bb19e1945896d0a9492b8bc8f37164 in impala's branch
refs/heads/master from Laszlo Gaal
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=70568c80b ]
IMPALA-10316: Increase Yarn minimum container size for dataload
This is an attempt to get rid of IMPALA-10669 and friends, crashing Tez
containers during the loading of nested ORC data.
The usual error message logged for these failures is:
Container [pid=11530,containerID=container_1618776748992_0039_01_000003]
is running 2785280B beyond the 'PHYSICAL' memory limit.
Current usage: 1.0 GB of 1 GB physical memory used; 2.6 GB of 2.1 GB
virtual memory used. Killing container.
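When triaging build logs for this failure, it can help to pull the numbers out of the diagnostic mechanically. The following is a minimal sketch, not part of the patch: the function name and the regular expression are illustrative assumptions, written against the message format quoted above.

```python
import re

# Pattern for the container-kill diagnostic quoted above.
# Assumption: the wording matches the sample message exactly.
_DIAG = re.compile(
    r"is running (?P<over>\d+)B beyond the '(?P<kind>\w+)' memory limit\. "
    r"Current usage: (?P<used>[\d.]+) GB of (?P<limit>[\d.]+) GB "
    r"physical memory used"
)


def parse_container_kill(message):
    """Return overshoot details from a container-kill diagnostic, or None."""
    # Log output is often hard-wrapped, so collapse whitespace runs first.
    flat = " ".join(message.split())
    match = _DIAG.search(flat)
    if match is None:
        return None
    return {
        "kind": match.group("kind"),
        "over_bytes": int(match.group("over")),
        "used_gb": float(match.group("used")),
        "limit_gb": float(match.group("limit")),
    }


sample = """Container [pid=11530,containerID=container_1618776748992_0039_01_000003]
is running 2785280B beyond the 'PHYSICAL' memory limit.
Current usage: 1.0 GB of 1 GB physical memory used; 2.6 GB of 2.1 GB
virtual memory used. Killing container."""

info = parse_container_kill(sample)
over_mib = info["over_bytes"] / (1024 * 1024)
print(info["kind"], info["limit_gb"], round(over_mib, 2))
```

For the sample, the overshoot is under 3 MiB above a 1 GB limit, which is why a larger container size is a plausible workaround.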
https://stackoverflow.com/a/43827548/143681 explains that the tunable
setting 'yarn.scheduler.minimum-allocation-mb' in yarn-site.xml sets
both the minimum memory size and the memory size increment for Yarn
containers.
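To illustrate the "minimum and increment" behavior described above, here is a rough sketch of how a memory request would be rounded if the setting acts as both. It is a simplified model based only on that description; the real result depends on the scheduler and resource calculator in use. The function name and all numeric values are illustrative assumptions, and the value actually chosen by the patch is not stated in this message.

```python
def normalize_request_mb(requested_mb, min_alloc_mb, max_alloc_mb):
    """Round a container memory request up to a multiple of min_alloc_mb.

    Simplified model: min_alloc_mb is treated as both the smallest
    container size and the size increment, capped at max_alloc_mb.
    """
    if min_alloc_mb <= 0:
        raise ValueError("min_alloc_mb must be positive")
    # Ceiling division without floats; always grant at least one step.
    steps = max(1, -(-requested_mb // min_alloc_mb))
    return min(steps * min_alloc_mb, max_alloc_mb)


# Hypothetical values, chosen only to show the rounding.
for min_alloc in (1024, 2048):
    for request in (512, 1024, 2049):
        granted = normalize_request_mb(request, min_alloc, 8192)
        print(f"min={min_alloc} request={request} -> granted={granted}")
```

Under this model, raising the minimum from 1024 MB to 2048 MB turns a 1024 MB request into a 2048 MB container, which is the effect the workaround relies on.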
This patch is an attempt to work around the failure by forcibly setting
a minimum size for the Yarn containers used in dataload that is
significantly larger than the 1 GB size reported in the failure messages.
Tested by running the dataload phase successfully on the following
platform combinations:
- Ubuntu 16.04, m6i.8xlarge (128 GB RAM, Docker)
- Ubuntu 16.04, m5.12xlarge (192 GB RAM, Docker)
- Centos 7.4, m5.4xlarge (64 GB RAM)
- Centos 7.4, r5.4xlarge (128 GB RAM)
- Ubuntu 16.04, m6i.4xlarge (64 GB RAM)
Change-Id: I77e7c9e9fa3491c6e5652351869d3a4410bbb7b8
Reviewed-on: http://gerrit.cloudera.org:8080/18630
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Michael Smith <[email protected]>
Reviewed-by: Laszlo Gaal (Cloudera) <[email protected]>
> load_nested.py failed due to out of memory during Jenkins GVO
> -------------------------------------------------------------
>
> Key: IMPALA-10316
> URL: https://issues.apache.org/jira/browse/IMPALA-10316
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Reporter: Zoltán Borók-Nagy
> Assignee: Michael Smith
> Priority: Critical
> Labels: broken-build, flaky
>
> The following job failed due to out of memory:
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/12588] (please click
> on "Don't keep this build forever" once this issue is resolved)
> Relevant log lines:
> {noformat}
> 02:33:42 Loading nested orc data (logging to
> /home/ubuntu/Impala/logs/data_loading/load-nested.log)...
> 02:35:39 FAILED (Took: 1 min 57 sec)
> 02:35:39 '/home/ubuntu/Impala/testdata/bin/load_nested.py -t
> tpch_nested_orc_def -f orc/def' failed. Tail of log:
> 02:35:39 2020-11-11 02:35:06,225 INFO:load_nested[348]:Executing:
> 02:35:39
> 02:35:39 CREATE EXTERNAL TABLE supplier
> 02:35:39 STORED AS orc
> 02:35:39 TBLPROPERTIES('orc.compress' =
> 'ZLIB','external.table.purge'='TRUE')
> 02:35:39 AS SELECT * FROM tmp_supplier
> 02:35:39 Traceback (most recent call last):
> 02:35:39 File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 415,
> in <module>
> 02:35:39 load()
> 02:35:39 File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 349,
> in load
> 02:35:39 hive.execute(stmt)
> 02:35:39 File "/home/ubuntu/Impala/tests/comparison/db_connection.py", line
> 206, in execute
> 02:35:39 return self._cursor.execute(sql, *args, **kwargs)
> 02:35:39 File
> "/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py",
> line 331, in execute
> 02:35:39 self._wait_to_finish() # make execute synchronous
> 02:35:39 File
> "/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py",
> line 413, in _wait_to_finish
> 02:35:39 raise OperationalError(resp.errorMessage)
> 02:35:39 impala.error.OperationalError: Error while compiling statement:
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1,
> vertexId=vertex_1605060173780_0039_2_00, diagnostics=[Task failed,
> taskId=task_1605060173780_0039_2_00_000000, diagnostics=[TaskAttempt 0
> failed, info=[Container container_1605060173780_0039_01_000002 finished with
> diagnostics set to [Container failed, exitCode=-104. [2020-11-11
> 02:35:11.768]Container
> [pid=16810,containerID=container_1605060173780_0039_01_000002] is running
> 7729152B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB
> physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing
> container.{noformat}