[
https://issues.apache.org/jira/browse/PIG-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556958#comment-14556958
]
Rohini Palaniswamy commented on PIG-4555:
-----------------------------------------
bq. i end-up having my containers (the AM one) being killed because they use
too much virtual memory (about 17GB of virtual memory)
17GB is really bad. What was the Xmx? What is the virtual memory usage
without NUMA?
bq. But for sure, in my case, setting -XX:+UseNUMA do trigger an OOM.
Are you sure it hits OOM, or is the container just being killed because
yarn.nodemanager.vmem-pmem-ratio is being breached?
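To make the distinction concrete: when the NodeManager's virtual memory check is enabled, it kills a container whose virtual memory exceeds its allocated physical memory multiplied by yarn.nodemanager.vmem-pmem-ratio (2.1 by default). That is separate from a JVM OutOfMemoryError. The sketch below only illustrates the arithmetic. The class and method names are made up for the example, and the 4096 MB AM container size is an assumed value, not one reported in this thread.

```java
public class VmemCheck {
    // Default of yarn.nodemanager.vmem-pmem-ratio in yarn-default.xml.
    static final double DEFAULT_VMEM_PMEM_RATIO = 2.1;

    // Virtual memory ceiling for a container, in MB (hypothetical helper).
    static long vmemLimitMb(long containerPmemMb, double ratio) {
        return (long) (containerPmemMb * ratio);
    }

    // True if the container would breach the virtual memory ceiling.
    static boolean exceedsVmemLimit(long usedVmemMb, long containerPmemMb, double ratio) {
        return usedVmemMb > vmemLimitMb(containerPmemMb, ratio);
    }

    public static void main(String[] args) {
        long amContainerMb = 4096;      // ASSUMPTION: AM container size, not from the thread
        long usedVmemMb = 17L * 1024;   // the ~17GB of virtual memory quoted above

        System.out.println("vmem limit (MB): "
                + vmemLimitMb(amContainerMb, DEFAULT_VMEM_PMEM_RATIO));
        System.out.println("killed by vmem check: "
                + exceedsVmemLimit(usedVmemMb, amContainerMb, DEFAULT_VMEM_PMEM_RATIO));
    }
}
```

With these assumed numbers the container would be killed by the vmem check well before the JVM heap is necessarily exhausted, which is why the Xmx and the non-NUMA virtual memory figure matter here.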
bq. I'm pretty sure there is already some configuration variables one can set
in its tez-site.xml file to set this option so no need to have pig force this
setting by code. For what i understand, the real problem is not about
-XX:+UseNUMA. The real problem is more that some option from the tez
configuration are ignored.
TEZ_AM_LAUNCH_CMD_OPTS_DEFAULT is "-XX:+PrintGCDetails -verbose:gc
-XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC", i.e. -XX:+UseNUMA is
part of the default Tez AM options. In Pig, we give preference to the
mapreduce AM settings (if tez.am.launch.cmd-opts is not overridden in
tez-site.xml) and translate them to Tez instead of using the Tez defaults
mentioned above. Since the mapreduce AM settings are always present, from
either mapred-default.xml or mapred-site.xml, -XX:+UseNUMA never gets applied.
So this is about making use of the default Tez settings in Pig. If
-XX:+UseNUMA is problematic in a particular environment, it can be overridden
in tez-site.xml.
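The precedence described above, together with the change this issue proposes, can be sketched roughly as follows. This is not the actual Pig code: the class and method names are hypothetical, and a plain java.util.Map stands in for the Hadoop Configuration object. Only the two property names and the Tez default string come from this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class AmOptsResolver {
    static final String TEZ_AM_OPTS_KEY = "tez.am.launch.cmd-opts";
    static final String MR_AM_OPTS_KEY = "yarn.app.mapreduce.am.command-opts";
    static final String TEZ_AM_LAUNCH_CMD_OPTS_DEFAULT =
            "-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps "
            + "-XX:+UseNUMA -XX:+UseParallelGC";
    static final String USE_NUMA = "-XX:+UseNUMA";

    // Hypothetical helper: decide which JVM options the Tez AM is launched with.
    static String resolveAmOpts(Map<String, String> conf) {
        // 1. An explicit Tez setting (e.g. from tez-site.xml) always wins,
        //    so an environment can still drop or disable NUMA there.
        String tezOpts = conf.get(TEZ_AM_OPTS_KEY);
        if (tezOpts != null) {
            return tezOpts;
        }
        // 2. No mapreduce setting either: fall back to the Tez default.
        String mrOpts = conf.get(MR_AM_OPTS_KEY);
        if (mrOpts == null) {
            return TEZ_AM_LAUNCH_CMD_OPTS_DEFAULT;
        }
        // 3. Translating the mapreduce setting: keep any explicit
        //    +UseNUMA / -UseNUMA choice, otherwise append the flag.
        if (mrOpts.contains("UseNUMA")) {
            return mrOpts;
        }
        return (mrOpts.trim() + " " + USE_NUMA).trim();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MR_AM_OPTS_KEY, "-Xmx1024m"); // example value, assumed
        System.out.println(resolveAmOpts(conf));
    }
}
```

Step 3 is the part under discussion: without it, the translated mapreduce options never carry -XX:+UseNUMA, even though the Tez default does.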
The real issue of why the Tez AM performed poorly without NUMA is still open
and will be tracked in a TEZ jira. You have raised some concerns, and I don't
have knowledgeable answers for them at this point. So I have moved this to
0.16 and will add the option once we understand the NUMA behavior better,
including what is happening with and without NUMA in the Tez AM.
> Add -XX:+UseNUMA for Tez jobs
> -----------------------------
>
> Key: PIG-4555
> URL: https://issues.apache.org/jira/browse/PIG-4555
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
>
> For very big Tez jobs (~50K tasks), AM quickly goes OOM without
> -XX:+UseNUMA. tez.am.launch.cmd-opts default setting has that, but since pig
> gives preference to yarn.app.mapreduce.am.command-opts if present (which
> usually it is), -XX:+UseNUMA is not there. Need to add -XX:+UseNUMA if we
> are picking up mapreduce setting.