[
https://issues.apache.org/jira/browse/IMPALA-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203666#comment-17203666
]
Fifteen edited comment on IMPALA-10193 at 9/29/20, 6:10 AM:
------------------------------------------------------------
[~tarmstrong] Yeah, actually I have fixed it locally and it works fine, but I
am not sure whether there are any factors I have overlooked. To address the
problem of the wrong memory limit in the startup options,
here is my fix:
# Add a new environment variable *MAX_MEM_GB*; it denotes the maximum memory
available for both the mini-cluster and the CDH cluster.
# When starting 'impalad', the algorithm takes *MAX_MEM_GB* rather than
*sys_mem* into account.
# When starting the 'yarn node manager', *MAX_MEM_GB* similarly substitutes for
*sys_mem*.
Implementation:
1. In file 'bin/impala-config.sh', I added a new environment variable:
{code:java}
# Maximum memory available for mini-cluster and CDH cluster
export MAX_MEM_GB=28
{code}
2. In file 'bin/start-impala-cluster.py', I made the local variable
`available_mem` equal to MAX_MEM_GB if it is set; otherwise it falls back to
`sys_mem`, so as not to change the default routine. The final mem_limit remains
0.7 * available_mem / cluster_size (capped at 12GB) in this case.
{code:python}
def compute_impalad_mem_limit(cluster_size):
  # Set mem_limit of each impalad to the smaller of 12GB or
  # 1/cluster_size (typically 1/3) of 70% of available memory.
  #
  # The default memory limit for an impalad is 80% of the total system memory. On a
  # mini-cluster with 3 impalads that means 240%. Since having an impalad be OOM killed
  # is very annoying, the mem limit will be reduced. This can be overridden using the
  # --impalad_args flag. virtual_memory().total returns the total physical memory.
  # The exact ratio to use is somewhat arbitrary. Peak memory usage during
  # tests depends on the concurrency of parallel tests as well as their ordering.
  # On the other hand, to avoid using too much memory, we limit the
  # memory choice here to max out at 12GB. This should be sufficient for tests.
  #
  # Beware that ASAN builds use more memory than regular builds.
  physical_mem_gb = psutil.virtual_memory().total / 1024 / 1024 / 1024
  available_mem = int(os.getenv("MAX_MEM_GB", str(physical_mem_gb)))
  mem_limit = int(0.7 * available_mem * 1024 * 1024 * 1024 / cluster_size)
  print("mem_limit=" + str(mem_limit))
  return min(12 * 1024 * 1024 * 1024, mem_limit)
{code}
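As a sanity check of the arithmetic above, here is a standalone sketch of my own (the helper name is hypothetical and psutil is dropped, since the available GB figure is passed in directly): with MAX_MEM_GB=28 and a 3-node mini-cluster, each impalad gets roughly 6.5GB, comfortably under the 12GB cap.

```python
# Standalone sketch of the mem_limit arithmetic above; the helper name
# is hypothetical, not from the patch.
GB = 1024 * 1024 * 1024

def impalad_mem_limit(available_mem_gb, cluster_size):
  # 70% of the available memory, split evenly, capped at 12GB per impalad.
  mem_limit = int(0.7 * available_mem_gb * GB / cluster_size)
  return min(12 * GB, mem_limit)

# With MAX_MEM_GB=28 and 3 impalads, each gets ~6.5GB (under the 12GB cap).
print(impalad_mem_limit(28, 3))
# On a 128GB host without the override, the 12GB cap kicks in.
print(impalad_mem_limit(128, 3))
```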
3. In file
'testdata/cluster/node_templates/common/etc/hadoop/conf/yarn-site.xml.py',
an 'available_ram_gb' variable is similarly added, and all of the other
computation logic remains identical.
{code:python}
def _get_yarn_nm_ram_mb():
  sys_ram = _get_system_ram_mb()
  available_ram_gb = int(os.getenv("MAX_MEM_GB", str(sys_ram / 1024)))
  # Fit into the following envelope:
  # - need 4GB at a bare minimum
  # - leave at least 24G for other services
  # - don't need more than 48G
  ret = min(max(available_ram_gb * 1024 - 24 * 1024, 4096), 48 * 1024)
  print >>sys.stderr, "Configuring Yarn NM to use {0}MB RAM".format(ret)
  return ret
{code}
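To illustrate the envelope above, a standalone sketch (the helper name is hypothetical, and it takes the available GB directly rather than reading MAX_MEM_GB from the environment): a 28GB cap leaves exactly the 4GB floor after the 24GB reserve, a 16GB cap also hits the floor, and a 128GB host hits the 48GB ceiling.

```python
# Standalone sketch of the Yarn NM memory envelope above; the helper name
# is hypothetical and the available GB is passed in directly instead of
# being read from the MAX_MEM_GB environment variable.
def yarn_nm_ram_mb(available_ram_gb):
  # Reserve 24GB for other services, floor at 4GB, ceiling at 48GB.
  return min(max(available_ram_gb * 1024 - 24 * 1024, 4096), 48 * 1024)

print(yarn_nm_ram_mb(28))   # 28GB - 24GB reserve leaves exactly the 4GB floor
print(yarn_nm_ram_mb(16))   # below the reserve, so the 4GB floor applies
print(yarn_nm_ram_mb(128))  # plenty of RAM, so the 48GB ceiling applies
```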
I am still testing this fix in my 32GB Docker container, which runs on a 128GB
physical machine.
> Limit the memory usage of the whole mini-cluster
> ------------------------------------------------
>
> Key: IMPALA-10193
> URL: https://issues.apache.org/jira/browse/IMPALA-10193
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: Impala 3.4.0
> Reporter: Fifteen
> Priority: Minor
> Attachments: image-2020-09-28-17-18-15-358.png
>
>
> The mini-cluster contains 3 virtual nodes, and all of them run in a single
> 'Machine'. The quotes imply that the machine can be a Docker container. If
> the container is started with `--privileged` and the actual memory is limited
> by cgroups, then the total memory reported by `top` and the actual available
> memory can be different!
>
> For example, in the container below, `top` tells us the total memory is
> 128GB, while the total memory set in cgroups is actually 32GB. If the actual
> memory usage exceeds 32GB, processes (such as impalad, hivemaster2, etc.) get
> killed.
> !image-2020-09-28-17-18-15-358.png!
>
> So we may need a way to limit the whole mini-cluster's memory usage.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)