[ 
https://issues.apache.org/jira/browse/SPARK-12514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-12514:
---------------------------------
    Labels: bulk-closed  (was: )

> Spark MetricsSystem can fill disks/cause OOMs when using GangliaSink
> --------------------------------------------------------------------
>
>                 Key: SPARK-12514
>                 URL: https://issues.apache.org/jira/browse/SPARK-12514
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.2
>            Reporter: Aaron Tokhy
>            Priority: Minor
>              Labels: bulk-closed
>
> The MetricsSystem implementation in Spark generates unique metric names for 
> each Spark application that has been submitted (to a YARN cluster, for 
> example). This can be problematic for certain metrics environments, such as 
> Ganglia.
> This creates metric names that look like the following (for each submitted 
> application):
> application_1450753701508_0001.driver.ExecutorAllocationManager.executors.numberAllExecutors
>  
> On Spark clusters where thousands of applications are submitted, the 
> accumulated metrics will eventually cause the Ganglia daemons to reach their 
> memory limits (gmond) or to run out of disk space (gmetad). This is because 
> some metrics systems do not expect new metric names to be generated over the 
> lifetime of a cluster.
> Ganglia, as a Spark metrics sink, is one example where the current 
> implementation runs into problems. Each application's new set of metrics 
> introduces a new set of round-robin database (RRD) files in gmetad and new 
> metrics in gmond. Over time this bloats the gmond aggregator's memory usage, 
> and gmetad keeps creating new RRD files for every application. These RRD 
> files are permanent and never deleted, so every submitted application leaves 
> behind files that are never cleaned up.
> So the MetricsSystem may need to account for metrics sinks that cannot 
> tolerate the introduction of new metric names, and buildRegistryName would 
> have to behave differently in that case.
> https://github.com/apache/spark/blob/d83c2f9f0b08d6d5d369d9fae04cdb15448e7f0d/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala#L126
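> The naming behavior can be sketched as follows. This is a simplified, 
> hypothetical stand-in for buildRegistryName (the real method takes a Source 
> and reads the app/executor IDs from the Spark configuration), shown only to 
> illustrate why every application produces brand-new metric names:
> {code:scala}
> // Hypothetical sketch, not Spark's actual code: the metric name is prefixed
> // with the per-submission YARN application ID, so each application yields a
> // metric name that no sink has ever seen before.
> def buildRegistryName(appId: Option[String],
>                       executorId: Option[String],
>                       metricName: String): String =
>   (appId, executorId) match {
>     // e.g. "application_1450753701508_0001" + "driver" + the metric name
>     case (Some(app), Some(exec)) => s"$app.$exec.$metricName"
>     case _                       => metricName
>   }
>
> // A Ganglia-friendly variant would use a stable prefix (for example, a fixed
> // per-cluster namespace) so names repeat across applications instead of
> // growing without bound:
> def buildStableName(namespace: String,
>                     executorId: Option[String],
>                     metricName: String): String =
>   s"$namespace.${executorId.getOrElse("unknown")}.$metricName"
> {code}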



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
