Imran Rashid created SPARK-14168:
------------------------------------
Summary: Managed Memory Leak Msg Should Only Be a Warning
Key: SPARK-14168
URL: https://issues.apache.org/jira/browse/SPARK-14168
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.6.1
Reporter: Imran Rashid
Assignee: Imran Rashid
Priority: Minor
When a task completes, executors check whether all managed memory for the
task was correctly released, and log an error when it wasn't. However, it
turns out it's OK for memory to remain unreleased when an Iterator
isn't read to completion, e.g., with {{rdd.take()}}. This results in a scary
error msg in the executor logs:
{noformat}
16/01/05 17:02:49 ERROR Executor: Managed memory leak detected; size = 16259594
bytes, TID = 24
{noformat}
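For example, the following spark-shell fragment (a minimal sketch; names and
sizes are illustrative, not taken from a real report) stops consuming the
shuffled iterator early via {{take()}}, so the task can end with shuffle
memory still allocated and trigger the msg above:
{code}
// take(1) short-circuits the iterator after the first element,
// so memory acquired for the shuffle read is never fully released.
sc.parallelize(0 to 10000000, 2).map(x => x % 10000 -> x)
  .groupByKey.take(1)
{code}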
Furthermore, if a task fails for any reason, this msg is also triggered. This
can lead users to believe that the failure was caused by the memory leak, when
the root cause could be entirely different. E.g., the same error msg appears
in executor logs when this clearly broken user code is run with {{spark-shell
--master 'local-cluster[2,2,1024]'}}:
{code}
sc.parallelize(0 to 10000000, 2).map(x => x % 10000 ->
x).groupByKey.mapPartitions { it => throw new RuntimeException("user error!")
}.collect
{code}
We should downgrade the msg to a warning and link to a more detailed
explanation.
See https://issues.apache.org/jira/browse/SPARK-11293 for more reports from
users (and perhaps a true fix).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)