[
https://issues.apache.org/jira/browse/SPARK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908309#comment-14908309
]
Imran Rashid commented on SPARK-9103:
-------------------------------------
Hi [~liyezhang556520], thanks for posting the design doc. It looks good; I just
have a couple of questions.
1) Will the proposed design cover SPARK-9111, getting the memory information
when the executor dies abnormally (especially when killed by YARN)? It seems to
me the answer is "no", which is fine; that can be tackled separately. I just
wanted to clarify.
2) I see the complexity of having overlapping stages, but I wonder if it could
be simplified somewhat. It seems to me you just need to maintain a
{{executorToLatestMetrics: Map[executor, metrics]}} and then, on every stage
completion, log all of its entries. Maybe this is what you are already
describing in the doc, but it seems like there is more state and a bit more
logging going on. E.g., I don't fully understand why you need to log both
"CHB1" and "HB3" in your example.
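Roughly, the simplification in (2) could be sketched as below. This is only an
illustration under stated assumptions: {{MemoryMetrics}},
{{LatestMetricsTracker}}, {{onHeartbeat}}, {{onStageCompleted}} and the
{{logged}} buffer are hypothetical names, not part of Spark's API or of the
design doc, and the buffer merely stands in for the event log.

```scala
import scala.collection.mutable

// Hypothetical metrics record; the field names are assumptions, not Spark's API.
final case class MemoryMetrics(timestampMs: Long, onHeapBytes: Long, offHeapBytes: Long)

// Hypothetical tracker: keeps only the latest metrics per executor.
final class LatestMetricsTracker {
  // executor ID -> most recent metrics reported by that executor
  private val executorToLatestMetrics = mutable.Map.empty[String, MemoryMetrics]

  // Stand-in for the event log: one line per (stage, executor) pair.
  val logged = mutable.ArrayBuffer.empty[String]

  // Called whenever an executor reports metrics (for example, on a heartbeat).
  // A newer report simply overwrites the previous one for that executor.
  def onHeartbeat(executorId: String, metrics: MemoryMetrics): Unit =
    executorToLatestMetrics(executorId) = metrics

  // Called on every stage completion: log the latest metrics of all executors,
  // sorted by executor ID so the output order is deterministic.
  def onStageCompleted(stageId: Int): Unit =
    executorToLatestMetrics.toSeq.sortBy(_._1).foreach { case (executorId, m) =>
      logged += s"stage=$stageId executor=$executorId " +
        s"onHeap=${m.onHeapBytes} offHeap=${m.offHeapBytes} ts=${m.timestampMs}"
    }
}
```

With this shape, overlapping stages need no extra state: each stage completion
logs whatever the map holds at that moment. The trade-off is that only the
latest value per executor survives, so anything reported and then overwritten
between two stage completions is not logged.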
Thanks
> Tracking spark's memory usage
> -----------------------------
>
> Key: SPARK-9103
> URL: https://issues.apache.org/jira/browse/SPARK-9103
> Project: Spark
> Issue Type: Umbrella
> Components: Spark Core, Web UI
> Reporter: Zhang, Liye
> Attachments: Tracking Spark Memory Usage - Phase 1.pdf
>
>
> Currently Spark provides only limited memory usage information (RDD cache on
> the web UI) for the executors. Users have no idea what the memory consumption
> is when they run Spark applications that use a lot of memory in the
> executors. Especially when they encounter an OOM, it’s really hard to know
> what the cause of the problem is. So it would be helpful to expose detailed
> memory consumption information for each part of Spark, so that users can
> have a clear picture of exactly where the memory is used.
> The memory usage info to expose should include, but not be limited to,
> shuffle, cache, network, serializer, etc.
> Users can optionally enable this functionality, since it is mainly for
> debugging and tuning.