CodingCat commented on PR #2358:
URL:
https://github.com/apache/incubator-celeborn/pull/2358#issuecomment-1998929576
Imagine this case: you're an engineer who uses Spark on a daily basis but
doesn't necessarily understand the internals of Spark/Celeborn (I would say
that, in my career, 99% of users are in this category). When you find your
Spark tasks spending 90% of their time in shuffle writing, what will you do?
* Tune the Celeborn buffer size? (You'd first need to know what Celeborn is,
and then about its buffer; most users won't go this way.)
* Ask a Spark platform engineer to tune the size? (Without enough
understanding of the application's domain, as a platform engineer I cannot
make the best decision about whether to increase it from 64MB to 128MB or to
512MB.)
But this feature takes over most of that tuning work for users and platform
engineers:
* If you see in the log that it raised the threshold a few times and stopped
at or below the upper limit, with no performance problem or application OOM,
that's fine.
* If you see in the log that it raised the threshold a few times, hit the
upper limit, and could not go any higher, and performance is still bad
(and/or the Spark application OOMed), it is probably a good time to increase
your executor memory.
I'm not arguing that this feature is perfect; it is not. I'm just saying that
it is useful in some cases, and that we can build further improvements on top
of it, e.g. AIMD buffer management like what I mentioned above, which may or
may not make users happier.
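
To make the AIMD idea concrete (additive increase, multiplicative decrease, as in TCP congestion control), here is a rough sketch of what such a controller could look like. This is not code from the PR or from Celeborn: the class name, the two callbacks, the 8MB step, and the 0.5 backoff factor are all assumptions made up for illustration.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class AimdBufferThreshold:
    """Hypothetical AIMD controller for a shuffle buffer threshold, in bytes."""

    current: int
    minimum: int
    maximum: int
    step: int = 8 * MB    # additive increase per adjustment (assumed value)
    backoff: float = 0.5  # multiplicative decrease factor (assumed value)

    def on_buffer_full(self) -> int:
        """Buffer filled with no memory pressure: grow linearly, up to maximum."""
        self.current = min(self.current + self.step, self.maximum)
        return self.current

    def on_memory_pressure(self) -> int:
        """Memory pressure observed: shrink multiplicatively, down to minimum."""
        self.current = max(int(self.current * self.backoff), self.minimum)
        return self.current

    def is_capped(self) -> bool:
        """True once the threshold has reached the upper limit."""
        return self.current >= self.maximum


# Start at 64MB with a 128MB upper limit and grow until capped.
ctl = AimdBufferThreshold(current=64 * MB, minimum=16 * MB, maximum=128 * MB)
for _ in range(10):
    ctl.on_buffer_full()
capped = ctl.is_capped()

# A single memory-pressure signal halves the threshold.
after_backoff = ctl.on_memory_pressure()
```

The point of the asymmetry is that the threshold creeps up slowly while things are healthy and drops quickly at the first sign of memory pressure, so a user who never looks at the setting is less likely to end up with an OOM.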