CodingCat commented on PR #2358:
URL: 
https://github.com/apache/incubator-celeborn/pull/2358#issuecomment-1998929576

   Imagine this case: you're an engineer who uses Spark on a daily basis 
but doesn't necessarily understand the internals of Spark/Celeborn (I would say 
that, in my career, 99% of users are in this category). When you find your Spark 
tasks spending 90% of their time in shuffle writing, what will you do?
   
   * Tune the Celeborn buffer size? (You would first need to know what Celeborn 
is, and then about its buffer; most users won't go this way.)
   * Ask a Spark platform engineer to tune the size? (Without enough 
domain knowledge of the application, as a platform engineer I 
cannot make the best decision about whether to increase it from 64MB to 128MB 
or to 512MB.)
   
   But with this feature, most of the tuning work is done automatically for 
users and platform engineers:
   
   * If you see in the log that it raised the threshold a few times and stopped 
at or below the maximum, with no performance problem or application OOM, that's 
fine.
   * If you see in the log that it raised the threshold a few times, hit the 
maximum, and could go no higher, and performance is still bad (and/or the Spark 
application OOMed), it is probably a good time to increase your executor memory.
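
   The two log patterns above can be told apart mechanically. Here is a minimal Java sketch, assuming a hypothetical tuner that doubles the threshold each time the buffer fills and stops at a configured maximum. The class name, the doubling rule, and the method names are illustrative assumptions, not the code in this PR:

```java
/** Hypothetical adaptive threshold: grows under pressure, never beyond maxBytes. */
final class AdaptiveThreshold {
    private long current;          // current threshold in bytes
    private final long maxBytes;   // hard cap (assumed to come from configuration)
    private int increases = 0;     // how many times the threshold was raised

    AdaptiveThreshold(long initialBytes, long maxBytes) {
        if (initialBytes <= 0 || maxBytes < initialBytes) {
            throw new IllegalArgumentException("need 0 < initialBytes <= maxBytes");
        }
        this.current = initialBytes;
        this.maxBytes = maxBytes;
    }

    /** Called when the buffer fills; returns true if the threshold was raised. */
    boolean onBufferFull() {
        if (current >= maxBytes) {
            return false; // already capped: the second log pattern
        }
        current = Math.min(current * 2, maxBytes); // doubling is an assumption
        increases++;
        return true;
    }

    boolean isCapped() { return current >= maxBytes; }
    long current() { return current; }
    int increases() { return increases; }
}
```

   If `isCapped()` stays false for the whole job, the tuner found a size that works; if it returns true and shuffle writing is still slow, the cap (and so executor memory) is the limiting factor.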
   
   I'm not arguing that this feature is perfect; it is not. I'm just saying 
that it is useful in some cases, and we can build further improvements on top 
of it, e.g. AIMD buffer management like what I mentioned above, which may or 
may not make users happier.
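
   For reference, AIMD (additive increase, multiplicative decrease) in its textbook form could look like the sketch below. This assumes the buffer limit grows by a fixed step after each successful push and is cut by a factor when memory pressure is detected; the class, the method names, and the two triggering events are hypothetical and are not taken from Celeborn:

```java
/** Hypothetical AIMD controller for a buffer limit, clamped to [minBytes, maxBytes]. */
final class AimdBufferLimit {
    private final long minBytes;
    private final long maxBytes;
    private final long stepBytes;        // additive increase per success
    private final double decreaseFactor; // multiplicative decrease, in (0, 1)
    private long current;

    AimdBufferLimit(long minBytes, long maxBytes, long stepBytes, double decreaseFactor) {
        if (minBytes <= 0 || maxBytes < minBytes || stepBytes <= 0
                || decreaseFactor <= 0.0 || decreaseFactor >= 1.0) {
            throw new IllegalArgumentException("invalid AIMD parameters");
        }
        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
        this.stepBytes = stepBytes;
        this.decreaseFactor = decreaseFactor;
        this.current = minBytes; // start conservatively
    }

    /** Additive increase: grow by a fixed step, up to maxBytes. */
    long onPushSucceeded() {
        current = Math.min(current + stepBytes, maxBytes);
        return current;
    }

    /** Multiplicative decrease: shrink by a factor, down to minBytes. */
    long onMemoryPressure() {
        current = Math.max((long) (current * decreaseFactor), minBytes);
        return current;
    }

    long current() { return current; }
}
```

   Unlike a grow-only threshold, this backs off when memory gets tight, at the cost of two more parameters to choose.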
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
