Hi,

As for #1 and #2, it seems hard to measure remote and local fetch times separately because the two overlap with each other; see `ShuffleBlockFetcherIterator`. IMO the current metric there (the time a task is blocked waiting to pull a fetched block from the results queue) is enough for most users, because remote fetching is the likely bottleneck whenever that metric gets worse. Is there any benefit to tracking the respective times, remote and local, separately?
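To make the overlap concrete, here is a minimal sketch of the pattern described above (not Spark's actual code; the class and method names are made up): local and remote fetches complete asynchronously into one shared blocking queue, so the consumer can only measure how long it was blocked waiting for the *next* block, not which source that wait belongs to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not Spark code: local and remote fetches push
// completed blocks into one shared queue, so only the time blocked on
// take() is observable, not how long each source took on its own.
public class FetchWaitSketch {

    // Drains n blocks from the results queue, accumulating the time the
    // caller spent blocked (a "fetch wait time"-style metric).
    public static List<String> drain(LinkedBlockingQueue<String> results,
                                     int n, long[] waitNanos)
            throws InterruptedException {
        List<String> blocks = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            blocks.add(results.take()); // blocks until any fetch finishes
            waitNanos[0] += System.nanoTime() - start;
        }
        return blocks;
    }

    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<String> results = new LinkedBlockingQueue<>();

        // Two overlapping fetchers, one "local" and one "remote".
        new Thread(() -> { sleepQuietly(10); results.offer("local-block"); }).start();
        new Thread(() -> { sleepQuietly(30); results.offer("remote-block"); }).start();

        long[] waitNanos = new long[1];
        List<String> blocks = drain(results, 2, waitNanos);
        // The single wait total cannot be split between the two sources.
        System.out.println(blocks + " total fetch wait ns: " + waitNanos[0]);
    }

    private static void sleepQuietly(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}
```

Because both fetchers feed the same queue, attributing the accumulated wait to "remote" vs "local" would require timing each fetch at its source, which is a bigger change than the current single metric.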
// maropu

On Thu, Apr 21, 2016 at 2:47 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Interesting.
>
> For #3:
>
> bq. reading data from,
>
> I guess you meant reading from disk.
>
> On Wed, Apr 20, 2016 at 10:45 AM, atootoonchian <a...@levyx.com> wrote:
>
>> The current Spark logging mechanism can be improved by adding the
>> following parameters. They would help in understanding system
>> bottlenecks and provide useful guidelines for Spark application
>> developers to design an optimized application.
>>
>> 1. Shuffle Read Local Time: Time for a task to read shuffle data from
>>    local storage.
>> 2. Shuffle Read Remote Time: Time for a task to read shuffle data from
>>    a remote node.
>> 3. Distribution of processing time between computation, I/O, network:
>>    Show the distribution of each task's processing time between
>>    computation, reading data from, and reading data from network.
>> 4. Average I/O bandwidth: Average I/O throughput for each task when it
>>    fetches data from disk.
>> 5. Average network bandwidth: Average network throughput for each task
>>    when it fetches data from remote nodes.
>>
>> --
>> View this message in context:
>> http://apache-spark-developers-list.1001551.n3.nabble.com/Improving-system-design-logging-in-spark-tp17291.html
>> Sent from the Apache Spark Developers List mailing list archive at
>> Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>

--
---
Takeshi Yamamuro