The current Spark logging mechanism could be improved by adding the
following metrics. They would help in identifying system bottlenecks and
provide useful guidance for Spark application developers designing
optimized applications.

1. Shuffle Read Local Time: time for a task to read shuffle data from local
storage.
2. Shuffle Read Remote Time: time for a task to read shuffle data from a
remote node.
3. Distribution of processing time between computation, I/O, and network:
show how each task's processing time is split between computation, reading
data from disk, and reading data over the network.
4. Average I/O bandwidth: average I/O throughput for each task when it
fetches data from disk.
5. Average network bandwidth: average network throughput for each task when
it fetches data from remote nodes.
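As a rough sketch of how metrics 3-5 could be derived from per-task
measurements, assuming the two proposed read times (1 and 2) were recorded
alongside bytes read (the record type and field names below are
hypothetical, not Spark's actual TaskMetrics API):

```python
from dataclasses import dataclass

@dataclass
class TaskIOMetrics:
    # Hypothetical per-task record; field names are illustrative only.
    local_bytes_read: int      # shuffle bytes read from local storage
    local_read_time_s: float   # proposed Shuffle Read Local Time
    remote_bytes_read: int     # shuffle bytes fetched from remote nodes
    remote_read_time_s: float  # proposed Shuffle Read Remote Time
    compute_time_s: float      # time spent in pure computation

def avg_io_bandwidth(tasks):
    """Average disk I/O throughput in bytes/s across tasks (metric 4)."""
    rates = [t.local_bytes_read / t.local_read_time_s
             for t in tasks if t.local_read_time_s > 0]
    return sum(rates) / len(rates) if rates else 0.0

def avg_network_bandwidth(tasks):
    """Average network throughput in bytes/s across tasks (metric 5)."""
    rates = [t.remote_bytes_read / t.remote_read_time_s
             for t in tasks if t.remote_read_time_s > 0]
    return sum(rates) / len(rates) if rates else 0.0

def time_distribution(task):
    """Fraction of a task's time in compute / disk I/O / network (metric 3)."""
    total = task.compute_time_s + task.local_read_time_s + task.remote_read_time_s
    return {"compute": task.compute_time_s / total,
            "disk_io": task.local_read_time_s / total,
            "network": task.remote_read_time_s / total}
```

In a real implementation these aggregates would more naturally be computed
from events delivered to a SparkListener or surfaced in the web UI, rather
than post-processed like this.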




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Improving-system-design-logging-in-spark-tp17291.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
