The current Spark logging mechanism could be improved by adding the following metrics. They would help identify system bottlenecks and give Spark application developers useful guidance for designing optimized applications.
1. Shuffle Read Local Time: time a task spends reading shuffle data from local storage.
2. Shuffle Read Remote Time: time a task spends reading shuffle data from remote nodes.
3. Processing-time distribution across computation, I/O, and network: for each task, show how its processing time is split between computation, reading data from disk, and reading data from the network.
4. Average I/O bandwidth: average disk throughput for each task when it fetches data from disk.
5. Average network bandwidth: average network throughput for each task when it fetches data from remote nodes.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Improving-system-design-logging-in-spark-tp17291.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
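As a rough sketch of how metrics 3–5 above could be derived, assuming hypothetical per-task fields (`compute_ns`, `local_read_ns`, `remote_read_ns`, `disk_bytes`, `remote_bytes`) were recorded for each task; none of these field names exist in Spark's current task metrics, they only illustrate the arithmetic:

```python
from dataclasses import dataclass

@dataclass
class TaskMetrics:
    """Hypothetical per-task record; field names are illustrative only."""
    compute_ns: int      # time spent in pure computation
    local_read_ns: int   # shuffle-read time from local disk (metric 1)
    remote_read_ns: int  # shuffle-read time from remote nodes (metric 2)
    disk_bytes: int      # bytes fetched from local disk
    remote_bytes: int    # bytes fetched from remote nodes

def time_distribution(m: TaskMetrics) -> dict:
    """Metric 3: fraction of task time in computation, disk I/O, network I/O."""
    total = m.compute_ns + m.local_read_ns + m.remote_read_ns
    return {
        "compute": m.compute_ns / total,
        "disk": m.local_read_ns / total,
        "network": m.remote_read_ns / total,
    }

def io_bandwidth_mb_s(m: TaskMetrics) -> float:
    """Metric 4: average disk throughput (MB/s) while fetching shuffle data."""
    return (m.disk_bytes / 1e6) / (m.local_read_ns / 1e9)

def network_bandwidth_mb_s(m: TaskMetrics) -> float:
    """Metric 5: average network throughput (MB/s) for remote fetches."""
    return (m.remote_bytes / 1e6) / (m.remote_read_ns / 1e9)
```

For example, a task that spent 0.6 s computing, 0.3 s on local shuffle reads of 30 MB, and 0.1 s on remote reads of 5 MB would report a 60/30/10 time split, 100 MB/s disk bandwidth, and 50 MB/s network bandwidth.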