-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30262/
-----------------------------------------------------------
(Updated Jan. 30, 2015, 1:39 p.m.)
Review request for pig, liyun zhang and Praveen R.
Changes
-------
Updated description
Bugs: PIG-4393
https://issues.apache.org/jira/browse/PIG-4393
Repository: pig-git
Description (updated)
-------
PIG-4393 : Add stats and error reporting for Spark
After Pig submits a job to a Spark cluster, we need to report job progress,
Spark-specific stats, and any error logs back to the user.
(1) It reports basic success/failure status for each Spark job.
(2) It logs Spark-specific stats to the log file. Essentially, it registers
a job metrics listener with the Spark context, collects Spark task-level
metrics, and aggregates them.
(3) It also refactors the code to correctly populate PigStats, which is used
by most unit tests. This should fix a bunch of unit tests.
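To illustrate the aggregation described in (2), here is a minimal sketch in
plain Java. It is not the code in this patch: the class JobMetricsAggregator,
its methods, and the sample values are all hypothetical, and it does not use
the Spark API. It only shows the idea of summing task-level metrics into
per-job totals. Spark's listener callbacks report tasks by stage, so a real
listener would also need to map stages to jobs, which this sketch skips by
taking the job id directly.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a per-job metrics listener. The names here are
 * illustrative only and are taken neither from the patch nor from Spark.
 */
class JobMetricsAggregator {

    // jobId -> (metric name -> running sum over all finished tasks of that job)
    private final Map<Integer, Map<String, Long>> totalsByJob = new HashMap<>();

    /** Adds one finished task's metric values to the totals of its job. */
    public synchronized void onTaskEnd(int jobId, Map<String, Long> taskMetrics) {
        Map<String, Long> totals =
                totalsByJob.computeIfAbsent(jobId, k -> new LinkedHashMap<>());
        for (Map.Entry<String, Long> e : taskMetrics.entrySet()) {
            totals.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }

    /** Returns a read-only copy of the totals for a job (empty if unknown). */
    public synchronized Map<String, Long> getJobTotals(int jobId) {
        Map<String, Long> totals = totalsByJob.get(jobId);
        if (totals == null) {
            return Collections.emptyMap();
        }
        return Collections.unmodifiableMap(new LinkedHashMap<>(totals));
    }

    public static void main(String[] args) {
        JobMetricsAggregator aggregator = new JobMetricsAggregator();

        // Made-up metrics for two tasks belonging to job 0.
        Map<String, Long> task1 = new LinkedHashMap<>();
        task1.put("ExecutorRunTime", 100L);
        task1.put("ResultSize", 10L);
        Map<String, Long> task2 = new LinkedHashMap<>();
        task2.put("ExecutorRunTime", 200L);
        task2.put("ResultSize", 20L);

        aggregator.onTaskEnd(0, task1);
        aggregator.onTaskEnd(0, task2);

        // Print one line per aggregated metric for the job.
        for (Map.Entry<String, Long> e : aggregator.getJobTotals(0).entrySet()) {
            System.out.println(e.getKey() + " : " + e.getValue());
        }
    }
}
```

Keeping the totals keyed by job id means the per-job summary can be logged
once the job finishes, independent of how many tasks contributed to it.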
TODO items in a follow-up patch:
- Add #records to OutputStats for each job.
- Though StatsReportListener prints Spark job progress in the logs, we
probably also need to implement PigProgressNotificationListener for Spark.
Diffs
-----
src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java
PRE-CREATION
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkExecutionEngine.java
db152b5003ce6e79b001b2624010b91cc0f921d8
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java
b15994d525250bbb26f7b7126dae619b9da363c8
src/org/apache/pig/tools/pigstats/SparkStats.java
fd45dd4f0be415dd48d9fb7381c57c861bbbf7ce
src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java PRE-CREATION
src/org/apache/pig/tools/pigstats/spark/SparkPigStats.java PRE-CREATION
src/org/apache/pig/tools/pigstats/spark/SparkStatsUtil.java PRE-CREATION
Diff: https://reviews.apache.org/r/30262/diff/
Testing (updated)
-------
Tested with unit tests. At least some unit tests that were failing
earlier due to lack of stats, such as TestToolsPigServer and TestSplitStore,
now pass.
Example of Spark Job metrics that appear in logs:
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - Spark Job [0] Metrics
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - EexcutorDeserializeTime : 74
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ExecutorRunTime : 538
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ResultSize : 2535
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - JvmGCTime : 0
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ResultSerializationTime : 1
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - MemoryBytesSpilled : 0
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - DiskBytesSpilled : 0
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - RemoteBlocksFetched : 0
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - LocalBlocksFetched : 2
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - TotalBlocksFetched : 2
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - FetchWaitTime : 0
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - RemoteBytesRead : 0
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ShuffleBytesWritten : 918
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ShuffleWriteTime : 67000
Thanks,
Mohit Sabharwal