-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30262/
-----------------------------------------------------------
(Updated Feb. 2, 2015, 8:47 p.m.)
Review request for pig, liyun zhang and Praveen R.
Changes
-------
Each POStore may correspond to multiple Spark jobs. For example,
StreamingConverter adds an RDD count() action, which launches a separate job.
The patch has been updated to handle this.
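Because one POStore can now map to several Spark jobs, the per-store outcome has to be derived from all of them. The following is a minimal sketch, not code from the patch. It assumes hypothetical names (StoreJobTracker, JobStatus) and identifies stores by a plain String. It records each Spark job ID under its store and treats a store as successful only if at least one job ran and every job succeeded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not from the patch): tracks the Spark job IDs launched
// for each POStore, identified here by a plain String, and derives a per-store outcome.
class StoreJobTracker {
    enum JobStatus { SUCCEEDED, FAILED }

    // store name -> IDs of all Spark jobs launched while producing that store
    private final Map<String, List<Integer>> jobsByStore = new LinkedHashMap<>();
    // Spark job ID -> final status reported for that job
    private final Map<Integer, JobStatus> statusByJob = new LinkedHashMap<>();

    void recordJob(String store, int jobId, JobStatus status) {
        jobsByStore.computeIfAbsent(store, k -> new ArrayList<>()).add(jobId);
        statusByJob.put(jobId, status);
    }

    List<Integer> jobsFor(String store) {
        return jobsByStore.getOrDefault(store, Collections.emptyList());
    }

    // A store succeeds only if at least one job ran and none of its jobs failed.
    boolean storeSucceeded(String store) {
        List<Integer> jobs = jobsFor(store);
        if (jobs.isEmpty()) {
            return false;
        }
        for (int id : jobs) {
            if (statusByJob.get(id) != JobStatus.SUCCEEDED) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StoreJobTracker tracker = new StoreJobTracker();
        // e.g. an extra count() job plus the job that writes the output
        tracker.recordJob("out1", 0, JobStatus.SUCCEEDED);
        tracker.recordJob("out1", 1, JobStatus.SUCCEEDED);
        tracker.recordJob("out2", 2, JobStatus.FAILED);
        System.out.println(tracker.jobsFor("out1") + " " + tracker.storeSucceeded("out1")); // prints "[0, 1] true"
        System.out.println(tracker.storeSucceeded("out2")); // prints "false"
    }
}
```

Keying the status by job ID rather than by store allows the extra job launched by count() to fail the store even when the job that writes the output succeeds.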
Bugs: PIG-4393
https://issues.apache.org/jira/browse/PIG-4393
Repository: pig-git
Description
-------
PIG-4393 : Add stats and error reporting for Spark
After Pig submits a job to a Spark cluster, we need to report job progress,
Spark-specific stats and any error logs back to the user.
(1) It reports basic success/failure status for each Spark job.
(2) It logs Spark-specific stats in the log file. Essentially, it registers
a job metrics listener with the Spark context, which collects Spark task-level
metrics and aggregates them.
(3) It also refactors code to correctly populate PigStats, which is used by
most unit tests. This should fix a number of unit tests.
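The aggregation step in (2) can be sketched as follows. This is a simplified stand-in, not Spark's listener API and not the patch's JobMetricsListener. It assumes a hypothetical class, JobMetricsAggregator, whose onTaskEnd callback receives a job ID and a map of named task metrics for each finished task. It sums those metrics per job. The methods are synchronized because listener callbacks may arrive on a different thread from the one that later reads the totals.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a job metrics listener: sums task-level metrics per Spark job.
class JobMetricsAggregator {
    // job ID -> (metric name -> sum over all finished tasks of that job)
    private final Map<Integer, Map<String, Long>> totals = new HashMap<>();

    // Called once per finished task with that task's metrics.
    synchronized void onTaskEnd(int jobId, Map<String, Long> taskMetrics) {
        Map<String, Long> jobTotals = totals.computeIfAbsent(jobId, k -> new LinkedHashMap<>());
        for (Map.Entry<String, Long> e : taskMetrics.entrySet()) {
            jobTotals.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }

    // Returns a copy of the aggregated metrics for one job (empty if the job is unknown).
    synchronized Map<String, Long> metricsFor(int jobId) {
        Map<String, Long> jobTotals = totals.get(jobId);
        if (jobTotals == null) {
            return new LinkedHashMap<>();
        }
        return new LinkedHashMap<>(jobTotals);
    }

    public static void main(String[] args) {
        JobMetricsAggregator agg = new JobMetricsAggregator();
        Map<String, Long> task1 = new LinkedHashMap<>();
        task1.put("ExecutorRunTime", 300L);
        task1.put("ResultSize", 1000L);
        Map<String, Long> task2 = new LinkedHashMap<>();
        task2.put("ExecutorRunTime", 200L);
        task2.put("ResultSize", 1500L);
        agg.onTaskEnd(0, task1);
        agg.onTaskEnd(0, task2);
        System.out.println(agg.metricsFor(0)); // prints "{ExecutorRunTime=500, ResultSize=2500}"
    }
}
```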
TODO items for a follow-up patch:
- Add the number of records to OutputStats for each job.
- Although StatsReportListener prints Spark job progress in the logs, we
probably also need to implement PigProgressNotificationListener for Spark.
Diffs (updated)
-----
src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java PRE-CREATION
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkExecutionEngine.java db152b5
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b15994d
src/org/apache/pig/tools/pigstats/SparkStats.java fd45dd4
src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java PRE-CREATION
src/org/apache/pig/tools/pigstats/spark/SparkPigStats.java PRE-CREATION
src/org/apache/pig/tools/pigstats/spark/SparkStatsUtil.java PRE-CREATION
Diff: https://reviews.apache.org/r/30262/diff/
Testing (updated)
-------
Tested with unit tests.
Compared to the last Jenkins unit test run for the branch (baseline), two unit
tests, TestToolsPigServer and TestStoreInstances, are now fixed.
Baseline:
https://builds.apache.org/job/Pig-spark/lastCompletedBuild/testReport/
Example of Spark job metrics as they appear in the logs:
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - Spark Job [0] Metrics
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - EexcutorDeserializeTime : 74
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ExecutorRunTime : 538
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ResultSize : 2535
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - JvmGCTime : 0
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ResultSerializationTime : 1
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - MemoryBytesSpilled : 0
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - DiskBytesSpilled : 0
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - RemoteBlocksFetched : 0
2015-01-29 23:06:42,520 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - LocalBlocksFetched : 2
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - TotalBlocksFetched : 2
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - FetchWaitTime : 0
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - RemoteBytesRead : 0
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ShuffleBytesWritten : 918
2015-01-29 23:06:42,521 [main] INFO org.apache.pig.tools.pigstats.spark.SparkPigStats - ShuffleWriteTime : 67000
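The log layout above is one "Spark Job [id] Metrics" header per job, followed by one "Name : value" line per aggregated metric. It can be reproduced with a small formatter. This is a minimal sketch under assumptions: SparkMetricsFormatter and format are hypothetical names, not part of SparkPigStats, and the sketch returns strings instead of writing through Pig's logger.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical formatter reproducing the "Spark Job [id] Metrics" / "Name : value" layout.
class SparkMetricsFormatter {
    // Builds the header line followed by one line per metric, in the map's iteration order.
    static List<String> format(int jobId, Map<String, Long> metrics) {
        List<String> lines = new ArrayList<>();
        lines.add("Spark Job [" + jobId + "] Metrics");
        for (Map.Entry<String, Long> e : metrics.entrySet()) {
            lines.add(e.getKey() + " : " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("ExecutorRunTime", 538L);
        m.put("JvmGCTime", 0L);
        // prints "Spark Job [0] Metrics", "ExecutorRunTime : 538", "JvmGCTime : 0" on separate lines
        for (String line : format(0, m)) {
            System.out.println(line);
        }
    }
}
```

Using a LinkedHashMap keeps the metrics in insertion order, so the printed order stays stable from run to run.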
Thanks,
Mohit Sabharwal