-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30262/
-----------------------------------------------------------

(Updated Jan. 30, 2015, 1:39 p.m.)


Review request for pig, liyun zhang and Praveen R.


Changes
-------

Updated description


Bugs: PIG-4393
    https://issues.apache.org/jira/browse/PIG-4393


Repository: pig-git


Description (updated)
-------

PIG-4393 : Add stats and error reporting for Spark

After Pig submits a job to a Spark cluster, we need to report job progress, 
Spark-specific stats, and any error logs back to the user.

(1) It reports basic success/failure status for each Spark job.
(2) It logs Spark-specific stats to the log file. It registers a job metrics 
listener with the Spark context, collects Spark task-level metrics, and 
aggregates them.
(3) It also refactors code to correctly populate PigStats, which is used by 
most unit tests. This should fix a number of unit tests.
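
To illustrate the aggregation in (2), here is a minimal sketch of summing 
task-level metrics per job. It is not the code in this patch: the class name 
JobMetricsAggregator, its methods, and the map-based representation are all 
hypothetical, and it deliberately avoids the Spark listener API so that it 
runs on its own. A real listener would presumably call something like 
onTaskEnd() from Spark's task-end callback.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the JobMetricsListener from this patch):
 * sums task-level metric values per Spark job ID.
 */
public class JobMetricsAggregator {
    // jobId -> (metric name -> running total across that job's tasks)
    private final Map<Integer, Map<String, Long>> totalsByJob = new LinkedHashMap<>();

    /** Adds one finished task's metrics to the running totals of its job. */
    public synchronized void onTaskEnd(int jobId, Map<String, Long> taskMetrics) {
        Map<String, Long> totals =
            totalsByJob.computeIfAbsent(jobId, id -> new LinkedHashMap<>());
        for (Map.Entry<String, Long> e : taskMetrics.entrySet()) {
            totals.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }

    /** Returns a copy of the aggregated metrics for a job (empty if unknown). */
    public synchronized Map<String, Long> getJobMetrics(int jobId) {
        Map<String, Long> totals = totalsByJob.get(jobId);
        return totals == null ? new LinkedHashMap<>() : new LinkedHashMap<>(totals);
    }
}
```

The methods are synchronized on the assumption that listener callbacks and 
the thread reading the stats may differ; returning a copy keeps callers from 
mutating the running totals.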

TODO items in a follow-up patch:
 - Add #records to OutputStats for each job.
 - Though StatsReportListener prints Spark job progress in the logs, we 
probably also need to implement PigProgressNotificationListener for Spark.


Diffs
-----

  src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java PRE-CREATION 
  src/org/apache/pig/backend/hadoop/executionengine/spark/SparkExecutionEngine.java db152b5003ce6e79b001b2624010b91cc0f921d8 
  src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java b15994d525250bbb26f7b7126dae619b9da363c8 
  src/org/apache/pig/tools/pigstats/SparkStats.java fd45dd4f0be415dd48d9fb7381c57c861bbbf7ce 
  src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java PRE-CREATION 
  src/org/apache/pig/tools/pigstats/spark/SparkPigStats.java PRE-CREATION 
  src/org/apache/pig/tools/pigstats/spark/SparkStatsUtil.java PRE-CREATION 

Diff: https://reviews.apache.org/r/30262/diff/


Testing (updated)
-------

Tested with unit tests. At least some unit tests that were failing
earlier due to lack of stats, such as TestToolsPigServer and TestSplitStore, 
now pass.


Example of Spark Job metrics that appear in logs:

2015-01-29 23:06:42,520 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  - Spark Job [0] Metrics
2015-01-29 23:06:42,520 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       EexcutorDeserializeTime : 74
2015-01-29 23:06:42,520 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       ExecutorRunTime : 538
2015-01-29 23:06:42,520 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       ResultSize : 2535
2015-01-29 23:06:42,520 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       JvmGCTime : 0
2015-01-29 23:06:42,520 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       ResultSerializationTime : 1
2015-01-29 23:06:42,520 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       MemoryBytesSpilled : 0
2015-01-29 23:06:42,520 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       DiskBytesSpilled : 0
2015-01-29 23:06:42,520 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       RemoteBlocksFetched : 0
2015-01-29 23:06:42,520 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       LocalBlocksFetched : 2
2015-01-29 23:06:42,521 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       TotalBlocksFetched : 2
2015-01-29 23:06:42,521 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       FetchWaitTime : 0
2015-01-29 23:06:42,521 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       RemoteBytesRead : 0
2015-01-29 23:06:42,521 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       ShuffleBytesWritten : 918
2015-01-29 23:06:42,521 [main] INFO  org.apache.pig.tools.pigstats.spark.SparkPigStats  -       ShuffleWriteTime : 67000


Thanks,

Mohit Sabharwal
