-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30262/
-----------------------------------------------------------
(Updated Jan. 30, 2015, 7:02 a.m.)
Review request for pig, liyun zhang and Praveen R.
Changes
-------
Added the number of bytes to OutputStats.
Also added "bytesWritten" to the Spark stats.
Bugs: PIG-4393
https://issues.apache.org/jira/browse/PIG-4393
Repository: pig-git
Description
-------
PIG-4393 : Add stats and error reporting for Spark
After Pig submits a job to a Spark cluster, we need to report job progress,
Spark-specific stats, and any error logs back to the user.
This is an initial patch that adds Spark-specific stats, mostly to get feedback
on the assumption that a separate Spark job is launched for each POStore
operator.
It also refactors the code to populate PigStats correctly. Most unit tests
rely on PigStats, so this should fix a number of failing unit tests.
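To make the one-job-per-POStore assumption concrete for reviewers, here is a
minimal, hypothetical sketch of how per-task "bytesWritten" values might be
rolled up into one total per Spark job and then attributed to a store. It is
not the code in this patch: the class name StoreStatsSketch, its methods, and
the store names are illustrative assumptions, and it does not model Spark's
real listener callbacks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not the patch code): aggregates per-task
 * bytesWritten per Spark job, assuming one job per POStore.
 */
final class StoreStatsSketch {

    // jobId -> bytes written by each finished task of that job
    private final Map<Integer, List<Long>> taskBytesByJob = new LinkedHashMap<>();

    /** Record the bytes written by one finished task of the given job. */
    synchronized void onTaskEnd(int jobId, long bytesWritten) {
        taskBytesByJob.computeIfAbsent(jobId, k -> new ArrayList<>()).add(bytesWritten);
    }

    /** Total bytes written by a job, or 0 if no task has reported yet. */
    synchronized long bytesWritten(int jobId) {
        long total = 0;
        List<Long> perTask = taskBytesByJob.getOrDefault(jobId, Collections.emptyList());
        for (long b : perTask) {
            total += b;
        }
        return total;
    }

    /** Assumed one-to-one mapping: each store gets the total of its own job. */
    synchronized Map<String, Long> outputStats(Map<String, Integer> storeToJob) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : storeToJob.entrySet()) {
            result.put(e.getKey(), bytesWritten(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        StoreStatsSketch stats = new StoreStatsSketch();
        stats.onTaskEnd(0, 100L); // two tasks of job 0
        stats.onTaskEnd(0, 50L);
        stats.onTaskEnd(1, 7L);   // one task of job 1

        // Hypothetical store names, one per job.
        Map<String, Integer> storeToJob = new LinkedHashMap<>();
        storeToJob.put("out_a", 0);
        storeToJob.put("out_b", 1);
        System.out.println(stats.outputStats(storeToJob));
    }
}
```

If the assumption does not hold (for example, if one job ends up serving several
stores), a per-job total like this could no longer be attributed to a single
output, which is the main point I would like feedback on.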
TODO items:
- We probably need to add counters that capture the number of records and
bytes in the output file in order to populate OutputStats.
- Although StatsReportListener prints Spark job progress in the logs, we
probably also need to implement PigProgressNotificationListener for Spark.
Diffs (updated)
-----
src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java
PRE-CREATION
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkExecutionEngine.java
db152b5003ce6e79b001b2624010b91cc0f921d8
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java
b15994d525250bbb26f7b7126dae619b9da363c8
src/org/apache/pig/tools/pigstats/SparkStats.java
fd45dd4f0be415dd48d9fb7381c57c861bbbf7ce
src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java PRE-CREATION
src/org/apache/pig/tools/pigstats/spark/SparkPigStats.java PRE-CREATION
src/org/apache/pig/tools/pigstats/spark/SparkStatsUtil.java PRE-CREATION
Diff: https://reviews.apache.org/r/30262/diff/
Testing
-------
Thanks,
Mohit Sabharwal