-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45365/
-----------------------------------------------------------
Review request for pig, Pallavi Rao and Rohini Palaniswamy.
Repository: pig-git
Description
-------
A Pig script can compile into a DAG of multiple ETL jobs, which may take hours
to finish. If a transient error occurs, the run fails. When it is rerun, every
node in the job graph runs again, including nodes that had already completed
successfully. These redundant runs waste cluster capacity and delay
pipelines.
On failure, we can persist the state of the job graph. On the next run, only
the failed nodes and their successors are rerun. This is subject to
preconditions such as:
> Pig script has not changed
> Input locations have not changed
> Output data from previous run is intact
> Configuration has not changed
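
Assuming the preconditions above have already been verified, the choice of
nodes to rerun can be sketched as a forward traversal of the job graph. This is
only an illustration of the idea, not the code in this patch: the class
RecoverySketch, the method jobsToRerun, and the string job names are all
hypothetical, and the sketch treats any node that did not succeed (failed or
never started) as a starting point.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the MRJobRecovery/MRJobState classes from the patch.
class RecoverySketch {

    /**
     * Returns the jobs that must run again: every job that did not succeed
     * in the previous run, plus everything downstream of such a job.
     *
     * Assumption: every job appears as a key in 'successors' (leaf jobs map
     * to an empty list).
     */
    static Set<String> jobsToRerun(Map<String, List<String>> successors,
                                   Set<String> succeeded) {
        Set<String> rerun = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();

        // Seed the traversal with jobs that have no successful result.
        for (String job : successors.keySet()) {
            if (!succeeded.contains(job)) {
                queue.add(job);
            }
        }

        // Breadth-first walk: a successor of a rerun job must rerun as well,
        // even if it succeeded last time, because its input will change.
        while (!queue.isEmpty()) {
            String job = queue.poll();
            if (rerun.add(job)) {
                queue.addAll(successors.getOrDefault(job, Collections.<String>emptyList()));
            }
        }
        return rerun;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dag = new LinkedHashMap<>();
        dag.put("load", Arrays.asList("filter"));
        dag.put("filter", Arrays.asList("join"));
        dag.put("join", Arrays.asList("store"));
        dag.put("store", Collections.<String>emptyList());

        // Previous run: "load" and "filter" finished, "join" failed.
        Set<String> succeeded = new HashSet<>(Arrays.asList("load", "filter"));

        System.out.println(jobsToRerun(dag, succeeded)); // prints [join, store]
    }
}
```

In this sketch the saved state is just the set of succeeded job names. A real
implementation would also need to record enough information (script, input
locations, configuration, output paths) to check the preconditions before
trusting that set.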
This patch builds on the work done in
https://reviews.apache.org/r/39226/
Diffs
-----
conf/pig.properties ee9ae6d
src/org/apache/pig/Main.java 0f84ffc
src/org/apache/pig/PigConfiguration.java 14bec5a
src/org/apache/pig/PigServer.java ee52472
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 595e68c
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRIntermediateDataVisitor.java 4b62112
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRJobRecovery.java e69de29
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRJobState.java e69de29
src/org/apache/pig/impl/PigImplConstants.java 050a243
test/org/apache/pig/test/TestRecover.java e69de29
Diff: https://reviews.apache.org/r/45365/diff/
Testing
-------
A new test case, TestRecover, has been added in this patch.
Thanks,
prateek vaishnav