-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45365/
-----------------------------------------------------------

Review request for pig, Pallavi Rao and Rohini Palaniswamy.


Repository: pig-git


Description
-------

A Pig script can have multiple ETL jobs in its DAG, and these may take hours 
to finish. If a transient error occurs, the job fails. When the job is rerun, 
every node in the job graph runs again, including nodes that had already 
completed successfully. These redundant runs waste cluster capacity and delay 
pipelines.

On failure, we can persist the state of the job graph. On the next run, only 
the failed nodes and their successors are rerun. This is subject to 
preconditions such as:
         > The Pig script has not changed
         > The input locations have not changed
         > The output data from the previous run is intact
         > The configuration has not changed
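
As an illustration of the rerun rule above, here is a minimal sketch of how 
the set of nodes to rerun could be derived from a persisted graph. It is not 
taken from the patch: the class RecoverySketch, the method nodesToRerun, and 
the node names are hypothetical, and the precondition checks are assumed to 
have passed already.

```java
import java.util.*;

// Hypothetical sketch, not the code in this patch.
public class RecoverySketch {

    /**
     * Returns every failed node plus all of its transitive successors.
     * Nodes outside this set are assumed to have reusable output.
     *
     * @param successors adjacency list: node -> nodes that consume its output
     * @param failed     nodes that did not complete in the previous run
     */
    static Set<String> nodesToRerun(Map<String, List<String>> successors,
                                    Set<String> failed) {
        Set<String> rerun = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(failed);
        while (!pending.isEmpty()) {
            String node = pending.pop();
            // add() returns false for nodes already visited, so each node
            // is expanded at most once.
            if (rerun.add(node)) {
                pending.addAll(
                    successors.getOrDefault(node, Collections.emptyList()));
            }
        }
        return rerun;
    }

    public static void main(String[] args) {
        // Made-up job graph: load feeds filter and group, both feed join,
        // and join feeds store.
        Map<String, List<String>> dag = new HashMap<>();
        dag.put("load", Arrays.asList("filter", "group"));
        dag.put("filter", Arrays.asList("join"));
        dag.put("group", Arrays.asList("join"));
        dag.put("join", Arrays.asList("store"));

        // Suppose only "group" failed in the previous run.
        Set<String> rerun = nodesToRerun(dag, Collections.singleton("group"));
        System.out.println(rerun); // prints [group, join, store]
    }
}
```

In this example "load" and "filter" are skipped because neither failed nor 
depends on a failed node. If any precondition does not hold, the safe 
fallback is to rerun the whole graph.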
         
This patch builds on the work done in
https://reviews.apache.org/r/39226/


Diffs
-----

  conf/pig.properties ee9ae6d 
  src/org/apache/pig/Main.java 0f84ffc 
  src/org/apache/pig/PigConfiguration.java 14bec5a 
  src/org/apache/pig/PigServer.java ee52472 
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 595e68c 
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRIntermediateDataVisitor.java 4b62112 
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRJobRecovery.java e69de29 
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRJobState.java e69de29 
  src/org/apache/pig/impl/PigImplConstants.java 050a243 
  test/org/apache/pig/test/TestRecover.java e69de29 

Diff: https://reviews.apache.org/r/45365/diff/


Testing
-------

A new test case, TestRecover, has been added in this patch.


Thanks,

prateek vaishnav
