Apache Wiki
Mon, 12 May 2008 00:32:01 -0700
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by Shravan Narayanamurthy:
http://wiki.apache.org/pig/PigReporting

New page:
= Reporting in Pig =
Hadoop assumes that a job that does not report progress might be stuck, and kills it after a timeout. To carry this over to Pig Map Reduce, where we execute operator plans, we need to integrate progress reporting into the Pig operators.

To support this, we propose a reporter interface that any backend in Pig can use. The only method a reporter needs for now is one that reports progress, optionally with a status message, so the following should be enough. We can add other things as the need arises.

{{{
public interface PigProgressable {
    // Use to just inform that you are alive
    public void progress();

    // If you have a status to report
    public void progress(String msg);
}
}}}

== Changes to current code ==
In the mapReduceLayer, we have to implement this interface with a class that wraps the Hadoop reporter. To minimize changes to the code, I am planning to have a static variable in the PhysicalOperator class, which will be set by the map & reduce functions at the beginning. Changes to the currently implemented operators would involve adding a call to the progress method of this static variable as soon as the operator starts processing a tuple. For example, in POFilter, the current code would be changed as follows.

Current:
{{{
while (true) {
    inp = processInput();
    if (inp.returnStatus == POStatus.STATUS_EOP || inp.returnStatus == POStatus.STATUS_ERR)
        break;
    ...
}}}

Changed to:
{{{
while (true) {
    inp = processInput();
    PhysicalOperator.reporter.progress();
    if (inp.returnStatus == POStatus.STATUS_EOP || inp.returnStatus == POStatus.STATUS_ERR)
        break;
    ...
}}}
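
The wrapper class and the static reporter variable described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the planned Pig code: `BackendReporter` is a hypothetical stand-in for the Hadoop reporter so that the example compiles without Hadoop, and `ProgressableReporter`, `OperatorBase` (standing in for PhysicalOperator), `RecordingReporter`, and `ReportingDemo` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// The interface proposed on this page.
interface PigProgressable {
    void progress();
    void progress(String msg);
}

// Hypothetical stand-in for the Hadoop reporter (assumption, not the Hadoop API).
interface BackendReporter {
    void progress();
    void setStatus(String status);
}

// Adapter that lets operators talk to the backend through PigProgressable.
class ProgressableReporter implements PigProgressable {
    private final BackendReporter delegate;

    ProgressableReporter(BackendReporter delegate) {
        this.delegate = delegate;
    }

    @Override
    public void progress() {
        delegate.progress();
    }

    @Override
    public void progress(String msg) {
        // Publish the status, then signal liveness.
        delegate.setStatus(msg);
        delegate.progress();
    }
}

// Stand-in for PhysicalOperator: holds the static reporter set by map/reduce.
class OperatorBase {
    static PigProgressable reporter;
}

// Test double that records what the backend would have received.
class RecordingReporter implements BackendReporter {
    int progressCalls = 0;
    final List<String> statuses = new ArrayList<>();

    @Override
    public void progress() {
        progressCalls++;
    }

    @Override
    public void setStatus(String status) {
        statuses.add(status);
    }
}

class ReportingDemo {
    // Reports progress once per tuple, then once more with a final status.
    static int processTuples(int n) {
        int processed = 0;
        for (int i = 0; i < n; i++) {
            OperatorBase.reporter.progress();
            processed++;
        }
        OperatorBase.reporter.progress("processed " + processed + " tuples");
        return processed;
    }

    public static void main(String[] args) {
        RecordingReporter backend = new RecordingReporter();
        // What the map or reduce function would do once at the beginning.
        OperatorBase.reporter = new ProgressableReporter(backend);
        processTuples(3);
        System.out.println(backend.progressCalls + " " + backend.statuses);
    }
}
```

Keeping the reporter behind `PigProgressable` means operators never import backend classes, so another backend would only need its own adapter. One caveat of the static variable is that it must be set before any operator runs; otherwise the call in the loop fails with a NullPointerException.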