pig-user  

Re: MapReduceLauncher static fields

Benjamin Reed
Mon, 07 Apr 2008 08:40:26 -0700

Your approach is one way of doing it; perhaps it's the best way. Another 
potential way is to pass a progress object to the store method.

The code that displays progress to the user is in 
MapReduceLauncher.launchPig. It's not perfect, but it gives a reasonable 
estimate even when a set of queries needs multiple jobs to complete. Perhaps 
you could incorporate that into your getProgress().
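
As a rough illustration of the progress-object idea, here is a minimal sketch of a per-query tracker that stays meaningful across a multi-job query. The class and method names (JobSetProgress, updateCurrentJob, jobCompleted) are hypothetical and not part of Pig, and weighting every job equally is an assumption rather than what launchPig necessarily does.

```java
/**
 * Hypothetical per-query progress tracker (not part of Pig).
 * One instance is created per store/dump call, so concurrent queries
 * never share state the way static fields do.
 */
final class JobSetProgress {
    private final int totalJobs;       // number of MapReduce jobs in this query
    private int completedJobs = 0;     // jobs that have fully finished
    private double currentJobProgress = 0.0; // fraction of the running job, 0.0 to 1.0

    JobSetProgress(int totalJobs) {
        if (totalJobs <= 0) {
            throw new IllegalArgumentException("totalJobs must be positive");
        }
        this.totalJobs = totalJobs;
    }

    /** Called by the launcher as the running job reports progress. */
    synchronized void updateCurrentJob(double fraction) {
        currentJobProgress = Math.max(0.0, Math.min(1.0, fraction));
    }

    /** Called by the launcher when the running job finishes. */
    synchronized void jobCompleted() {
        completedJobs = Math.min(totalJobs, completedJobs + 1);
        currentJobProgress = 0.0;
    }

    /** Overall progress in [0.0, 1.0], assuming every job has equal weight. */
    synchronized double getProgress() {
        return Math.min(1.0, (completedJobs + currentJobProgress) / totalJobs);
    }
}
```

With four jobs, a first job that is half done reports 0.125 overall, and the value rises steadily instead of only becoming accurate during the final job. Synchronizing every method keeps readers in a web thread consistent with the launcher thread that writes the updates.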

thanx
ben

On Friday 04 April 2008 12:08:52 Michael Harris wrote:
> Ben,
>
> Thanks for getting back to me. Ideally the stats would be attached to a
> dump/store command. I tried to hack together a solution by making those
> fields non-static, making MapReduceLauncher serializable, adding a
> getProgress method to MapReduceLauncher instances, and modifying the
> update points to use particular instances of MapReduceLauncher rather
> than the static calls they were making before. Then I modified PigServer
> to have a method getProgress(String alias):
>
>       public double getProgress(String id) {
>               ExecutionEngine ee = pigContext.getExecutionEngine();
>               if (ee instanceof HExecutionEngine) {
>                       HExecutionEngine he = (HExecutionEngine) ee;
>                       LogicalPlan lp = aliases.get(id);
>                       POMapreduce mapRed = (POMapreduce) he.getPhysicalOpTable().get(
>                                       he.getPhysicalKey(lp.getRoot()));
>                       return mapRed.getMapReduceLauncher().getProgress();
>               }
>               return -1;
>       }
>
> I have only spent a few hours with the Pig code, so I'm not sure this is
> even correct, but it seems to work really well except when a set of
> queries needs several jobs to complete: the results are totally
> inaccurate until it gets to the final job. It's not really a big deal;
> it's just an internal tool, and my users can live with no status updates,
> but I thought it would be a nice touch. I have looked at the roadmap for
> Pig and see that querying for progress is on there. I just wanted to make
> sure you guys think of my scenario (thread-safe, end-user-facing
> application) when you add it :)
>
> -Michael
>
> -----Original Message-----
> From: Benjamin Reed [EMAIL PROTECTED]
> Sent: Friday, April 04, 2008 11:29 AM
> To: pig-user@incubator.apache.org
> Subject: Re: MapReduceLauncher static fields
>
> The statistics are not updated in a thread-safe way. They are global
> statistics, so they span all jobs, and since they aren't updated
> thread-safely they may be wrong. Other than the numbers, I think the rest
> should be thread safe, assuming that the underlying Hadoop code is thread
> safe, which it looks to be.
>
> I would think that for your application the stats should really be
> attached to an object that represents the store or dump call, right? (Or
> at least accessible through that object.)
>
> ben
>
> Michael Harris wrote:
> > Hello,
> >
> >
> >
> > I have written a pig application that does a fixed set of queries
> > on-demand through a web interface. I am trying to get the progress of
> > the queries from the PigServer, but I have noticed that the source of
> > the progress data is all static fields in the MapReduceLauncher. Clearly
> > my webapp must be able to handle multiple concurrent pig queries (and be
> > thread-safe) and I would like to report the progress of each individual
> > query (job set) to the end user.  Do these static fields indicate that I
> > would get the progress of multiple concurrent queries initiated by
> > different PigServer instances? or would I get the overall progress of
> > the MapReduceLauncher for all queries currently being executed?
> >
> >
> >
> > Thanks,
> > Michael