Re: pig speed in local mode

Dmitriy Ryaboy Thu, 30 Sep 2010 13:34:44 -0700

Konstantin,
We deliberately removed the implementation that made Pig fast (faster,
anyway) in local mode, because it exercised a different code path than
Hadoop-mode execution, which led to odd bugs and inconsistencies.

That being said, I think that with some work we could make local mode run a
bit faster by pre-initializing the local Hadoop threads and recycling them
between executions. This would be future work, though; right now there is
no alternative.

You could, of course, downgrade to Pig 0.6, which is the last version to have
the fast local mode implementation. But then you have to watch out for the
aforementioned issues with unexpected differences vs. Hadoop mode.

-Dmitriy

On Thu, Sep 30, 2010 at 1:08 PM, Konstantin Ignatyev
<kgignat...@gmail.com> wrote:

> Hi,
>
> I am trying to write a Pig script that is quite complex, so I am testing it
> against a very small data subset in local mode.
> However, it can take up to 2 _minutes_ to finish, or 30 seconds if I
> execute only parts of it.
>
> That is quite annoying, to say the least, because the SQL that I am trying
> to reimplement in Pig runs on the source dataset in only 3 seconds.
>
> Is there a way to improve Pig's speed in local/development mode?
>
>
> Thanks
> --
> Konstantin Ignatyev
>
> PS: If this is a typical day on planet earth, humans will add fifteen
> million tons of carbon to the atmosphere, destroy 115 square miles of
> tropical rainforest, create seventy-two miles of desert, eliminate between
> forty to one hundred species, erode seventy-one million tons of topsoil,
> add 2,700 tons of CFCs to the stratosphere, and increase their population
> by 263,000
>
> Bowers, C.A. The Culture of Denial: Why the Environmental Movement Needs a
> Strategy for Reforming Universities and Public Schools. New York: State
> University of New York Press, 1997: (4) (5) (p.206)
>
