I don't know the answer to those questions yet. IIRC the ParaTimer algorithms also only worked when the join was n-1, not in the n-m case. For a first pass I suspect we'll take what they have, and for jobs with a join in them we'll use our old style of progress indication. We can then extend ParaTimer to joins as statistics become available. Hopefully as we integrate with systems like Howl we'll get the statistics we need, at least in some cases.

Alan.

On Feb 17, 2011, at 9:14 AM, Renato Marroquín Mogrovejo wrote:

Hi Laukik,

I did see their work, but I guess my question was too vague, my bad (: I asked about Parallax because in that work they strongly emphasize that they had it implemented in Pig, so I wondered, but in the case of ParaTimer I kinda assumed they hadn't open-sourced it. It is probably just the way they
described them both.
Going back to the original question: the main difference between
Parallax and ParaTimer is that the latter can make progress
estimates for tasks that run in parallel, like joins, which have two or more
inputs. But I was actually referring to cardinality estimates for
pre-defined operators. Both works use the same approach: regular query optimization techniques, like cost formulas. However, using these formulas requires a priori knowledge of the data, or obtaining the estimates during their debug runs. Would this be the best way to do
it while leveraging Pig?
Thanks in advance.


Renato M.



2011/2/16 Laukik Chitnis <[email protected]>

Hi Renato,

They extended Parallax to ParaTimer -- progress estimator for MR DAGs
(which covers the join case). Here is the paper that talks about it:
http://www.cs.washington.edu/homes/kmorton/camera-ready.pdf

Cheers,
Laukik



On 2/16/11 11:59 AM, "Renato Marroquín Mogrovejo" <
[email protected]> wrote:

I see, I thought it was already part of Pig internals, but oh well, we will
get there (:
There was one thing that made me think about that work, and it was the join case. I mean, they say that the progress estimator wouldn't deal with joins. What do you think would be a good approach for a join progress estimator?
Any ideas are more than welcome. Thanks in advance!

Renato M.


2011/2/15 Alan Gates <[email protected]>

Parallax is implemented on a pull from Pig trunk around the 0.2 timeframe. However, we are working with Kristi to get it integrated into current Pig.
With some luck, we'll get it integrated into 0.9 (but no promises).

Alan.


On Feb 15, 2011, at 7:54 AM, Renato Marroquín Mogrovejo wrote:

Hi, I wanted to know if "Parallax" [1] is implemented in Pig 0.8 and how it
is being used.
Any hints about its classes would be awesome.
Thanks in advance.

Renato M.


[1] Toward A Progress Indicator for Parallel Queries.
ftp://ftp.cs.washington.edu/tr/2009/07/UW-CSE-09-07-01.PDF






