I don't know the answer to those questions yet. IIRC the ParaTimer
algorithms also only worked when the join was n-1, not in the n-m
case. For a first pass I suspect we'll take what they have, and for
jobs with a join in them we'll use our old style of progress
indication. We can then extend ParaTimer to joins as statistics
become available. Hopefully as we integrate with systems like
Howl we'll get the statistics we need, at least in some
cases.
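
A minimal sketch of that first-pass policy, in Python for brevity
(Pig itself is Java). Every name here (Job, choose_indicator, the
"paratimer" and "legacy" labels) is hypothetical and is not a Pig or
ParaTimer API. It also assumes we can tell whether a join is n-1 and
whether statistics exist for it:

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical description of one MR job in a Pig plan."""
    name: str
    has_join: bool = False
    join_is_n_to_1: bool = False   # assumption: known from metadata
    has_statistics: bool = False   # assumption: e.g. from a metadata system


def choose_indicator(job: Job) -> str:
    """Pick a progress indicator for one job (illustrative policy only)."""
    if not job.has_join:
        # No join: use the ParaTimer-style estimator as it is.
        return "paratimer"
    if job.join_is_n_to_1 and job.has_statistics:
        # Join use can grow as statistics become available.
        return "paratimer"
    # Joins without statistics, and all n-m joins, keep the old indicator.
    return "legacy"
```

With this policy a job containing a join and no statistics returns
"legacy", while a plain load/filter job returns "paratimer".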
Alan.
On Feb 17, 2011, at 9:14 AM, Renato Marroquín Mogrovejo wrote:
Hi Laukik,
I did see their work, but I guess my question was too vague, my
bad (:
I asked about Parallax because in that work they strongly emphasize
that they had it implemented in Pig, so I wondered, but in the case
of ParaTimer I kind of assumed they hadn't open-sourced it. It is
probably just the way they described them both.
Going back to the original question: the main difference between
Parallax and ParaTimer is that the latter can make progress
estimates for tasks that run in parallel, like joins, which have two
or more inputs. But I was actually referring to cardinality
estimates for pre-defined operators. They use the same approach in
both works: regular query optimization techniques, like cost
formulas. However, using these formulas requires a priori knowledge
of the data, or collecting the estimates during their debug runs.
Would this be the best way to do it while leveraging Pig?
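
To make the dependence on a priori knowledge concrete, here is a
rough sketch using standard textbook cardinality formulas, which
assume uniformly distributed values. These are not necessarily the
formulas Parallax or ParaTimer use, and all function names and
numbers are made up for illustration:

```python
def filter_cardinality(input_rows: int, selectivity: float) -> float:
    """Estimated output rows of a filter; selectivity must be known up front."""
    return input_rows * selectivity


def join_cardinality(left_rows: int, right_rows: int,
                     left_distinct: int, right_distinct: int) -> float:
    """Textbook equi-join estimate: |R| * |S| / max(V(R, key), V(S, key)).

    The distinct-key counts are exactly the kind of statistic that has to
    come from metadata or from a sample/debug run.
    """
    return left_rows * right_rows / max(left_distinct, right_distinct)


def progress(rows_done: int, estimated_rows: float) -> float:
    """Fraction complete, clamped to [0, 1] because the estimate may be off."""
    if estimated_rows <= 0:
        return 1.0
    return min(1.0, rows_done / estimated_rows)


# Hypothetical n-1 join: 1,000,000 orders against 50,000 customers.
estimate = join_cardinality(1_000_000, 50_000, 40_000, 50_000)
```

In this made-up n-1 case the estimate equals the row count of the
"many" side (1,000,000), so little is needed beyond input sizes. In
the n-m case the result depends heavily on the distinct-key counts.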
Thanks in advance.
Renato M.
2011/2/16 Laukik Chitnis <[email protected]>
Hi Renato,
They extended Parallax to ParaTimer -- a progress estimator for MR
DAGs (which covers the join case). Here is the paper that talks
about it:
http://www.cs.washington.edu/homes/kmorton/camera-ready.pdf
Cheers,
Laukik
On 2/16/11 11:59 AM, "Renato Marroquín Mogrovejo" <[email protected]> wrote:
I see, I thought it was already part of Pig internals, but oh well,
we will get there (:
There was one thing that made me think about that work, and it was
the join case. I mean they say that the progress estimator wouldn't
deal with joins. What do you think would be a good approach for a
join progress estimator?
Any ideas are more than welcome. Thanks in advance!
Renato M.
2011/2/15 Alan Gates <[email protected]>
Parallax is implemented on a pull from Pig trunk around the 0.2
timeframe. However, we are working with Kristi to get it integrated
into current Pig. With some luck, we'll get it integrated into 0.9
(but no promises).
Alan.
On Feb 15, 2011, at 7:54 AM, Renato Marroquín Mogrovejo wrote:
Hi, I wanted to know if "Parallax" [1] is implemented in Pig 0.8 and
how it is being used.
Any hints about its classes would be awesome.
Thanks in advance.
Renato M.
[1] *Toward A Progress Indicator for Parallel Queries.*
ftp://ftp.cs.washington.edu/tr/2009/07/UW-CSE-09-07-01.PDF