Re: Parallel Aggregates

Atri Sharma Mon, 15 Jun 2015 23:28:32 -0700

Thanks.

What are your thoughts on parallel aggregation? Generating query plans that
allow states to be generated which can be executed independently and then
states recombined?
On 16 Jun 2015 05:25, "Jihoon Son" <[email protected]> wrote:


> Hi Atri, thanks for your question.
>
> First of all, maybe you already did, I recommend that you read this article
> <
> http://www.hadoopsphere.com/2015/02/technical-deep-dive-into-apache-tajo.html
> >
> before you start implementation. This is written by Hyunsik, and contains
> the description of Tajo's overall infrastructure. Afterwards, I think that
> you may ask more detailed question.
>
> Here, I'll roughly list some important classes for aggregate
> implementation.
>
>    - SQLParser.g4 contains our SQL parsing rules. It is written in antlr.
>    - SQLAnalyzer is our parser based on rules defined at SQLParser.g4.
>    - SQLAnalyzer translates a SQL query into a tree of Expr which
>    represents an algebraic expression.
>    - LogicalPlanner translates the Expr tree into a LogicalPlan that
>    logically describes how the given query will be executed.
>    - GlobalPlanner translates the LogicalPlan into a MasterPlan
>    (distributed query execution plan) that describes how the given query
> will
>    be executed in distributed cluster.
>    - Once a MasterPlan is created, QueryMaster starts to execute query
>    processing. A query consists of multiple stages, which are individually
>    processed in some order.
>       - For example, a simple aggregation query is executed in two stages,
>       each of which is for parallel aggregation and combining aggregates.
> These
>       stages are executed sequentially.
>    - A stage is concurrently processed by multiple tasks, and is executed
>    by TajoWorker.
>    - Each task contains meta information for input data and a LogicalPlan
>    of the stage. This LogicalPlan is translated into PhysicalExec by
>    PhysicalPlanner.
>    - PhysicalExec describes how the query is actually executed.
>       - For example, there are two types of AggregationExec,
>       i.e., HashAggregateExec and SortAggregateExec, for hash-based
> aggregation
>       and sort-based aggregation, respectively.
>
> Best regards,
> Jihoon
>
> 2015년 6월 15일 (월) 오후 11:32, Atri Sharma <[email protected]>님이 작성:
>
> > Folks,
> >
> > I am looking into parallel aggregates/combining aggregates. I have a plan
> > around it which I think can work.
> >
> > Please update me on current infrastructure and point me around the
> existing
> > code base. Also, ideas would be most welcome around it.
> >
> > --
> > Regards,
> >
> > Atri
> > *l'apprenant*
> >
>

Re: Parallel Aggregates

Reply via email to