Thanks.

What are your thoughts on parallel aggregation? That is, generating query
plans in which partial aggregate states are computed independently and
then recombined?
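
To make the idea concrete, here is a minimal sketch of what I mean by
independent states that are later recombined, using AVG as the example. All
names here (TwoPhaseAvg, AvgState, partial, combine) are hypothetical and
are not taken from the Tajo code base; this only illustrates the general
two-phase technique, not how Tajo does or should implement it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names come from Tajo.
class TwoPhaseAvg {
    /** Partial state for AVG. It is mergeable because sum and count are both additive. */
    static final class AvgState {
        double sum;
        long count;

        void add(double value) { sum += value; count++; }

        void merge(AvgState other) { sum += other.sum; count += other.count; }

        double result() { return sum / count; }
    }

    /** Phase 1: each task aggregates its own partition into per-key states. */
    static Map<String, AvgState> partial(List<Map.Entry<String, Double>> partition) {
        Map<String, AvgState> states = new HashMap<>();
        for (Map.Entry<String, Double> row : partition) {
            states.computeIfAbsent(row.getKey(), k -> new AvgState()).add(row.getValue());
        }
        return states;
    }

    /** Phase 2: merge the per-partition states by key, then finalize each state. */
    static Map<String, Double> combine(List<Map<String, AvgState>> partials) {
        Map<String, AvgState> merged = new HashMap<>();
        for (Map<String, AvgState> p : partials) {
            p.forEach((key, state) ->
                merged.computeIfAbsent(key, k -> new AvgState()).merge(state));
        }
        Map<String, Double> out = new HashMap<>();
        merged.forEach((key, state) -> out.put(key, state.result()));
        return out;
    }

    public static void main(String[] args) {
        // Two partitions that could be processed by different tasks.
        List<Map.Entry<String, Double>> p1 = List.of(Map.entry("a", 1.0), Map.entry("b", 10.0));
        List<Map.Entry<String, Double>> p2 = List.of(Map.entry("a", 3.0), Map.entry("a", 5.0));
        System.out.println(combine(List.of(partial(p1), partial(p2))));
    }
}
```

The point is that the first phase ships (sum, count) rather than a finished
average, since averages of averages are not correct when partitions differ
in size.
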
On 16 Jun 2015 05:25, "Jihoon Son" <[email protected]> wrote:

> Hi Atri, thanks for your question.
>
> First of all, if you have not already done so, I recommend that you read
> this article
> <http://www.hadoopsphere.com/2015/02/technical-deep-dive-into-apache-tajo.html>
> before you start the implementation. It was written by Hyunsik and describes
> Tajo's overall infrastructure. After that, I think you will be able to ask
> more detailed questions.
>
> Here, I'll roughly list some important classes for aggregate
> implementation.
>
>    - SQLParser.g4 contains our SQL parsing rules. It is written in ANTLR.
>    - SQLAnalyzer is our parser, based on the rules defined in SQLParser.g4.
>    - SQLAnalyzer translates a SQL query into a tree of Expr, which
>    represents an algebraic expression.
>    - LogicalPlanner translates the Expr tree into a LogicalPlan that
>    logically describes how the given query will be executed.
>    - GlobalPlanner translates the LogicalPlan into a MasterPlan
>    (distributed query execution plan) that describes how the given query
>    will be executed in a distributed cluster.
>    - Once a MasterPlan is created, QueryMaster starts executing the
>    query. A query consists of multiple stages, which are individually
>    processed in some order.
>       - For example, a simple aggregation query is executed in two stages:
>       one for parallel aggregation and one for combining the aggregates.
>       These stages are executed sequentially.
>    - A stage is concurrently processed by multiple tasks, and is executed
>    by TajoWorker.
>    - Each task contains meta information for input data and a LogicalPlan
>    of the stage. This LogicalPlan is translated into PhysicalExec by
>    PhysicalPlanner.
>    - PhysicalExec describes how the query is actually executed.
>       - For example, there are two types of AggregationExec,
>       i.e., HashAggregateExec and SortAggregateExec, for hash-based
>       aggregation and sort-based aggregation, respectively.
>
> Best regards,
> Jihoon
>
> On Mon, 15 Jun 2015 at 11:32 PM, Atri Sharma <[email protected]> wrote:
>
> > Folks,
> >
> > I am looking into parallel aggregates/combining aggregates. I have a plan
> > around it which I think can work.
> >
> > Please update me on the current infrastructure and point me around the
> > existing code base. Also, ideas around it would be most welcome.
> >
> > --
> > Regards,
> >
> > Atri
> > *l'apprenant*
> >
>
