Thanks.

What are your thoughts on parallel aggregation? That is, generating query
plans in which partial aggregate states are computed independently and then
recombined?

On 16 Jun 2015 05:25, "Jihoon Son" <[email protected]> wrote:
> Hi Atri, thanks for your question.
>
> First of all, if you have not already done so, I recommend that you read
> this article
> <http://www.hadoopsphere.com/2015/02/technical-deep-dive-into-apache-tajo.html>
> before you start implementation. It was written by Hyunsik and describes
> Tajo's overall infrastructure. Afterwards, I think you will be able to ask
> more detailed questions.
>
> Here, I'll roughly list some important classes for aggregate
> implementation.
>
>    - SQLParser.g4 contains our SQL parsing rules. It is written in ANTLR.
>    - SQLAnalyzer is our parser, based on the rules defined in SQLParser.g4.
>    - SQLAnalyzer translates a SQL query into a tree of Expr, which
>    represents an algebraic expression.
>    - LogicalPlanner translates the Expr tree into a LogicalPlan that
>    logically describes how the given query will be executed.
>    - GlobalPlanner translates the LogicalPlan into a MasterPlan
>    (a distributed query execution plan) that describes how the given query
>    will be executed in a distributed cluster.
>    - Once a MasterPlan is created, QueryMaster starts query processing.
>    A query consists of multiple stages, which are processed individually
>    in some order.
>    - For example, a simple aggregation query is executed in two stages:
>    the first performs parallel aggregation and the second combines the
>    aggregates. These stages are executed sequentially.
>    - A stage is processed concurrently by multiple tasks, and is executed
>    by TajoWorker.
>    - Each task contains meta information for its input data and a
>    LogicalPlan of the stage. This LogicalPlan is translated into a
>    PhysicalExec by PhysicalPlanner.
>    - PhysicalExec describes how the query is actually executed.
>    - For example, there are two types of AggregationExec, namely
>    HashAggregateExec and SortAggregateExec, for hash-based aggregation
>    and sort-based aggregation, respectively.
>
> Best regards,
> Jihoon
>
> On Mon, 15 Jun 2015 at 11:32 PM, Atri Sharma <[email protected]> wrote:
>
> > Folks,
> >
> > I am looking into parallel aggregates/combining aggregates. I have a plan
> > around it which I think can work.
> >
> > Please update me on the current infrastructure and point me around the
> > existing code base. Also, ideas around it would be most welcome.
> >
> > --
> > Regards,
> >
> > Atri
> > *l'apprenant*
> >
>
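To make the two-stage scheme quoted above concrete (parallel aggregation, then combining aggregates), here is a rough sketch of the idea for AVG. It is only an illustration in Python, not Tajo code: the function names `partial_aggregate`, `combine`, and `finalize`, the `(sum, count)` state layout, and the sample partitions are all assumptions made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical partial state for AVG: (running sum, row count) per group key.
State = Tuple[float, int]
Row = Tuple[str, float]  # (group key, value)


def partial_aggregate(rows: Iterable[Row]) -> Dict[str, State]:
    """Stage 1: build per-group states from one partition, independently."""
    states = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        states[key][0] += value
        states[key][1] += 1
    return {key: (s[0], s[1]) for key, s in states.items()}


def combine(partials: Iterable[Dict[str, State]]) -> Dict[str, State]:
    """Stage 2: merge the partial states that share a group key."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: (s[0], s[1]) for key, s in merged.items()}


def finalize(states: Dict[str, State]) -> Dict[str, float]:
    """Turn each merged (sum, count) state into the final average."""
    return {key: total / count for key, (total, count) in states.items()}


# Made-up input, split into three partitions as if handled by separate tasks.
partitions = [
    [("a", 1.0), ("b", 4.0)],
    [("a", 3.0)],
    [("b", 6.0), ("a", 5.0)],
]
result = finalize(combine(partial_aggregate(p) for p in partitions))
```

The point of keeping `(sum, count)` rather than a per-partition average is that the states can be merged in any order and still give the same answer as a single pass over all rows.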
