In Hive, a stage corresponds to a task. There are various kinds of tasks, such as DDLTask, FetchTask, MapRedTask, and so on. A Hive query plan is a set of DAGs of tasks; when executing the plan, Hive picks a task node in each DAG to start from, so a root task is the entry task of its DAG, i.e. a task with no predecessors. A fetch task is a task that fetches the query results back to the client; it is not a MapReduce job. SHOW statements, and SELECT statements without an INSERT clause, always have a fetch task.

On 2011-3-17, at 5:20 PM, "Joerg Schad" <20seco...@web.de> wrote:
>
> Hi,
> when exploring the Hive Explain statement we were wondering about the different stages.
> So here are a few questions regarding the Explain statement below:
> 1. Why are there two root stages? What exactly does root stage mean (I assume it means there are no predecessors)?
> 2. What exactly is a Fetch Stage? Is it an actual MapReduce stage?
> 3. Where can I find additional information about these stages in general?
>
> Thanks a lot for your support
> JS
> P.S. This has already been posted to the user mailing list, but unfortunately we received no reply there...
>
>
> hive> EXPLAIN SELECT l_orderkey, o_shippingpriority, sum(l_extendedprice) FROM orders JOIN lineitem ON (lineitem.l_orderkey = orders.o_orderkey) GROUP BY l_orderkey, o_shippingpriority;
> OK
> ABSTRACT SYNTAX TREE:
>   (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF orders) (TOK_TABREF lineitem) (= (. (TOK_TABLE_OR_COL lineitem) l_orderkey) (. (TOK_TABLE_OR_COL orders) o_orderkey)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL l_orderkey)) (TOK_SELEXPR (TOK_TABLE_OR_COL o_shippingpriority)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL l_extendedprice)))) (TOK_GROUPBY (TOK_TABLE_OR_COL l_orderkey) (TOK_TABLE_OR_COL o_shippingpriority))))
>
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 depends on stages: Stage-1
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         lineitem
>           TableScan
>             alias: lineitem
>             Reduce Output Operator
>               key expressions:
>                     expr: l_orderkey
>                     type: int
>               sort order: +
>               Map-reduce partition columns:
>                     expr: l_orderkey
>                     type: int
>               tag: 1
>               value expressions:
>                     expr: l_orderkey
>                     type: int
>                     expr: l_extendedprice
>                     type: int
>         orders
>           TableScan
>             alias: orders
>             Reduce Output Operator
>               key expressions:
>                     expr: o_orderkey
>                     type: int
>               sort order: +
>               Map-reduce partition columns:
>                     expr: o_orderkey
>                     type: int
>               tag: 0
>               value expressions:
>                     expr: o_shippingpriority
>                     type: int
>       Reduce Operator Tree:
>         Join Operator
>           condition map:
>                Inner Join 0 to 1
>           condition expressions:
>             0 {VALUE._col1}
>             1 {VALUE._col0} {VALUE._col1}
>           handleSkewJoin: false
>           outputColumnNames: _col1, _col3, _col4
>           Select Operator
>             expressions:
>                   expr: _col3
>                   type: int
>                   expr: _col1
>                   type: int
>                   expr: _col4
>                   type: int
>             outputColumnNames: _col3, _col1, _col4
>             Group By Operator
>               aggregations:
>                     expr: sum(_col4)
>               bucketGroup: false
>               keys:
>                     expr: _col3
>                     type: int
>                     expr: _col1
>                     type: int
>               mode: hash
>               outputColumnNames: _col0, _col1, _col2
>               File Output Operator
>                 compressed: false
>                 GlobalTableId: 0
>                 table:
>                     input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>
>   Stage: Stage-2
>     Map Reduce
>       Alias -> Map Operator Tree:
>         hdfs://localhost:9000/tmp/hive-joergschad/hive_2011-03-14_17-22-14_249_1239673786236436657/10002
>           Reduce Output Operator
>             key expressions:
>                   expr: _col0
>                   type: int
>                   expr: _col1
>                   type: int
>             sort order: ++
>             Map-reduce partition columns:
>                   expr: _col0
>                   type: int
>                   expr: _col1
>                   type: int
>             tag: -1
>             value expressions:
>                   expr: _col2
>                   type: bigint
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations:
>                 expr: sum(VALUE._col0)
>           bucketGroup: false
>           keys:
>                 expr: KEY._col0
>                 type: int
>                 expr: KEY._col1
>                 type: int
>           mode: mergepartial
>           outputColumnNames: _col0, _col1, _col2
>           Select Operator
>             expressions:
>                   expr: _col0
>                   type: int
>                   expr: _col1
>                   type: int
>                   expr: _col2
>                   type: bigint
>             outputColumnNames: _col0, _col1, _col2
>             File Output Operator
>               compressed: false
>               GlobalTableId: 0
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
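To illustrate the last point with a rough sketch (not verbatim output; the exact formatting differs between Hive versions, and the query below is just an example against your lineitem table): a plain SELECT with no join, aggregation, or INSERT needs no MapReduce stage at all, so its plan consists of nothing but a single fetch stage, which looks much like Stage-0 in your plan above:

hive> EXPLAIN SELECT * FROM lineitem;
... (AST omitted)
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1

Stage-0 in your query plays the same role: it just reads the result files written by Stage-2 and returns the rows to the client, and since no predecessor is listed for it, it shows up as the second root stage in the dependency list.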