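In Hive, a stage means a task. There are various kinds of tasks, such as
DDL tasks, fetch tasks, MapReduce tasks, etc. A Hive query plan consists of
one or more DAGs of tasks. When executing the plan, Hive starts each DAG
from its entry node, so a root task is the entry task of a DAG, i.e. a task
with no predecessors. That is why your plan has two root stages: Stage-1
starts the DAG made of Stage-1 and Stage-2, and Stage-0 forms a DAG of its own.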
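To make the DAG idea concrete, here is a minimal Python sketch. It is a
simplified model, not Hive's actual scheduler, and the function names are
hypothetical. It parses the STAGE DEPENDENCIES lines from the EXPLAIN output
quoted below, finds the root stages (stages with no predecessors), and walks
the graph in dependency order using Kahn's topological sort.

```python
import re
from collections import deque

# Lines copied from the STAGE DEPENDENCIES section of the EXPLAIN output below.
DEPENDENCIES = """\
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 is a root stage
"""


def parse_dependencies(text):
    """Map each stage name to the list of stages it depends on (hypothetical helper)."""
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"(\S+) is a root stage$", line)
        if m:
            deps[m.group(1)] = []
            continue
        m = re.match(r"(\S+) depends on stages: (.+)$", line)
        if m:
            deps[m.group(1)] = [s.strip() for s in m.group(2).split(",")]
    return deps


def root_stages(deps):
    """Root stages are the entry points of the DAGs: stages with no predecessors."""
    return [stage for stage, parents in deps.items() if not parents]


def execution_order(deps):
    """Kahn's topological sort: start from the roots and release a stage
    once all of the stages it depends on have been visited."""
    children = {stage: [] for stage in deps}
    remaining = {stage: len(parents) for stage, parents in deps.items()}
    for stage, parents in deps.items():
        for parent in parents:
            children.setdefault(parent, []).append(stage)
    ready = deque(root_stages(deps))
    order = []
    while ready:
        stage = ready.popleft()
        order.append(stage)
        for child in children[stage]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return order


deps = parse_dependencies(DEPENDENCIES)
print(root_stages(deps))      # ['Stage-1', 'Stage-0']
print(execution_order(deps))  # ['Stage-1', 'Stage-0', 'Stage-2']
```

In this model the relative order of independent DAGs is arbitrary, so seeing
Stage-0 before Stage-2 says nothing about when Hive actually runs the fetch
stage; it only shows that each DAG is entered through its root.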
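A fetch task is a task that fetches the query results. It is not a MapReduce
job: it reads the results and returns them to the client. SHOW statements,
and SELECT statements without an INSERT clause, must have a fetch task.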
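A fetch step can be pictured with the following Python sketch. It is a
hypothetical model, not Hive's FetchTask: it assumes the results are plain
text files in a single directory, reads them in file-name order, and treats a
limit of -1 (as in the "limit: -1" of Stage-0 below) as "no limit". The
function name, file name, and rows in the usage example are made up for
illustration.

```python
import os
import tempfile


def fetch_rows(result_dir, limit=-1):
    """Hypothetical fetch step: read result rows from the files in result_dir
    (sorted by file name) and return them; limit=-1 means no limit."""
    rows = []
    for name in sorted(os.listdir(result_dir)):
        path = os.path.join(result_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories in this simplified model
        with open(path, encoding="utf-8") as f:
            for line in f:
                if limit != -1 and len(rows) >= limit:
                    return rows
                rows.append(line.rstrip("\n"))
    return rows


# Usage: a temporary directory stands in for the output of the last stage.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "result-0.txt"), "w", encoding="utf-8") as f:
        f.write("1\t0\t100\n2\t0\t250\n")  # made-up rows
    print(fetch_rows(d))           # ['1\t0\t100', '2\t0\t250']
    print(fetch_rows(d, limit=1))  # ['1\t0\t100']
```

No MapReduce job is involved in this sketch: the fetch step only reads what is
already there and hands the rows back.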
On 2011-3-17 at 5:20 PM, "Joerg Schad" <20seco...@web.de> wrote:
>
> Hi,
> When exploring the Hive Explain statement, we were wondering about the
> different stages.
> So here are a few questions regarding the Explain statement below:
> 1. Why are there two root stages? What exactly does root stage mean (I
> assume it means there are no predecessors)?
> 2. What exactly is a Fetch Stage? Is it an actual MapReduce stage?
> 3. Where can I find additional information about these stages in general?
>
> Thanks a lot for your support
> JS
> P.S. This has already been posted to the user mailing list, but from there
> we unfortunately received no reply...
>
>
> hive> EXPLAIN SELECT l_orderkey, o_shippingpriority, sum(l_extendedprice)
> FROM orders JOIN lineitem ON (lineitem.l_orderkey = orders.o_orderkey)
> GROUP BY l_orderkey, o_shippingpriority;
> OK
> ABSTRACT SYNTAX TREE:
> (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF orders) (TOK_TABREF lineitem)
> (= (. (TOK_TABLE_OR_COL lineitem) l_orderkey) (. (TOK_TABLE_OR_COL orders)
> o_orderkey)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE))
> (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL l_orderkey)) (TOK_SELEXPR
> (TOK_TABLE_OR_COL o_shippingpriority)) (TOK_SELEXPR (TOK_FUNCTION sum
> (TOK_TABLE_OR_COL l_extendedprice)))) (TOK_GROUPBY (TOK_TABLE_OR_COL
> l_orderkey) (TOK_TABLE_OR_COL o_shippingpriority))))
>
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 depends on stages: Stage-1
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         lineitem
>           TableScan
>             alias: lineitem
>             Reduce Output Operator
>               key expressions:
>                     expr: l_orderkey
>                     type: int
>               sort order: +
>               Map-reduce partition columns:
>                     expr: l_orderkey
>                     type: int
>               tag: 1
>               value expressions:
>                     expr: l_orderkey
>                     type: int
>                     expr: l_extendedprice
>                     type: int
>         orders
>           TableScan
>             alias: orders
>             Reduce Output Operator
>               key expressions:
>                     expr: o_orderkey
>                     type: int
>               sort order: +
>               Map-reduce partition columns:
>                     expr: o_orderkey
>                     type: int
>               tag: 0
>               value expressions:
>                     expr: o_shippingpriority
>                     type: int
>       Reduce Operator Tree:
>         Join Operator
>           condition map:
>                Inner Join 0 to 1
>           condition expressions:
>             0 {VALUE._col1}
>             1 {VALUE._col0} {VALUE._col1}
>           handleSkewJoin: false
>           outputColumnNames: _col1, _col3, _col4
>           Select Operator
>             expressions:
>                   expr: _col3
>                   type: int
>                   expr: _col1
>                   type: int
>                   expr: _col4
>                   type: int
>             outputColumnNames: _col3, _col1, _col4
>             Group By Operator
>               aggregations:
>                     expr: sum(_col4)
>               bucketGroup: false
>               keys:
>                     expr: _col3
>                     type: int
>                     expr: _col1
>                     type: int
>               mode: hash
>               outputColumnNames: _col0, _col1, _col2
>               File Output Operator
>                 compressed: false
>                 GlobalTableId: 0
>                 table:
>                     input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>
>   Stage: Stage-2
>     Map Reduce
>       Alias -> Map Operator Tree:
>         hdfs://localhost:9000/tmp/hive-joergschad/hive_2011-03-14_17-22-14_249_1239673786236436657/10002
>             Reduce Output Operator
>               key expressions:
>                     expr: _col0
>                     type: int
>                     expr: _col1
>                     type: int
>               sort order: ++
>               Map-reduce partition columns:
>                     expr: _col0
>                     type: int
>                     expr: _col1
>                     type: int
>               tag: -1
>               value expressions:
>                     expr: _col2
>                     type: bigint
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations:
>                 expr: sum(VALUE._col0)
>           bucketGroup: false
>           keys:
>                 expr: KEY._col0
>                 type: int
>                 expr: KEY._col1
>                 type: int
>           mode: mergepartial
>           outputColumnNames: _col0, _col1, _col2
>           Select Operator
>             expressions:
>                   expr: _col0
>                   type: int
>                   expr: _col1
>                   type: int
>                   expr: _col2
>                   type: bigint
>             outputColumnNames: _col0, _col1, _col2
>             File Output Operator
>               compressed: false
>               GlobalTableId: 0
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1