[ 
https://issues.apache.org/jira/browse/TEZ-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2684:
----------------------------------
    Attachment: TEZ-2684.2.patch

Updated patch:
- Invoke updatePendingTasks() from ShuffleVertexManager.initialize(). Also, skip 
any update/initialization when tasks <= 0 or tasks == pendingTasks.size.
- Retain the precondition check on stats in parsePartitionStats as a safety net 
in case anything changes later. If numTasks of the current vertex is 0, the 
upstream vertex would not send any stats info.
- TestVertexImpl creates a sample DAG to verify the partial events being sent out.
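To illustrate the first point, here is a minimal sketch of the guard described above. The class and field names are hypothetical and only mirror the intent of the check ("don't update/init anything if tasks <= 0, or tasks == pendingTasks.size"); they are not the actual ShuffleVertexManager implementation.

```java
import java.util.BitSet;

// Hypothetical stand-in for the pending-task bookkeeping in a vertex manager.
public class PendingTaskGuard {
    private final BitSet pendingTasks = new BitSet();
    private int numPendingTasks = 0;

    // Re-initializes the pending-task set only when the new task count is
    // valid and actually differs from the current size; otherwise the call
    // is a no-op, matching the guard added by the patch.
    public boolean updatePendingTasks(int tasks) {
        if (tasks <= 0 || tasks == numPendingTasks) {
            return false; // nothing to update/initialize
        }
        pendingTasks.clear();
        pendingTasks.set(0, tasks); // mark tasks 0..tasks-1 as pending
        numPendingTasks = tasks;
        return true;
    }

    public int size() {
        return numPendingTasks;
    }
}
```

Calling this from initialization ensures the pending set exists before any VertexManagerEvent (and hence parsePartitionStats) runs, which is what avoids the "Stats should be initialized" precondition failure below.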

> ShuffleVertexManager.parsePartitionStats throws IllegalStateException: Stats 
> should be initialized
> --------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2684
>                 URL: https://issues.apache.org/jira/browse/TEZ-2684
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>         Environment: Hive on Tez
>            Reporter: Wei Zheng
>            Assignee: Rajesh Balamohan
>         Attachments: 0803_02.patch, TEZ-2684.1.patch, TEZ-2684.2.patch, 
> dynamic_partition_pruning.q, hivelog.tar.gz
>
>
> This occurs when I run the attached hive qfile test using 
> TestMiniTezCliDriver. My WIP patch and hive.log are also attached to help 
> reproduce the problem.
> Here's the explain and backtrace I got from the qfile output:
> {code}
> EXPLAIN select count(*) from srcpart join srcpart_date on (srcpart.ds = 
> srcpart_date.ds) where srcpart_date.`date` = '2008-04-08'
> POSTHOOK: type: QUERY
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>       DagName: wzheng_20150803161620_55c139de-c26c-467f-b592-7d4333053ac6:38
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart
>                   filterExpr: ds is not null (type: boolean)
>                   Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: ds (type: string)
>                     sort order: +
>                     Map-reduce partition columns: ds (type: string)
>                     Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart_date
>                   filterExpr: (ds is not null and (date = '2008-04-08')) 
> (type: boolean)
>                   Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE 
> Column stats: NONE
>                   Filter Operator
>                     predicate: (ds is not null and (date = '2008-04-08')) 
> (type: boolean)
>                     Statistics: Num rows: 1 Data size: 21 Basic stats: 
> COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: ds (type: string)
>                       sort order: +
>                       Map-reduce partition columns: ds (type: string)
>                       Statistics: Num rows: 1 Data size: 21 Basic stats: 
> COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: ds (type: string)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 21 Basic stats: 
> COMPLETE Column stats: NONE
>                       Group By Operator
>                         keys: _col0 (type: string)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 1 Data size: 21 Basic stats: 
> COMPLETE Column stats: NONE
>                         Dynamic Partitioning Event Operator
>                           Target Input: srcpart
>                           Partition key expr: ds
>                           Statistics: Num rows: 1 Data size: 21 Basic stats: 
> COMPLETE Column stats: NONE
>                           Target column: ds
>                           Target Vertex: Map 1
>         Reducer 2
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 keys:
>                   0 ds (type: string)
>                   1 ds (type: string)
>                 Statistics: Num rows: 2200 Data size: 23372 Basic stats: 
> COMPLETE Column stats: NONE
>                 Group By Operator
>                   aggregations: count()
>                   mode: hash
>                   outputColumnNames: _col0
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: NONE
>                   Reduce Output Operator
>                     sort order:
>                     Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: NONE
>                     value expressions: _col0 (type: bigint)
>         Reducer 3
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: count(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: NONE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: NONE
>                   table:
>                       input format: org.apache.hadoop.mapred.TextInputFormat
>                       output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                       serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> PREHOOK: query: select count(*) from srcpart join srcpart_date on (srcpart.ds 
> = srcpart_date.ds) where srcpart_date.`date` = '2008-04-08'
> PREHOOK: type: QUERY
> PREHOOK: Input: default@srcpart
> PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=11
> PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=12
> PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=11
> PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=12
> PREHOOK: Input: default@srcpart_date
> PREHOOK: Output: 
> file:/Users/wzheng/bf/hive/itests/qtest/target/tmp/localscratchdir/93b335b5-3ced-4f4d-abdd-2fd5defd11e4/hive_2015-08-03_16-16-21_046_5066458626645110592-1/-mr-10001
> Status: Failed
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1438643776809_0001_8_02, 
> diagnostics=[Vertex vertex_1438643776809_0001_8_02 [Reducer 2] killed/failed 
> due to:AM_USERCODE_FAILURE, Exception in VertexManager, 
> vertex:vertex_1438643776809_0001_8_02 [Reducer 2], 
> java.lang.IllegalStateException: Stats should be initialized
>       at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
>       at 
> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.parsePartitionStats(ShuffleVertexManager.java:535)
>       at 
> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.onVertexManagerEventReceived(ShuffleVertexManager.java:575)
>       at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventReceived.invoke(VertexManager.java:602)
>       at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:643)
>       at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:638)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>       at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:638)
>       at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:627)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> ]
> Vertex killed, vertexName=Reducer 3, vertexId=vertex_1438643776809_0001_8_03, 
> diagnostics=[Vertex received Kill in INITED state., Vertex 
> vertex_1438643776809_0001_8_03 [Reducer 3] killed/failed due to:null]
> Vertex killed, vertexName=Map 1, vertexId=vertex_1438643776809_0001_8_01, 
> diagnostics=[Vertex received Kill in INITED state., Vertex 
> vertex_1438643776809_0001_8_01 [Map 1] killed/failed due to:null]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:2
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 
> 2, vertexId=vertex_1438643776809_0001_8_02, diagnostics=[Vertex 
> vertex_1438643776809_0001_8_02 [Reducer 2] killed/failed due 
> to:AM_USERCODE_FAILURE, Exception in VertexManager, 
> vertex:vertex_1438643776809_0001_8_02 [Reducer 2], 
> java.lang.IllegalStateException: Stats should be initialized
>       at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:149)
>       at 
> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.parsePartitionStats(ShuffleVertexManager.java:535)
>       at 
> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.onVertexManagerEventReceived(ShuffleVertexManager.java:575)
>       at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventReceived.invoke(VertexManager.java:602)
>       at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:643)
>       at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:638)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>       at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:638)
>       at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:627)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> ]Vertex killed, vertexName=Reducer 3, 
> vertexId=vertex_1438643776809_0001_8_03, diagnostics=[Vertex received Kill in 
> INITED state., Vertex vertex_1438643776809_0001_8_03 [Reducer 3] 
> killed/failed due to:null]Vertex killed, vertexName=Map 1, 
> vertexId=vertex_1438643776809_0001_8_01, diagnostics=[Vertex received Kill in 
> INITED state., Vertex vertex_1438643776809_0001_8_01 [Map 1] killed/failed 
> due to:null]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 
> killedVertices:2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
