Mostafa Mokhtar created HIVE-10446:
--------------------------------------
Summary: Hybrid Grace Hash Join: java.lang.IllegalArgumentException in Kryo while spilling big table
Key: HIVE-10446
URL: https://issues.apache.org/jira/browse/HIVE-10446
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Wei Zheng
Fix For: 1.2.0
TPC-DS Q85 fails with a Kryo java.lang.IllegalArgumentException ("output cannot be null.") when the Hybrid Grace Hash Join spills big table rows.
Query
{code}
select substr(r_reason_desc,1,20) as r
,avg(wr_return_ship_cost) wq
,avg(wr_refunded_cash) ref
,avg(wr_fee) fee
from web_returns, customer_demographics cd1,
customer_demographics cd2, customer_address, date_dim, reason
where
cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk
and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
and reason.r_reason_sk = web_returns.wr_reason_sk
and cd1.cd_marital_status = cd2.cd_marital_status
and cd1.cd_education_status = cd2.cd_education_status
group by r_reason_desc
order by r, wq, ref, fee
limit 100
{code}
Plan
{code}
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
Map 1 <- Map 4 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
DagName: mmokhtar_20150422165209_d8eb5634-c19f-4576-9525-cad248c7ca37:5
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: web_returns
filterExpr: (((wr_refunded_addr_sk is not null and wr_reason_sk is not null) and wr_refunded_cdemo_sk is not null) and wr_returning_cdemo_sk is not null) (type: boolean)
Statistics: Num rows: 2062802370 Data size: 185695406284 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (((wr_refunded_addr_sk is not null and wr_reason_sk is not null) and wr_refunded_cdemo_sk is not null) and wr_returning_cdemo_sk is not null) (type: boolean)
Statistics: Num rows: 1875154723 Data size: 51267313780 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: wr_refunded_cdemo_sk (type: int), wr_refunded_addr_sk (type: int), wr_returning_cdemo_sk (type: int), wr_reason_sk (type: int), wr_fee (type: float), wr_return_ship_cost (type: float), wr_refunded_cash (type: float)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
Statistics: Num rows: 1875154723 Data size: 51267313780 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col1 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6
input vertices:
1 Map 4
Statistics: Num rows: 1875154688 Data size: 45003712512 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col3 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0, _col2, _col4, _col5, _col6, _col9
input vertices:
1 Map 5
Statistics: Num rows: 1875154688 Data size: 219393098496 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col2, _col4, _col5, _col6, _col9, _col11, _col12
input vertices:
1 Map 6
Statistics: Num rows: 1875154688 Data size: 547545168896 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col2 (type: int), _col11 (type: string), _col12 (type: string)
1 _col0 (type: int), _col1 (type: string), _col2 (type: string)
outputColumnNames: _col4, _col5, _col6, _col9
input vertices:
1 Map 7
Statistics: Num rows: 402058172 Data size: 43824340748 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Select Operator
expressions: _col9 (type: string), _col5 (type: float), _col6 (type: float), _col4 (type: float)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 402058172 Data size: 43824340748 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
aggregations: avg(_col1), avg(_col2), avg(_col3)
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 10975 Data size: 1064575 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 10975 Data size: 1064575 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: struct<count:bigint,sum:double,input:float>), _col2 (type: struct<count:bigint,sum:double,input:float>), _col3 (type: struct<count:bigint,sum:double,input:float>)
Execution mode: vectorized
Map 4
Map Operator Tree:
TableScan
alias: customer_address
filterExpr: ca_address_sk is not null (type: boolean)
Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ca_address_sk is not null (type: boolean)
Statistics: Num rows: 40000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: ca_address_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 40000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 40000000 Data size: 160000000 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Map 5
Map Operator Tree:
TableScan
alias: reason
filterExpr: r_reason_sk is not null (type: boolean)
Statistics: Num rows: 72 Data size: 14400 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: r_reason_sk is not null (type: boolean)
Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: r_reason_sk (type: int), r_reason_desc (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 72 Data size: 7272 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: string)
Execution mode: vectorized
Map 6
Map Operator Tree:
TableScan
alias: cd1
filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: string), _col2 (type: string)
Execution mode: vectorized
Map 7
Map Operator Tree:
TableScan
alias: cd1
filterExpr: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ((cd_demo_sk is not null and cd_marital_status is not null) and cd_education_status is not null) (type: boolean)
Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: cd_demo_sk (type: int), cd_marital_status (type: string), cd_education_status (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string)
sort order: +++
Map-reduce partition columns: _col0 (type: int), _col1 (type: string), _col2 (type: string)
Statistics: Num rows: 1920800 Data size: 351506400 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Reducer 2
Reduce Operator Tree:
Group By Operator
aggregations: avg(VALUE._col0), avg(VALUE._col1), avg(VALUE._col2)
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 25 Data size: 3025 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: substr(_col0, 1, 20) (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string), _col1 (type: double), _col2 (type: double), _col3 (type: double)
sort order: ++++
Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
TopN Hash Memory Usage: 0.04
Reducer 3
Reduce Operator Tree:
Select Operator
expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: double), KEY.reducesinkkey2 (type: double), KEY.reducesinkkey3 (type: double)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
Limit
Number of rows: 100
Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 25 Data size: 5200 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: 100
Processor Tree:
ListSink
{code}
Exception
{code}
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
... 17 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: output cannot be null.
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:411)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.process(VectorMapJoinOperator.java:287)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:114)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
... 18 more
Caused by: java.lang.IllegalArgumentException: output cannot be null.
at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:601)
at org.apache.hadoop.hive.ql.exec.persistence.ObjectContainer.add(ObjectContainer.java:101)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.spillBigTableRow(MapJoinOperator.java:425)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.spillBigTableRow(VectorMapJoinOperator.java:307)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:390)
... 27 more
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1426707664723_3652_3_04 [Map 1] killed/failed due to:null]Vertex killed, vertexName=Reducer 3, vertexId=vertex_1426707664723_3652_3_06, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1426707664723_3652_3_06 [Reducer 3] killed/failed due to:null]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1426707664
{code}
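Illustration of the failure mode
The innermost frames show ObjectContainer.add handing a null output to Kryo.writeClassAndObject, which rejects it with "output cannot be null." The sketch below reproduces only that pattern, not Hive's or Kryo's actual code. SpillSketch, RowContainer, open() and writeObject() are hypothetical stand-ins, and the reason the output was null in the real run is not established by this report.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SpillSketch {

    // Hypothetical stand-in for the serializer's argument check; this is not Kryo itself.
    static void writeObject(ByteArrayOutputStream output, Object row) {
        if (output == null) {
            throw new IllegalArgumentException("output cannot be null.");
        }
        byte[] bytes = String.valueOf(row).getBytes(StandardCharsets.UTF_8);
        output.write(bytes, 0, bytes.length);
    }

    // Hypothetical stand-in for a spill container that owns its output stream.
    static final class RowContainer {
        private ByteArrayOutputStream output; // stays null unless open() is called

        void open() {
            output = new ByteArrayOutputStream();
        }

        void add(Object row) {
            // Passes the field through unchecked, so a missing open() surfaces in the serializer.
            writeObject(output, row);
        }

        int bytesWritten() {
            return output == null ? 0 : output.size();
        }
    }

    public static void main(String[] args) {
        RowContainer ready = new RowContainer();
        ready.open();
        ready.add("row-1"); // succeeds: the output was initialized first

        RowContainer unopened = new RowContainer();
        try {
            unopened.add("row-2"); // fails: the output was never initialized
        } catch (IllegalArgumentException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

In this sketch the exception is raised in the serializer but the defect is in the caller, which mirrors the shape of the trace: the Kryo frame reports the symptom, while the container and spill frames above it are where the output would need to be set up.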