[
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264639#comment-14264639
]
Aihua Xu commented on HIVE-9228:
--------------------------------
[~ashutoshc] Are you working on this area, any idea?
The following window query throws ArrayOutOfBoundExcption.
select st_fips_cd, zip_cd_5, hh_surr_key
from
(
select st_fips_cd, zip_cd_5, hh_surr_key,
count( case when advtg_len_rsdnc_cd = '1' then 1 end ) over (partition by
st_fips_cd, zip_cd_5) as CNT_ADVTG_LEN_RSDNC_CD_1,
row_number() over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key asc)
as analytic_row_number3
from hh_agg
where analytic_row_number2 = 1
) t;
Here is the explain extend output for the query. At the File Sink Operator of
stage 1 below, seems like it should only output 4 columns while the temp table
in fact output one additional column (the value of CNT_ADVTG_LEN_RSDNC_CD_1).
I’m investigating toward such mismatch, but anyone can confirm and provide
additional info that will be helpful.
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: hh_agg
Statistics: Num rows: 33208 Data size: 10361206 Basic stats:
COMPLETE Column stats: NONE
GatherStats: false
Filter Operator
isSamplingPred: false
predicate: (analytic_row_number2 = 1) (type: boolean)
Statistics: Num rows: 16604 Data size: 5180603 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: st_fips_cd (type: string), zip_cd_5 (type:
string), st_fips_cd (type: string), zip_cd_5 (type: string)
sort order: ++++
Map-reduce partition columns: st_fips_cd (type: string),
zip_cd_5 (type: string)
Statistics: Num rows: 16604 Data size: 5180603 Basic stats:
COMPLETE Column stats: NONE
tag: -1
value expressions: st_fips_cd (type: string), zip_cd_5 (type:
string), hh_surr_key (type: bigint), advtg_len_rsdnc_cd (type: string)
Path -> Alias:
file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg]
Path -> Partition:
file:/Users/axu/Documents/localDB/23982_debug
Partition
base file name: 23982_debug
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
COLUMN_STATS_ACCURATE true
EXTERNAL TRUE
bucket_count -1
columns
st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2
columns.comments
columns.types
string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int
field.delim ,
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location file:/Users/axu/Documents/localDB/23982_debug
name default.hh_agg
numFiles 0
numRows 0
rawDataSize 0
serialization.ddl struct hh_agg { string st_fips_cd, string
zip_cd_5, i64 hh_surr_key, i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte
hh_pop, string advtg_len_rsdnc_cd, string advtg_home_ownr_cd, string
dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd, string
advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double
zip_mdn_age, i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, i64
cnt_gend_cd_3, i64 cnt_gend_cd_unk, i64 cnt_advtg_marital_stat_cd_1, i64
cnt_advtg_marital_stat_cd_2, i64 cnt_advtg_marital_stat_cd_unk, i64
cnt_nbr_tradeline_0, i64 cnt_nbr_tradeline_1, i64 cnt_nbr_tradeline_2, i64
cnt_nbr_tradeline_3, i64 cnt_nbr_tradeline_4, i64 cnt_nbr_tradeline_5, i64
cnt_nbr_tradeline_6, i64 cnt_nbr_tradeline_7, i64 cnt_nbr_tradeline_8, i64
cnt_nbr_tradeline_9, i64 cnt_nbr_tradeline_unk, string advtg_dwell_type_cd,
string prprty_mkt_val_cd, double zip_avg_prprty_mkt_val, double
zip_mdn_prprty_mkt_val, double zip_avg_home_eqty_amt, double
zip_mdn_home_eqty_amt, string trgt_inc_cd, double zip_avg_trgt_inc_narrow_band,
double zip_mdn_trgt_inc_narrow_band, double zip_avg_inc_prodc_asset_cd, double
zip_mdn_inc_prodc_asset_cd, double zip_avg_net_wrth_cd, double
zip_mdn_net_wrth_cd, string rylty_trgt_mktg_val_scr_cd, i32
analytic_row_number2}
serialization.format ,
serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 0
transient_lastDdlTime 1419367415
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
COLUMN_STATS_ACCURATE true
EXTERNAL TRUE
bucket_count -1
columns
st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2
columns.comments
columns.types
string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int
field.delim ,
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location file:/Users/axu/Documents/localDB/23982_debug
name default.hh_agg
numFiles 0
numRows 0
rawDataSize 0
serialization.ddl struct hh_agg { string st_fips_cd, string
zip_cd_5, i64 hh_surr_key, i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte
hh_pop, string advtg_len_rsdnc_cd, string advtg_home_ownr_cd, string
dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd, string
advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double
zip_mdn_age, i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, i64
cnt_gend_cd_3, i64 cnt_gend_cd_unk, i64 cnt_advtg_marital_stat_cd_1, i64
cnt_advtg_marital_stat_cd_2, i64 cnt_advtg_marital_stat_cd_unk, i64
cnt_nbr_tradeline_0, i64 cnt_nbr_tradeline_1, i64 cnt_nbr_tradeline_2, i64
cnt_nbr_tradeline_3, i64 cnt_nbr_tradeline_4, i64 cnt_nbr_tradeline_5, i64
cnt_nbr_tradeline_6, i64 cnt_nbr_tradeline_7, i64 cnt_nbr_tradeline_8, i64
cnt_nbr_tradeline_9, i64 cnt_nbr_tradeline_unk, string advtg_dwell_type_cd,
string prprty_mkt_val_cd, double zip_avg_prprty_mkt_val, double
zip_mdn_prprty_mkt_val, double zip_avg_home_eqty_amt, double
zip_mdn_home_eqty_amt, string trgt_inc_cd, double zip_avg_trgt_inc_narrow_band,
double zip_mdn_trgt_inc_narrow_band, double zip_avg_inc_prodc_asset_cd, double
zip_mdn_inc_prodc_asset_cd, double zip_avg_net_wrth_cd, double
zip_mdn_net_wrth_cd, string rylty_trgt_mktg_val_scr_cd, i32
analytic_row_number2}
serialization.format ,
serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 0
transient_lastDdlTime 1419367415
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.hh_agg
name: default.hh_agg
Truncated Path -> Alias:
file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg]
Needs Tagging: false
Reduce Operator Tree:
Extract
Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE
Column stats: NONE
PTF Operator
Statistics: Num rows: 16604 Data size: 5180603 Basic stats:
COMPLETE Column stats: NONE
File Output Operator
compressed: false
GlobalTableId: 0
directory:
file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
NumFilesPerFileSink: 1
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
properties:
columns _col0,_col1,_col2,_col6
columns.types string,string,bigint,string
escape.delim \
serialization.lib
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
serde:
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
TotalFiles: 1
GatherStats: false
MultiFileSpray: false
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
GatherStats: false
Reduce Output Operator
key expressions: _col0 (type: string), _col1 (type: string),
_col2 (type: bigint)
sort order: +++
Map-reduce partition columns: _col0 (type: string), _col1 (type:
string)
Statistics: Num rows: 16604 Data size: 5180603 Basic stats:
COMPLETE Column stats: NONE
tag: -1
value expressions: _col0 (type: string), _col1 (type: string),
_col2 (type: bigint)
Path -> Alias:
file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
[file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003]
Path -> Partition:
file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
Partition
base file name: -mr-10003
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
properties:
columns _col0,_col1,_col2,_col6
columns.types string,string,bigint,string
escape.delim \
serialization.lib
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
properties:
columns _col0,_col1,_col2,_col6
columns.types string,string,bigint,string
escape.delim \
serialization.lib
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Truncated Path -> Alias:
file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
[file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003]
Needs Tagging: false
Reduce Operator Tree:
Extract
Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE
Column stats: NONE
PTF Operator
Statistics: Num rows: 16604 Data size: 5180603 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: _col1 (type: string), _col2 (type: string), _col3
(type: bigint)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 16604 Data size: 5180603 Basic stats:
COMPLETE Column stats: NONE
File Output Operator
compressed: false
GlobalTableId: 0
directory:
file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10000/.hive-staging_hive_2015-01-01_13-25-01_841_2717056365737730087-1/-ext-10001
NumFilesPerFileSink: 1
Statistics: Num rows: 16604 Data size: 5180603 Basic stats:
COMPLETE Column stats: NONE
Stats Publishing Key Prefix:
file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10000/.hive-staging_hive_2015-01-01_13-25-01_841_2717056365737730087-1/-ext-10001/
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
columns _col0,_col1,_col2
columns.types string:string:bigint
escape.delim \
hive.serialization.extend.nesting.levels true
serialization.format 1
serialization.lib
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
TotalFiles: 1
GatherStats: false
MultiFileSpray: false
Stage: Stage-0
Fetch Operator
limit: -1
> Problem with subquery using windowing functions
> -----------------------------------------------
>
> Key: HIVE-9228
> URL: https://issues.apache.org/jira/browse/HIVE-9228
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.13.1
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> The following query with window functions failed. The internal query works
> fine.
> select st_fips_cd, zip_cd_5, hh_surr_key
> from
> (
> select st_fips_cd, zip_cd_5, hh_surr_key,
> count( case when advtg_len_rsdnc_cd = '1' then 1 end ) over (partition by
> st_fips_cd, zip_cd_5) as CNT_ADVTG_LEN_RSDNC_CD_1,
> row_number() over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key
> asc) as analytic_row_number3
> from hh_agg
> where analytic_row_number2 = 1
> ) t;
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)