[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264639#comment-14264639 ]
Aihua Xu commented on HIVE-9228: -------------------------------- [~ashutoshc] Are you working on this area, any idea? The following window query throws ArrayOutOfBoundExcption. select st_fips_cd, zip_cd_5, hh_surr_key from ( select st_fips_cd, zip_cd_5, hh_surr_key, count( case when advtg_len_rsdnc_cd = '1' then 1 end ) over (partition by st_fips_cd, zip_cd_5) as CNT_ADVTG_LEN_RSDNC_CD_1, row_number() over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key asc) as analytic_row_number3 from hh_agg where analytic_row_number2 = 1 ) t; Here is the explain extend output for the query. At the File Sink Operator of stage 1 below, seems like it should only output 4 columns while the temp table in fact output one additional column (the value of CNT_ADVTG_LEN_RSDNC_CD_1). I’m investigating toward such mismatch, but anyone can confirm and provide additional info that will be helpful. STAGE DEPENDENCIES: Stage-1 is a root stage Stage-2 depends on stages: Stage-1 Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: hh_agg Statistics: Num rows: 33208 Data size: 10361206 Basic stats: COMPLETE Column stats: NONE GatherStats: false Filter Operator isSamplingPred: false predicate: (analytic_row_number2 = 1) (type: boolean) Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: st_fips_cd (type: string), zip_cd_5 (type: string), st_fips_cd (type: string), zip_cd_5 (type: string) sort order: ++++ Map-reduce partition columns: st_fips_cd (type: string), zip_cd_5 (type: string) Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats: NONE tag: -1 value expressions: st_fips_cd (type: string), zip_cd_5 (type: string), hh_surr_key (type: bigint), advtg_len_rsdnc_cd (type: string) Path -> Alias: file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg] Path -> Partition: file:/Users/axu/Documents/localDB/23982_debug Partition base file name: 23982_debug input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat properties: COLUMN_STATS_ACCURATE true EXTERNAL TRUE bucket_count -1 columns st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2 columns.comments columns.types string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int field.delim , file.inputformat org.apache.hadoop.mapred.TextInputFormat file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat location file:/Users/axu/Documents/localDB/23982_debug name default.hh_agg numFiles 0 numRows 0 rawDataSize 0 serialization.ddl struct hh_agg { string st_fips_cd, string zip_cd_5, i64 hh_surr_key, i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte hh_pop, string advtg_len_rsdnc_cd, string advtg_home_ownr_cd, string dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd, string advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double zip_mdn_age, i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, i64 cnt_gend_cd_3, i64 cnt_gend_cd_unk, i64 cnt_advtg_marital_stat_cd_1, i64 cnt_advtg_marital_stat_cd_2, i64 cnt_advtg_marital_stat_cd_unk, i64 cnt_nbr_tradeline_0, i64 cnt_nbr_tradeline_1, i64 cnt_nbr_tradeline_2, i64 cnt_nbr_tradeline_3, i64 cnt_nbr_tradeline_4, i64 cnt_nbr_tradeline_5, i64 cnt_nbr_tradeline_6, i64 cnt_nbr_tradeline_7, i64 cnt_nbr_tradeline_8, i64 cnt_nbr_tradeline_9, i64 cnt_nbr_tradeline_unk, string advtg_dwell_type_cd, string prprty_mkt_val_cd, double zip_avg_prprty_mkt_val, double zip_mdn_prprty_mkt_val, double zip_avg_home_eqty_amt, double zip_mdn_home_eqty_amt, string trgt_inc_cd, double zip_avg_trgt_inc_narrow_band, double zip_mdn_trgt_inc_narrow_band, double zip_avg_inc_prodc_asset_cd, double zip_mdn_inc_prodc_asset_cd, double zip_avg_net_wrth_cd, double zip_mdn_net_wrth_cd, string rylty_trgt_mktg_val_scr_cd, i32 analytic_row_number2} serialization.format , serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe totalSize 0 transient_lastDdlTime 1419367415 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat properties: COLUMN_STATS_ACCURATE true EXTERNAL TRUE bucket_count -1 columns st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2 columns.comments columns.types string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int field.delim , file.inputformat org.apache.hadoop.mapred.TextInputFormat file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat location file:/Users/axu/Documents/localDB/23982_debug name default.hh_agg numFiles 0 numRows 0 rawDataSize 0 serialization.ddl struct hh_agg { string st_fips_cd, string zip_cd_5, i64 hh_surr_key, i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte hh_pop, string advtg_len_rsdnc_cd, string advtg_home_ownr_cd, string dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd, string advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double zip_mdn_age, i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, i64 cnt_gend_cd_3, i64 cnt_gend_cd_unk, i64 cnt_advtg_marital_stat_cd_1, i64 cnt_advtg_marital_stat_cd_2, i64 cnt_advtg_marital_stat_cd_unk, i64 cnt_nbr_tradeline_0, i64 cnt_nbr_tradeline_1, i64 cnt_nbr_tradeline_2, i64 cnt_nbr_tradeline_3, i64 cnt_nbr_tradeline_4, i64 cnt_nbr_tradeline_5, i64 cnt_nbr_tradeline_6, i64 cnt_nbr_tradeline_7, i64 cnt_nbr_tradeline_8, i64 cnt_nbr_tradeline_9, i64 cnt_nbr_tradeline_unk, string advtg_dwell_type_cd, string prprty_mkt_val_cd, double zip_avg_prprty_mkt_val, double zip_mdn_prprty_mkt_val, double zip_avg_home_eqty_amt, double zip_mdn_home_eqty_amt, string trgt_inc_cd, double zip_avg_trgt_inc_narrow_band, double zip_mdn_trgt_inc_narrow_band, double zip_avg_inc_prodc_asset_cd, double zip_mdn_inc_prodc_asset_cd, double zip_avg_net_wrth_cd, double zip_mdn_net_wrth_cd, string rylty_trgt_mktg_val_scr_cd, i32 analytic_row_number2} serialization.format , serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe totalSize 0 transient_lastDdlTime 1419367415 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.hh_agg name: default.hh_agg Truncated Path -> Alias: file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg] Needs Tagging: false Reduce Operator Tree: Extract Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats: NONE PTF Operator Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false GlobalTableId: 0 directory: file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003 NumFilesPerFileSink: 1 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat properties: columns _col0,_col1,_col2,_col6 columns.types string,string,bigint,string escape.delim \ serialization.lib org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe TotalFiles: 1 GatherStats: false MultiFileSpray: false Stage: Stage-2 Map Reduce Map Operator Tree: TableScan GatherStats: false Reduce Output Operator key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: bigint) sort order: +++ Map-reduce partition columns: _col0 (type: string), _col1 (type: string) Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats: NONE tag: -1 value expressions: _col0 (type: string), _col1 (type: string), _col2 (type: bigint) Path -> Alias: file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003 [file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003] Path -> Partition: file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003 Partition base file name: -mr-10003 input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat properties: columns _col0,_col1,_col2,_col6 columns.types string,string,bigint,string escape.delim \ serialization.lib org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat properties: columns _col0,_col1,_col2,_col6 columns.types string,string,bigint,string escape.delim \ serialization.lib org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe Truncated Path -> Alias: file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003 [file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003] Needs Tagging: false Reduce Operator Tree: Extract Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats: NONE PTF Operator Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col1 (type: string), _col2 (type: string), _col3 (type: bigint) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false GlobalTableId: 0 directory: file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10000/.hive-staging_hive_2015-01-01_13-25-01_841_2717056365737730087-1/-ext-10001 NumFilesPerFileSink: 1 Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats: NONE Stats Publishing Key Prefix: file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10000/.hive-staging_hive_2015-01-01_13-25-01_841_2717056365737730087-1/-ext-10001/ table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat properties: columns _col0,_col1,_col2 columns.types string:string:bigint escape.delim \ hive.serialization.extend.nesting.levels true serialization.format 1 serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe TotalFiles: 1 GatherStats: false MultiFileSpray: false Stage: Stage-0 Fetch Operator limit: -1 > Problem with subquery using windowing functions > ----------------------------------------------- > > Key: HIVE-9228 > URL: https://issues.apache.org/jira/browse/HIVE-9228 > Project: Hive > Issue Type: Bug > Components: Query Processor > Affects Versions: 0.13.1 > Reporter: Aihua Xu > Assignee: Aihua Xu > Original Estimate: 96h > Remaining Estimate: 96h > > The following query with window functions failed. The internal query works > fine. > select st_fips_cd, zip_cd_5, hh_surr_key > from > ( > select st_fips_cd, zip_cd_5, hh_surr_key, > count( case when advtg_len_rsdnc_cd = '1' then 1 end ) over (partition by > st_fips_cd, zip_cd_5) as CNT_ADVTG_LEN_RSDNC_CD_1, > row_number() over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key > asc) as analytic_row_number3 > from hh_agg > where analytic_row_number2 = 1 > ) t; -- This message was sent by Atlassian JIRA (v6.3.4#6332)