[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264639#comment-14264639
 ] 

Aihua Xu commented on HIVE-9228:
--------------------------------

[~ashutoshc] Are you working on this area, any idea?

The following window query throws ArrayOutOfBoundExcption. 

select st_fips_cd, zip_cd_5, hh_surr_key
from
(
select st_fips_cd, zip_cd_5, hh_surr_key,
count( case when advtg_len_rsdnc_cd = '1' then 1 end ) over (partition by 
st_fips_cd, zip_cd_5) as CNT_ADVTG_LEN_RSDNC_CD_1,
row_number() over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key asc) 
as analytic_row_number3
from hh_agg
where analytic_row_number2 = 1
) t;


Here is the explain extend output for the query.  At the File Sink Operator of 
stage 1 below, seems like it should only output 4 columns while the temp table 
in fact output one additional column (the value of CNT_ADVTG_LEN_RSDNC_CD_1). 
I’m investigating toward such mismatch, but anyone can confirm and provide 
additional info that will be helpful.

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: hh_agg
            Statistics: Num rows: 33208 Data size: 10361206 Basic stats: 
COMPLETE Column stats: NONE
            GatherStats: false
            Filter Operator
              isSamplingPred: false
              predicate: (analytic_row_number2 = 1) (type: boolean)
              Statistics: Num rows: 16604 Data size: 5180603 Basic stats: 
COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: st_fips_cd (type: string), zip_cd_5 (type: 
string), st_fips_cd (type: string), zip_cd_5 (type: string)
                sort order: ++++
                Map-reduce partition columns: st_fips_cd (type: string), 
zip_cd_5 (type: string)
                Statistics: Num rows: 16604 Data size: 5180603 Basic stats: 
COMPLETE Column stats: NONE
                tag: -1
                value expressions: st_fips_cd (type: string), zip_cd_5 (type: 
string), hh_surr_key (type: bigint), advtg_len_rsdnc_cd (type: string)
      Path -> Alias:
        file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg]
      Path -> Partition:
        file:/Users/axu/Documents/localDB/23982_debug 
          Partition
            base file name: 23982_debug
            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
            properties:
              COLUMN_STATS_ACCURATE true
              EXTERNAL TRUE
              bucket_count -1
              columns 
st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2
              columns.comments 
              columns.types 
string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int
              field.delim ,
              file.inputformat org.apache.hadoop.mapred.TextInputFormat
              file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              location file:/Users/axu/Documents/localDB/23982_debug
              name default.hh_agg
              numFiles 0
              numRows 0
              rawDataSize 0
              serialization.ddl struct hh_agg { string st_fips_cd, string 
zip_cd_5, i64 hh_surr_key, i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte 
hh_pop, string advtg_len_rsdnc_cd, string advtg_home_ownr_cd, string 
dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd, string 
advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double 
zip_mdn_age, i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, i64 
cnt_gend_cd_3, i64 cnt_gend_cd_unk, i64 cnt_advtg_marital_stat_cd_1, i64 
cnt_advtg_marital_stat_cd_2, i64 cnt_advtg_marital_stat_cd_unk, i64 
cnt_nbr_tradeline_0, i64 cnt_nbr_tradeline_1, i64 cnt_nbr_tradeline_2, i64 
cnt_nbr_tradeline_3, i64 cnt_nbr_tradeline_4, i64 cnt_nbr_tradeline_5, i64 
cnt_nbr_tradeline_6, i64 cnt_nbr_tradeline_7, i64 cnt_nbr_tradeline_8, i64 
cnt_nbr_tradeline_9, i64 cnt_nbr_tradeline_unk, string advtg_dwell_type_cd, 
string prprty_mkt_val_cd, double zip_avg_prprty_mkt_val, double 
zip_mdn_prprty_mkt_val, double zip_avg_home_eqty_amt, double 
zip_mdn_home_eqty_amt, string trgt_inc_cd, double zip_avg_trgt_inc_narrow_band, 
double zip_mdn_trgt_inc_narrow_band, double zip_avg_inc_prodc_asset_cd, double 
zip_mdn_inc_prodc_asset_cd, double zip_avg_net_wrth_cd, double 
zip_mdn_net_wrth_cd, string rylty_trgt_mktg_val_scr_cd, i32 
analytic_row_number2}
              serialization.format ,
              serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              totalSize 0
              transient_lastDdlTime 1419367415
            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              properties:
                COLUMN_STATS_ACCURATE true
                EXTERNAL TRUE
                bucket_count -1
                columns 
st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2
                columns.comments 
                columns.types 
string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int
                field.delim ,
                file.inputformat org.apache.hadoop.mapred.TextInputFormat
                file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                location file:/Users/axu/Documents/localDB/23982_debug
                name default.hh_agg
                numFiles 0
                numRows 0
                rawDataSize 0
                serialization.ddl struct hh_agg { string st_fips_cd, string 
zip_cd_5, i64 hh_surr_key, i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte 
hh_pop, string advtg_len_rsdnc_cd, string advtg_home_ownr_cd, string 
dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd, string 
advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double 
zip_mdn_age, i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, i64 
cnt_gend_cd_3, i64 cnt_gend_cd_unk, i64 cnt_advtg_marital_stat_cd_1, i64 
cnt_advtg_marital_stat_cd_2, i64 cnt_advtg_marital_stat_cd_unk, i64 
cnt_nbr_tradeline_0, i64 cnt_nbr_tradeline_1, i64 cnt_nbr_tradeline_2, i64 
cnt_nbr_tradeline_3, i64 cnt_nbr_tradeline_4, i64 cnt_nbr_tradeline_5, i64 
cnt_nbr_tradeline_6, i64 cnt_nbr_tradeline_7, i64 cnt_nbr_tradeline_8, i64 
cnt_nbr_tradeline_9, i64 cnt_nbr_tradeline_unk, string advtg_dwell_type_cd, 
string prprty_mkt_val_cd, double zip_avg_prprty_mkt_val, double 
zip_mdn_prprty_mkt_val, double zip_avg_home_eqty_amt, double 
zip_mdn_home_eqty_amt, string trgt_inc_cd, double zip_avg_trgt_inc_narrow_band, 
double zip_mdn_trgt_inc_narrow_band, double zip_avg_inc_prodc_asset_cd, double 
zip_mdn_inc_prodc_asset_cd, double zip_avg_net_wrth_cd, double 
zip_mdn_net_wrth_cd, string rylty_trgt_mktg_val_scr_cd, i32 
analytic_row_number2}
                serialization.format ,
                serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                totalSize 0
                transient_lastDdlTime 1419367415
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.hh_agg
            name: default.hh_agg
      Truncated Path -> Alias:
        file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg]
      Needs Tagging: false
      Reduce Operator Tree:
        Extract
          Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE 
Column stats: NONE
          PTF Operator
            Statistics: Num rows: 16604 Data size: 5180603 Basic stats: 
COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              GlobalTableId: 0
              directory: 
file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
              NumFilesPerFileSink: 1
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  properties:
                    columns _col0,_col1,_col2,_col6
                    columns.types string,string,bigint,string
                    escape.delim \
                    serialization.lib 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
                  serde: 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
              TotalFiles: 1
              GatherStats: false
              MultiFileSpray: false

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            GatherStats: false
            Reduce Output Operator
              key expressions: _col0 (type: string), _col1 (type: string), 
_col2 (type: bigint)
              sort order: +++
              Map-reduce partition columns: _col0 (type: string), _col1 (type: 
string)
              Statistics: Num rows: 16604 Data size: 5180603 Basic stats: 
COMPLETE Column stats: NONE
              tag: -1
              value expressions: _col0 (type: string), _col1 (type: string), 
_col2 (type: bigint)
      Path -> Alias:
        
file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
 
[file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003]
      Path -> Partition:
        
file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
 
          Partition
            base file name: -mr-10003
            input format: org.apache.hadoop.mapred.SequenceFileInputFormat
            output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
            properties:
              columns _col0,_col1,_col2,_col6
              columns.types string,string,bigint,string
              escape.delim \
              serialization.lib 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
            serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
          
              input format: org.apache.hadoop.mapred.SequenceFileInputFormat
              output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
              properties:
                columns _col0,_col1,_col2,_col6
                columns.types string,string,bigint,string
                escape.delim \
                serialization.lib 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
              serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Truncated Path -> Alias:
        
file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
 
[file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003]
      Needs Tagging: false
      Reduce Operator Tree:
        Extract
          Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE 
Column stats: NONE
          PTF Operator
            Statistics: Num rows: 16604 Data size: 5180603 Basic stats: 
COMPLETE Column stats: NONE
            Select Operator
              expressions: _col1 (type: string), _col2 (type: string), _col3 
(type: bigint)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 16604 Data size: 5180603 Basic stats: 
COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                GlobalTableId: 0
                directory: 
file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10000/.hive-staging_hive_2015-01-01_13-25-01_841_2717056365737730087-1/-ext-10001
                NumFilesPerFileSink: 1
                Statistics: Num rows: 16604 Data size: 5180603 Basic stats: 
COMPLETE Column stats: NONE
                Stats Publishing Key Prefix: 
file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10000/.hive-staging_hive_2015-01-01_13-25-01_841_2717056365737730087-1/-ext-10001/
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    properties:
                      columns _col0,_col1,_col2
                      columns.types string:string:bigint
                      escape.delim \
                      hive.serialization.extend.nesting.levels true
                      serialization.format 1
                      serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                TotalFiles: 1
                GatherStats: false
                MultiFileSpray: false

  Stage: Stage-0
    Fetch Operator
      limit: -1

> Problem with subquery using windowing functions
> -----------------------------------------------
>
>                 Key: HIVE-9228
>                 URL: https://issues.apache.org/jira/browse/HIVE-9228
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.1
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> The following query with window functions failed. The internal query works 
> fine.
> select st_fips_cd, zip_cd_5, hh_surr_key
> from
> (
> select st_fips_cd, zip_cd_5, hh_surr_key,
> count( case when advtg_len_rsdnc_cd = '1' then 1 end ) over (partition by 
> st_fips_cd, zip_cd_5) as CNT_ADVTG_LEN_RSDNC_CD_1,
> row_number() over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key 
> asc) as analytic_row_number3
> from hh_agg
> where analytic_row_number2 = 1
> ) t;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to