sundy_baba_zhouzy created HIVE-28849:
----------------------------------------

             Summary: nest order by   Exception:Caused by: 
java.lang.ArrayIndexOutOfBoundsException
                 Key: HIVE-28849
                 URL: https://issues.apache.org/jira/browse/HIVE-28849
             Project: Hive
          Issue Type: Bug
          Components: Operators, Query Planning
    Affects Versions: 2.3.10, 2.3.9, 2.3.7
         Environment: 我们用的 2.3.9,我测试了 2.3.10 和 2.3.7都有这个bug。 这个bug和 
执行引擎没有关系,因为生成的执行计划就是错误的 。我测试了3.1.4的版本已经没有这个问题了。
            Reporter: sundy_baba_zhouzy
             Fix For: 3.2.0, 3.1.4


复现步骤:

-- ----------------------------
-- Table structure for stg_judicial_judge_document
-- ----------------------------
DROP TABLE IF EXISTS `stg_judicial_judge_document`;
CREATE TABLE `stg_judicial_judge_document` (
`id` varchar(512) COMMENT '自增主键',
`document_id` varchar(512) COMMENT '裁判文书网文书id',
`defendant` varchar(512) COMMENT '被告',
`case_number` varchar(512) COMMENT '案号',
`case_type_code` varchar(512) COMMENT '案件类型编码(已废弃)',
`case_type` varchar(512) COMMENT '案件类型',
`document_type_code` varchar(512) COMMENT '文书类型编码(已废弃)',
`documnet_type` varchar(512) COMMENT '文书类型',
`province` varchar(512) COMMENT '省份',
`city` varchar(512) COMMENT '地市',
`county` varchar(512) COMMENT '区县',
`court_id` varchar(512) COMMENT '法院id',
`court_name` varchar(512) COMMENT '法院名称',
`trial_name` varchar(512) COMMENT '审理程序名称',
`judge_year` varchar(512) COMMENT '裁判年份',
`judge_date` varchar(512) COMMENT '裁判日期',
`case_reason` varchar(512) COMMENT '案由',
`document_title` varchar(512) COMMENT '文书标题',
`document_text` varchar(512) COMMENT '文书全文',
`announce_date` varchar(512) COMMENT '公布日期',
`data_source` varchar(512) COMMENT '数据来源(裁判文书网,北京法院网)',
`use_mark` varchar(512) COMMENT '使用标志0正常,1人工已修正,3未解析,9历史删除,10废弃数据',
`source_url` varchar(512) COMMENT '原始连接URL',
`in_store_time` varchar(512) COMMENT '入库时间',
`last_update_time` varchar(512) COMMENT '更新时间',
`plaintiff` varchar(512) COMMENT '原告',
`is_not_display` varchar(512) COMMENT '是否不展示(已废弃)'
);

-- ----------------------------
-- Table structure for stg_judge_document_client
-- ----------------------------
DROP TABLE IF EXISTS `stg_judge_document_client`;
CREATE TABLE `stg_judge_document_client` (
`id` varchar(512) COMMENT '自增主键',
`docid` varchar(512) COMMENT '主表docid',
`identity` varchar(512) COMMENT '当事人身份,原告,被告',
`identity_role_type` varchar(512) COMMENT '当事人角色类型,0原告,1被告,2其他',
`identity_type` varchar(512) COMMENT '当事人类型,0个人,1非个人',
`relate_name` varchar(512) COMMENT '相关人名称',
`document_publish_date` varchar(512) COMMENT '文书公布日期',
`case_reason` varchar(512) COMMENT '案由',
`use_mark` varchar(512) COMMENT '使用标志0正常,1人工已修正,9历史删除,10废弃数据',
`last_update_time` varchar(512) COMMENT '更新时间',
`is_not_display` varchar(512) COMMENT '是否不展示(已废弃)',
`reserve1` varchar(512) COMMENT '预留字段1',
`reserve2` varchar(512) COMMENT '预留字段2',
`ent_id` varchar(512) COMMENT '公司id,party_name是公司,则关联工商ic主键'
) ;


-- ----------------------------
-- Table structure for stg_business_baseinfo_clean
-- ----------------------------
DROP TABLE IF EXISTS `stg_business_baseinfo_clean`;

CREATE TABLE `stg_business_baseinfo_clean` (
`id` varchar(512) COMMENT '自增主键',
`id_entity` varchar(512) COMMENT '主体唯一键',
`register_status` varchar(512) COMMENT '归类后的登记状态',
`entity_type` varchar(512) COMMENT '归类后的主体类型,1:个体户,2:合作社,3:企业',
`province` varchar(512) COMMENT '省份大写英文缩写',
`city` varchar(512) COMMENT '地市',
`county` varchar(512) COMMENT '区县',
`listed_status` varchar(512) COMMENT '上市状态 (上市、退市)',
`use_mark` varchar(512) COMMENT '数据使用标记,0有效,1废弃',
`create1_time` varchar(512) COMMENT '创建时间',
`update1_time` varchar(512) COMMENT '更新时间',
`latitude_longitude` varchar(512) COMMENT '经纬度',
`registered_capital` varchar(512) COMMENT '注册资本数值,量纲-万',
`currency_type` varchar(512) COMMENT '注册资本币种,默认人民币'
) ;


INSERT INTO `stg_business_baseinfo_clean` (`id`, `id_entity`, 
`register_status`, `entity_type`, `province`, `city`, `county`) VALUES
('uuid11', 'ent1', 'Active', '3', 'HN', 'Zhengzhou', '中原区'),
('uuid12', 'ent2', 'Inactive', '2', 'HN', 'Zhengzhou', '中原区'),
('uuid13', 'ent3', 'Active', '3', 'HN', 'Zhengzhou', '中原区'),
('uuid14', 'ent4', 'Active', '1', 'HN', 'Zhengzhou', '中原区'),
('uuid15', 'ent5', 'Inactive', '3', 'HN', 'Zhengzhou', '中原区');

INSERT INTO `stg_judge_document_client` (`id`, `docid`, `identity`, 
`identity_role_type`, `identity_type`, `relate_name`) VALUES
('uuid6', 'doc1', 'Plaintiff A', '0', '0', 'Related Person A'),
('uuid7', 'doc2', 'Plaintiff B', '0', '0', 'Related Person B'),
('uuid8', 'doc3', 'Plaintiff C', '0', '0', 'Related Person C'),
('uuid9', 'doc4', 'Plaintiff D', '0', '0', 'Related Person D'),
('uuid10', 'doc5', 'Plaintiff E', '0', '0', 'Related Person E');

INSERT INTO `stg_judicial_judge_document` (`id`, `document_id`, `defendant`, 
`case_number`, `case_type`, `judge_year`, `judge_date`, `case_reason`, 
`document_title`, `province`, `city`, `county`) VALUES
('uuid1', 'doc1', 'Defendant A', 'CaseNo001', 'Civil', '2023', '2023-01-15', 
'Contract Dispute', 'Document Title 1', 'HN', 'Zhengzhou', '中原区'),
('uuid2', 'doc2', 'Defendant B', 'CaseNo002', 'Criminal', '2023', '2023-02-20', 
'Theft', 'Document Title 2', 'HN', 'Zhengzhou', '中原区'),
('uuid3', 'doc3', 'Defendant C', 'CaseNo003', 'Civil', '2023', '2023-03-25', 
'Property Damage', 'Document Title 3', 'HN', 'Zhengzhou', '中原区'),
('uuid4', 'doc4', 'Defendant D', 'CaseNo004', 'Civil', '2023', '2023-04-30', 
'Breach of Contract', 'Document Title 4', 'HN', 'Zhengzhou', '中原区'),
('uuid5', 'doc5', 'Defendant E', 'CaseNo005', 'Criminal', '2023', '2023-05-05', 
'Fraud', 'Document Title 5', 'HN', 'Zhengzhou', '中原区');


执行 如下sql会报上面的错误:

select
a.years as time_slot,
a.case_type as case_type
from
(select
case_type,
year(judge_date) as years
from
(select judge_date,case_type
from stg_judicial_judge_document
)t1
order by years desc,case_type
)a
order by time_slot desc
 
执行如下sql不会报错但是查询出来的数据列头和列内容不对应。(!!!这个更可怕)
select
a.years as time_slot,
a.case_type as case_type,
a.cnt as entity_count,
round(a.cnt * 100/a.yearly_total_cnt,2) as entity_ratio
from
(select distinct
count(distinct document_id) over(partition by year(judge_date)) as 
yearly_total_cnt,
count(distinct document_id) over(partition by year(judge_date),case_type) as 
cnt,
case_type,
year(judge_date) as years
from
(select judge_date,case_type,document_id
from stg_judicial_judge_document
)t1
left join
(select docid,ent_id
from stg_judge_document_client
)t2
on t1.document_id=t2.docid
left join
(select id_entity,county
from stg_business_baseinfo_clean
)t3
on t2.ent_id=t3.id_entity
order by years desc,case_type
)a
order by time_slot desc
 
我看了hive的执行计划和 hive的 源码发现是 在做 operatorTree  
优化时出现了问题,优化后(删除了几个tree的节点)导致上面一个节点的输出和下面一个节点的输入映射出现了问题。就像是当前节点输出的是 _{_}col0 , 
_col1 ,但是 下面节点认为它输出的是 _{_}{_}col1, _col0 .导致了上面的问题。要避免这个bug的触发也很容易,就是不写嵌套的 
oracle by  语句就行。也就是一条sql中保证只有一个order by语句一定不会有问题。{_}  
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to