[
https://issues.apache.org/jira/browse/HIVE-28849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Butao Zhang resolved HIVE-28849.
--------------------------------
Resolution: Fixed
I suggest you try to upgrade to latest hive version to 4.0.1, which has lots
of bugfix & improvement.
Additionally, it is recommended that you describe the problem in English. This
will facilitate others' understanding of the issue. :)
Closing this ticket as it has been fixed in high version.
> nest order by Exception:Caused by: java.lang.ArrayIndexOutOfBoundsException
> -----------------------------------------------------------------------------
>
> Key: HIVE-28849
> URL: https://issues.apache.org/jira/browse/HIVE-28849
> Project: Hive
> Issue Type: Bug
> Components: Operators, Query Planning
> Affects Versions: 2.3.7, 2.3.9, 2.3.10
> Environment: 我们用的 2.3.9,我测试了 2.3.10 和 2.3.7都有这个bug。 这个bug和
> 执行引擎没有关系,因为生成的执行计划就是错误的 。我测试了3.1.4的版本已经没有这个问题了。
> Reporter: sundy_baba_zhouzy
> Priority: Major
> Fix For: 3.2.0, 3.1.4
>
>
> 复现步骤:
> -- ----------------------------
> -- Table structure for stg_judicial_judge_document
> -- ----------------------------
> DROP TABLE IF EXISTS `stg_judicial_judge_document`;
> CREATE TABLE `stg_judicial_judge_document` (
> `id` varchar(512) COMMENT '自增主键',
> `document_id` varchar(512) COMMENT '裁判文书网文书id',
> `defendant` varchar(512) COMMENT '被告',
> `case_number` varchar(512) COMMENT '案号',
> `case_type_code` varchar(512) COMMENT '案件类型编码(已废弃)',
> `case_type` varchar(512) COMMENT '案件类型',
> `document_type_code` varchar(512) COMMENT '文书类型编码(已废弃)',
> `documnet_type` varchar(512) COMMENT '文书类型',
> `province` varchar(512) COMMENT '省份',
> `city` varchar(512) COMMENT '地市',
> `county` varchar(512) COMMENT '区县',
> `court_id` varchar(512) COMMENT '法院id',
> `court_name` varchar(512) COMMENT '法院名称',
> `trial_name` varchar(512) COMMENT '审理程序名称',
> `judge_year` varchar(512) COMMENT '裁判年份',
> `judge_date` varchar(512) COMMENT '裁判日期',
> `case_reason` varchar(512) COMMENT '案由',
> `document_title` varchar(512) COMMENT '文书标题',
> `document_text` varchar(512) COMMENT '文书全文',
> `announce_date` varchar(512) COMMENT '公布日期',
> `data_source` varchar(512) COMMENT '数据来源(裁判文书网,北京法院网)',
> `use_mark` varchar(512) COMMENT '使用标志0正常,1人工已修正,3未解析,9历史删除,10废弃数据',
> `source_url` varchar(512) COMMENT '原始连接URL',
> `in_store_time` varchar(512) COMMENT '入库时间',
> `last_update_time` varchar(512) COMMENT '更新时间',
> `plaintiff` varchar(512) COMMENT '原告',
> `is_not_display` varchar(512) COMMENT '是否不展示(已废弃)'
> );
> -- ----------------------------
> -- Table structure for stg_judge_document_client
> -- ----------------------------
> DROP TABLE IF EXISTS `stg_judge_document_client`;
> CREATE TABLE `stg_judge_document_client` (
> `id` varchar(512) COMMENT '自增主键',
> `docid` varchar(512) COMMENT '主表docid',
> `identity` varchar(512) COMMENT '当事人身份,原告,被告',
> `identity_role_type` varchar(512) COMMENT '当事人角色类型,0原告,1被告,2其他',
> `identity_type` varchar(512) COMMENT '当事人类型,0个人,1非个人',
> `relate_name` varchar(512) COMMENT '相关人名称',
> `document_publish_date` varchar(512) COMMENT '文书公布日期',
> `case_reason` varchar(512) COMMENT '案由',
> `use_mark` varchar(512) COMMENT '使用标志0正常,1人工已修正,9历史删除,10废弃数据',
> `last_update_time` varchar(512) COMMENT '更新时间',
> `is_not_display` varchar(512) COMMENT '是否不展示(已废弃)',
> `reserve1` varchar(512) COMMENT '预留字段1',
> `reserve2` varchar(512) COMMENT '预留字段2',
> `ent_id` varchar(512) COMMENT '公司id,party_name是公司,则关联工商ic主键'
> ) ;
> -- ----------------------------
> -- Table structure for stg_business_baseinfo_clean
> -- ----------------------------
> DROP TABLE IF EXISTS `stg_business_baseinfo_clean`;
> CREATE TABLE `stg_business_baseinfo_clean` (
> `id` varchar(512) COMMENT '自增主键',
> `id_entity` varchar(512) COMMENT '主体唯一键',
> `register_status` varchar(512) COMMENT '归类后的登记状态',
> `entity_type` varchar(512) COMMENT '归类后的主体类型,1:个体户,2:合作社,3:企业',
> `province` varchar(512) COMMENT '省份大写英文缩写',
> `city` varchar(512) COMMENT '地市',
> `county` varchar(512) COMMENT '区县',
> `listed_status` varchar(512) COMMENT '上市状态 (上市、退市)',
> `use_mark` varchar(512) COMMENT '数据使用标记,0有效,1废弃',
> `create1_time` varchar(512) COMMENT '创建时间',
> `update1_time` varchar(512) COMMENT '更新时间',
> `latitude_longitude` varchar(512) COMMENT '经纬度',
> `registered_capital` varchar(512) COMMENT '注册资本数值,量纲-万',
> `currency_type` varchar(512) COMMENT '注册资本币种,默认人民币'
> ) ;
> INSERT INTO `stg_business_baseinfo_clean` (`id`, `id_entity`,
> `register_status`, `entity_type`, `province`, `city`, `county`) VALUES
> ('uuid11', 'ent1', 'Active', '3', 'HN', 'Zhengzhou', '中原区'),
> ('uuid12', 'ent2', 'Inactive', '2', 'HN', 'Zhengzhou', '中原区'),
> ('uuid13', 'ent3', 'Active', '3', 'HN', 'Zhengzhou', '中原区'),
> ('uuid14', 'ent4', 'Active', '1', 'HN', 'Zhengzhou', '中原区'),
> ('uuid15', 'ent5', 'Inactive', '3', 'HN', 'Zhengzhou', '中原区');
> INSERT INTO `stg_judge_document_client` (`id`, `docid`, `identity`,
> `identity_role_type`, `identity_type`, `relate_name`) VALUES
> ('uuid6', 'doc1', 'Plaintiff A', '0', '0', 'Related Person A'),
> ('uuid7', 'doc2', 'Plaintiff B', '0', '0', 'Related Person B'),
> ('uuid8', 'doc3', 'Plaintiff C', '0', '0', 'Related Person C'),
> ('uuid9', 'doc4', 'Plaintiff D', '0', '0', 'Related Person D'),
> ('uuid10', 'doc5', 'Plaintiff E', '0', '0', 'Related Person E');
> INSERT INTO `stg_judicial_judge_document` (`id`, `document_id`, `defendant`,
> `case_number`, `case_type`, `judge_year`, `judge_date`, `case_reason`,
> `document_title`, `province`, `city`, `county`) VALUES
> ('uuid1', 'doc1', 'Defendant A', 'CaseNo001', 'Civil', '2023', '2023-01-15',
> 'Contract Dispute', 'Document Title 1', 'HN', 'Zhengzhou', '中原区'),
> ('uuid2', 'doc2', 'Defendant B', 'CaseNo002', 'Criminal', '2023',
> '2023-02-20', 'Theft', 'Document Title 2', 'HN', 'Zhengzhou', '中原区'),
> ('uuid3', 'doc3', 'Defendant C', 'CaseNo003', 'Civil', '2023', '2023-03-25',
> 'Property Damage', 'Document Title 3', 'HN', 'Zhengzhou', '中原区'),
> ('uuid4', 'doc4', 'Defendant D', 'CaseNo004', 'Civil', '2023', '2023-04-30',
> 'Breach of Contract', 'Document Title 4', 'HN', 'Zhengzhou', '中原区'),
> ('uuid5', 'doc5', 'Defendant E', 'CaseNo005', 'Criminal', '2023',
> '2023-05-05', 'Fraud', 'Document Title 5', 'HN', 'Zhengzhou', '中原区');
> 执行 如下sql会报上面的错误:
> select
> a.years as time_slot,
> a.case_type as case_type
> from
> (select
> case_type,
> year(judge_date) as years
> from
> (select judge_date,case_type
> from stg_judicial_judge_document
> )t1
> order by years desc,case_type
> )a
> order by time_slot desc
>
> 执行如下sql不会报错但是查询出来的数据列头和列内容不对应。(!!!这个更可怕)
> select
> a.years as time_slot,
> a.case_type as case_type,
> a.cnt as entity_count,
> round(a.cnt * 100/a.yearly_total_cnt,2) as entity_ratio
> from
> (select distinct
> count(distinct document_id) over(partition by year(judge_date)) as
> yearly_total_cnt,
> count(distinct document_id) over(partition by year(judge_date),case_type) as
> cnt,
> case_type,
> year(judge_date) as years
> from
> (select judge_date,case_type,document_id
> from stg_judicial_judge_document
> )t1
> left join
> (select docid,ent_id
> from stg_judge_document_client
> )t2
> on t1.document_id=t2.docid
> left join
> (select id_entity,county
> from stg_business_baseinfo_clean
> )t3
> on t2.ent_id=t3.id_entity
> order by years desc,case_type
> )a
> order by time_slot desc
>
> 我看了hive的执行计划和 hive的 源码发现是 在做 operatorTree
> 优化时出现了问题,优化后(删除了几个tree的节点)导致上面一个节点的输出和下面一个节点的输入映射出现了问题。就像是当前节点输出的是 _{_}col0 ,
> _col1 ,但是 下面节点认为它输出的是 _{_}{_}col1, _col0 .导致了上面的问题。要避免这个bug的触发也很容易,就是不写嵌套的
> oracle by 语句就行。也就是一条sql中保证只有一个order by语句一定不会有问题。{_}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)