[
https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153118#comment-14153118
]
Zhichun Wu commented on HIVE-8151:
----------------------------------
[~prasanth_j], after applying HIVE-8151.7.patch, the bug still exists. Here
is the test case:
{code}
use test;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.optimize.sort.dynamic.partition=true;
drop table if exists src1;
create table src1 (
key int,
val string
);
load data local inpath '../hive/examples/files/kv1.txt' overwrite into table src1;
drop table if exists hive13_dp1;
create table if not exists hive13_dp1 (
k1 int,
k2 int
)
PARTITIONED BY(`day` string COMMENT 'days')
STORED AS ORC;
insert overwrite table `hive13_dp1` partition(`day`)
select
key k1,
count(val) k2,
"day" `day`
from src1
group by "day", key;
select * from hive13_dp1 limit 5;
{code}
> Dynamic partition sort optimization inserts record wrongly to partition when
> used with GroupBy
> ----------------------------------------------------------------------------------------------
>
> Key: HIVE-8151
> URL: https://issues.apache.org/jira/browse/HIVE-8151
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0, 0.13.1
> Reporter: Prasanth J
> Assignee: Prasanth J
> Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch,
> HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch
>
>
> HIVE-6455 added the dynamic partition sort optimization. It added a
> startGroup() method to the FileSink operator that watches for changes in the
> reduce key in order to create partition directories. This method is not
> reliable, however, because the key seen by startGroup() differs from the key
> seen by processOp(): startGroup() is called with the newly changed key,
> whereas processOp() is called with the previously aggregated key. As a
> result, processOp() writes the last row of the previous group as the first
> row of the next group. This happens only when the optimization is used with
> a group by operator.
> The fix is to stop relying on startGroup() and to create the partition
> directory in processOp() itself.
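
The ordering problem described above can be illustrated with a small toy model. This is only a sketch of the described behavior, not Hive's actual code: `run_reducer`, `rely_on_start_group`, and the row layout are hypothetical names invented for the illustration. The model assumes the group-by buffers one aggregate and forwards it only after the next group has started, which is the ordering the description attributes to startGroup() and processOp().

```python
from collections import defaultdict


def run_reducer(rows, rely_on_start_group):
    """Toy model of a reducer with a group-by feeding a file sink.

    rows: (partition, key) pairs, sorted by partition and then key.
    Returns {partition_dir: [(partition, key, count), ...]}.
    """
    written = defaultdict(list)
    current_dir = None   # directory the sink is currently writing to
    pending = None       # aggregate buffered by the group-by
    prev_group = None

    def process_op(row):
        nonlocal current_dir
        if not rely_on_start_group:
            # Fixed behavior: pick the directory from the row being written.
            current_dir = row[0]
        written[current_dir].append(row)

    for partition, key in rows:
        group = (partition, key)
        if group != prev_group:
            # "startGroup": the sink sees the NEW key first ...
            if rely_on_start_group:
                current_dir = partition
            # ... and only then does the group-by forward the PREVIOUS
            # group's aggregate to the sink.
            if pending is not None:
                process_op(tuple(pending))
            pending = [partition, key, 0]
            prev_group = group
        pending[2] += 1

    if pending is not None:
        process_op(tuple(pending))  # flush the final group
    return dict(written)


rows = [("a", 1), ("a", 1), ("a", 2), ("b", 3)]
buggy = run_reducer(rows, rely_on_start_group=True)
fixed = run_reducer(rows, rely_on_start_group=False)
print("buggy:", buggy)
print("fixed:", fixed)
```

In this model the buggy variant writes the aggregate for `("a", 2)` into directory `b`, because the directory was switched before the previous group's row arrived. The fixed variant derives the directory from the row itself, so each row lands in its own partition.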