[ https://issues.apache.org/jira/browse/HIVE-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160916#comment-16160916 ]
Zoltan Haindrich commented on HIVE-17416:
-----------------------------------------
I've put together a small repro test and ran it on the active development
branches. It looks like master and branch-2 are not affected; however, the bug
is present on branch-1.
Repro qtest:
{code}
create table t (field_name string);
insert into t values
('e_2300a?fx'),
('e_2300a'),
('x');
select
REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
from t;
select distinct
REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
from t;
{code}
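For reference, here is a small Java sketch of what the two expressions should return for the matching repro rows. It assumes that Hive's REGEXP_EXTRACT behaves like java.util.regex Matcher.find() on the pattern after Hive string-literal unescaping ('\\??' becomes '\??'); the class name and the extract helper are hypothetical, for illustration only. The 'x' row is left out because its no-match return value is not modeled here.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExtractCheck {
    // Patterns from the repro after Hive string-literal unescaping.
    static final String R_REGEX = "([A-Z]_[0-9]*[A-Z]?)\\??.*";
    static final String W_REGEX = "([A-Z]_[0-9]*[a-z]?)\\??.*";

    // Hypothetical helper approximating REGEXP_EXTRACT(UPPER(s), regex, idx).
    // Assumption: find() semantics; returns null when nothing matches (sketch's choice).
    static String extract(String s, String regex, int idx) {
        Matcher m = Pattern.compile(regex).matcher(s.toUpperCase(Locale.ROOT));
        return m.find() ? m.group(idx) : null;
    }

    public static void main(String[] args) {
        String[] rows = {"e_2300a?fx", "e_2300a"};
        // LinkedHashSet stands in for DISTINCT: it drops duplicate (r, w) pairs only.
        Set<String> distinct = new LinkedHashSet<>();
        for (String row : rows) {
            String r = extract(row, R_REGEX, 1);
            String w = extract(row, W_REGEX, 1);
            System.out.println(r + "\t" + w);
            distinct.add(r + "\t" + w);
        }
        System.out.println("distinct rows: " + distinct.size());
        for (String d : distinct) {
            System.out.println(d);
        }
    }
}
```

Both rows yield E_2300A for r_field_name and E_2300 for w_field_name (the [a-z]? group cannot match after UPPER), so DISTINCT should leave a single E_2300A / E_2300 row rather than rewriting r_field_name to E_2300.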
> Hive Distinct changes column value
> ----------------------------------
>
> Key: HIVE-17416
> URL: https://issues.apache.org/jira/browse/HIVE-17416
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.1
> Reporter: Manoj Durisheti
>
> Hive 1.2.1000.2.6.1.0-129
> The query below with DISTINCT is expected only to deduplicate the result set,
> but it alters the data.
> *Query without Distinct:*
> select
> REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
> REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
> from alpha.table_name
> where
> datestamp = 20170805
> and
> field_name = 'https://www.abcd.com/details/123-main-st-abcde-xx-84004-5434484-e_2300a'
> ;
> Result:
> e_2300a e_2300
> e_2300a e_2300
> e_2300a e_2300
> e_2300a e_2300
> e_2300a e_2300
> *Query with Distinct:*
> select distinct
> REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
> REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
> from alpha.table_name
> where
> datestamp = 20170805
> and
> field_name = 'https://www.abcd.com/details/123-main-st-abcde-xx-84004-5434484-e_2300a'
> ;
> Result:
> e_2300 e_2300
> *Expected Result with Distinct:*
> e_2300a e_2300